Scripting

Remove filename extensions with os.path.splitext instead of the {lr}strip or replace methods in Python

Jesper Noehr explains why {l,r}strip are considered harmful for removing extensions from filenames with Python. I think he’s absolutely right on that score, and I would agree. The lstrip() and rstrip() methods shouldn’t be used for this purpose.

However, like the only commenter on that post, I’d also recommend os.path.splitext() as the proper tool for the extension-removing job.

Let’s take some example filenames you might come across on Mac OS X Snow Leopard:

>>> a = [ 'Document.pages', 'Property List.plist', 'Application.app', 'Word.docx', 'VPN (Cisco VPN).networkConnect', 'Dated Spreadsheet 2010.02.17.xlsx']

If we had a list of filenames (or file paths) like this — perhaps created by os.walk() or some other generator-based process — we couldn’t easily use Jesper’s recommended solution. The replace() string method would give us a much harder time dealing with the range of filenames and extensions in that list. In the case where you don’t know the filename extensions in advance, replace() breaks down. The replace() method would have to be looped with many possible filename extensions.

What we need is a way to split filename from extension, even if we don’t know the extension beforehand. The os.path.splitext() alternative does just that, returning a tuple. Here, I’ll import the os module and then use a list comprehension to run os.path.splitext() through every filename in the list above.

>>> import os
>>> [os.path.splitext(x) for x in a]
[('Document', '.pages'), ('Property List', '.plist'), ('Application', '.app'), ('Word', '.docx'), ('VPN (Cisco VPN)', '.networkConnect'), ('Dated Spreadsheet 2010.02.17', '.xlsx')]

It becomes a simple matter to get just the filename from the tuple, as I do here by modifying the list comprehension to just get the zeroth item from it:

>>> [os.path.splitext(x)[0] for x in a]
['Document', 'Property List', 'Application', 'Word', 'VPN (Cisco VPN)', 'Dated Spreadsheet 2010.02.17']

Note that several interesting conditions are handled by os.path.splitext():

  • long filename extensions, including “.pages” and “.networkConnect”
  • spaces in filenames, as in “Property List.plist”
  • parentheses, as in “VPN (Cisco VPN).networkConnect”
  • periods within the filename, as seen in “Dated Spreadsheet 2010.02.17.xlsx”

AppleScript date and time format parsing change in Snow Leopard

I had a handy script I wrote on a plane years ago that let me block off my Entourage calendar at specific times each weekday for a given week. The times for these events were created by concatenating some strings and converting the result into an AppleScript date object. I mention that merely for background, and because it was an incredibly geeky way to automate the tedious process of blocking off lunch on my calendar (without resorting to recurring events).

I found that my script didn’t work in Snow Leopard — despite flawless operation across several successive major versions of Mac OS X. The dates themselves remained correct, but the times were all coming up as 12:00 a.m. instead of what was expected.

For example, here’s a simplified reproduction scenario you can try on Snow Leopard:

set vEventDateString to "Friday, August 28, 2009"
set vEventStartTime to "10:00 a.m."
log vEventStartTime
(*10:00 a.m.*)
set vEventStartDate to the (date (vEventDateString & " " & vEventStartTime))
log vEventStartDate
(*date Friday, August 28, 2009 12:00 AM*)

It turns out that the fix is easy if not exactly obvious: remove the periods from “a.m.” and “p.m.” before converting strings to date objects. (I use the periods because I follow the Associated Press Stylebook!)

set vEventDateString to "Friday, August 28, 2009"
set vEventStartTime to "10:00 am"
log vEventStartTime
(*10:00 am*)
set vEventStartDate to the (date (vEventDateString & " " & vEventStartTime))
log vEventStartDate
(*date Friday, August 28, 2009 10:00 AM*)

So, there is a workaround in the unlikely event you encounter the same problem with your scripts.

Core Graphics bindings on 64-bit Python in Snow Leopard

Mac OS X Snow Leopard does not include Core Graphics bindings (CGBindings) for 64-bit Python.

The SWIG-based Python CGBindings originally shipped with Mac OS X 10.3, which bundled Python 2.3. Since that time, these bindings — specific to the system’s bundled framework build of Python — had allowed access to Core Graphics objects and commands from within scripts.

They were one of the reasons I decided to use Python in the first place. I thought they would be fun to learn and use, particularly with the then-new PDF Services feature of Mac OS X. The Core Graphics bindings also provided much, much more power than the command line sips tool and had an advantage over other alternatives by being bundled with the operating system. I thought they offered the possibility of growing with Mac OS X’s graphics hardware acceleration. I even found a way to use them to create better screenshots with drop shadows, a task where I’d previously employed Ambrosia’s Snapz Pro X.

Here’s an example of what you’ll see on Snow Leopard if you try to “import CoreGraphics” in 64-bit Python:

# Explicitly enable 64-bit execution for Python
$ export VERSIONER_PYTHON_PREFER_32_BIT=no  
$ python
Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from CoreGraphics import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/BinaryCache/CoreGraphicsBindings/CoreGraphicsBindings-26~139/Root/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/CoreGraphics/__init__.py", line 7, in <module>
ImportError: /System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/CoreGraphics/_CoreGraphics.so: no appropriate 64-bit architecture (see "man python" for running in 32-bit mode)
>>> ^D

With 32-bit Python on Snow Leopard:

# Explicitly enable 32-bit execution for Python
$ export VERSIONER_PYTHON_PREFER_32_BIT=yes
$ python                                  
Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from CoreGraphics import *
>>> ^D

While the CGBindings are still available to 32-bit Python in Snow Leopard, you must use PyObjC to replace their functionality for 64-bit Python. Since 64-bit Python is the default in Snow Leopard, it makes sense to transition from the bindings to PyObjC as soon as possible. This means there is some porting work for scripts that used the Core Graphics bindings. I guess I’m glad I didn’t do as much with them as I’d planned.

I see this change as something of a loss. (Is this what Carbon developers are experiencing? Hm.) The Core Graphics bindings were relatively easy to use and felt reasonably Pythonic, even if the documentation was almost nonexistent. PyObjC feels more foreign to me when I attempt to use it — even though it’s clearly the future.

Python 32-bit execution on Snow Leopard

The default installation of Python on Mac OS X Snow Leopard is version 2.6.1. According to the man page for Python on Snow Leopard, Python 2.6 executes as a 64-bit application by default.

If, for some reason, you need to run it as a 32-bit application, this can be changed at the command line:

# Prefer 32-bit execution for Python 2.6.1 on Snow Leopard
$ defaults write com.apple.versioner.python Prefer-32-Bit -bool yes

The preference can be set in either the User or Local filesystem domain in Mac OS X, following the normal precedence rules. To unset it, presumably you would change the boolean to “no” — or perhaps even delete the “Prefer-32-Bit” key.

There is also an environment variable that can override this preference.

Updating Acquia Drupal versions in a Mercurial repository

After using Acquia Drupal for a while, I took advantage of a trial subscription to the Acquia Network. The network’s services showed me that I had files present in my install that the agent could not account for.

I suspected this was happening because of the way I manage my Acquia Drupal installation with Mercurial. So, I’ve modified my previous process (and updated my instructions) to extract the downloaded tar archive with the --recursive-unlink option. This option appears to successfully remove the contents of every directory before putting new files back into them.

$ tar --strip-path=1 --directory=acquia_drupal --recursive-unlink -zxvf acquia-drupal-1.2.12.5047.tar.gz
acquia-drupal-1.2.8/
acquia-drupal-1.2.8/robots.txt
...
acquia-drupal-1.2.8/INSTALL.txt

When the archive is extracted in this way, my repository’s working directory shows modified, unknown, and deleted files. This allows me to treat each category of files individually before I commit the changes for a Drupal update as a revision.

$ hg status

The modified files will be tracked normally because they’ve already been added to the Mercurial repository, so I don’t need to do anything special for them.

The unknown files are ones that are completely new, and have not appeared in the same position in a previous revision. They have yet to be tracked by Mercurial, so I have to add them to the repository. To add just those unknown files, then, I have to pick them out from the status listing:

$ hg status --unknown

In order to operate just on those files to add them to the repository, I run a for loop:

$ for FILEPATH in `hg status --unknown --no-status`
for> do
for>    hg add "$FILEPATH"
for> done

This changes the “?” status to “A,” because the files were successfully being tracked by Mercurial.
I use the “--no-status” flag on the “status” command so that just the file paths are printed; the actual status code is not, which is appropriate for the target of the “add” command in the loop.

I do the same basic steps with deleted files. These are files that were in the previous revisions but have been deleted by the --recursive-unlink option from the tar extraction and not replaced with the extraction of the new Acquia Drupal tar archive. If the deleted files had been replaced by the tar extraction, they would either be unchanged (which would not show up in the “status” output) or marked as modified.

To remove the files that are marked as deleted from the repository’s working directory:

$ for FILEPATH in `hg status --deleted --no-status`
for> do
for>    hg remove "$FILEPATH"
for> done

However, that may be the same as simply using the following, which I have to explore further:

$ hg remove --after

So, to follow all of these changes in the repository, I run the loop for the uknown files and the loop for the deleted files. The modified files are already tracked, so I don’t need to do anything additional for them. After that, a “commit” will record all of the changes — modifications, additions, and deletions — in the repo.

These commands are based on my current understanding of Mercurial, and they do work for me right now. There could certainly be another better way to do this in one fell swoop — or at least fewer steps. I would welcome that, so if you’re aware of a way, feel free to comment or contact me.

Update: I found that the “hg addremove” command cleanly replaces all of the shell loops I mentioned above. Therefore, I recommend using it instead of the “for” loops I described.

Spot problem commands in Apple Installer package scripts

It should come as no surprise that Apple Installer installation packages can contain scripts. These scripts are supposed to conduct important operations during the course of the software installation.

However, when you are the system administrator of more than one Mac, you find that developers sometimes miss a good balance between what you think should be in the installer payload versus what should be in its scripts. The payload of a installer, by definition, are the files and links that should be installed, along with information on where they should be installed as well as how (i.e. ownership, permissions).

Therefore, developers should not need to run scripts that create or delete files — they should be created from the payload itself, and if a file must be deleted during the install then consider that perhaps you’re doing it wrong. Likewise, there should be little need move or copy files, because as many copies as desired can be installed from the paylod. Similarly, the need to change ownership or modify permissions should be taken care of in the payload.

Perhaps I’m being a purist here. I’m certainly accused of that, from time to time. However, this just makes sense to me and I happen to think that many developers are similarly logical people. They just aren’t the kind of logical people who happen to spend effort on software installation, especially the kind that results in a deployment-friendly installer package.

So how do we as administrators verify the quality of the scripts in installers? Is there a way we can quickly peer into them to decide if any of the scripts’ steps will be superfluous or even (gasp!) harmful?

Well, I have a quick suggestion for scanning packaged installers. The following one-liner shell command will search an installer package or metapackage for scripts that have the kinds of steps outlined above.

$ find /path/to/installer.pkg -regex '.*/*\(flight\)' -or -regex '.*/*\(install\)' -or -regex '.*/*\(upgrade\)' -exec grep --with-filename --line-number '\(cp\|mv\|ln\|>{1,2}\|cat\|echo\|chown\|chmod\|rm\|srm\)' {} \;

Note that this will only work for the traditional installer packages; it will not work with Leopard-style flat packages (which are documented so badly by Apple that the best description comes from reverse engineering by Iceberg's author). The one-liner will currently only find the defined install operations scripts: preflight, preinstall, preupgrade, postinstall, postupgrade, and postflight. (Any other scripts are likely to be called by one of those six.) It assumes those scripts will be shell scripts, currently, even though any of them could be written in other scripting languages installed with Mac OS X, like Python, Perl, or Ruby. It will also not work on the JavaScript-based system and volume requirements portions of the installation.

However, it’s a start. The output displays the offending file and line number, so you can conduct more careful examination of the matches it finds.

I haven’t run this on an exhaustive list of installation packages, but I have already seen at least one installer that produces worrisome output.

Update: I’ve changed the regex for the pre/postflight script so that it is more general that what I originally posted. I’m also having some problems with the snippet working with a certain installer whose scripts I know have cp and chmod commands. So, I may be back to the drawing board with this; comments are welcome.

Get the number of pages for a PDF using the Quartz 2D Python bindings

A question came up on the AppleScript-Users mailing list and I wanted to make note if it, because I came up with a quick Python-based way to answer it. Here, I reiterate that answer on how to get the number of pages from a PDF.

This Python 2.5 sequence (shown as run interactively from the Terminal on Leopard — not as a script) works for me using both a local file and a file from an AFP-mounted volume. It uses the Quartz 2D bindings for Python, which are part of the Python install on Mac OS X (since 10.3?), so it is not pure Python.

$ python
>>> import os
>>> from CoreGraphics import *
>>> pdf_filename = '/Volumes/path/to/PDF/document.pdf'
>>> provider = CGDataProviderCreateWithFilename(pdf_filename)
>>> pdf = CGPDFDocumentCreateWithProvider(provider)
>>> pages = pdf.getNumberOfPages()
>>> pages
9
>>> print(pages)
9

Substitute your own path for the pdf_filename variable’s contents, and you can try it yourself. (Above, $ is the shell prompt, and >>> is the Python interpreter’s command prompt, so don’t copy/paste those.)

Since I was doing this in Terminal, I could also type "pdf_filename = '" then drag and drop a file from the Finder into Terminal window to insert its path, complete the quoting, and pressed Return. (Saves some time and potential for mistyping.)

You could wrap this sequence into one longish command line and run it from AppleScript with “do shell script.” Or you could save it as a Python script file and again run it with “do shell script.” Or, you could skip AppleScript and just use Python, which I prefer over AppleScript. (Python seemed very natural to me with my minor AppleScript background, and I'm pretty sure I've written more of it than AppleScript by this point.)

This is derived from “Example 2: Splitting a PDF File” in the Using Python with Quartz 2D on Mac OS X document at the Apple Developer Connection.

That ADC example also shows how you could specify a PDF file’s path at the command line (with sys.argv), rather than hardcoding it as I did for my example.

There may be a shorter way to do this since I’m no expert on Quartz 2D. (I just appreciate that the bindings are there for a scripting language I happen to like a lot.) I honestly don’t know what’s happening when the provider and pdf objects are being set, but I really don’t need to know for this.

Getting the settings right for the Drupal GeSHi Filter module

I wanted to find a way to do syntax highlighting of code snippets on my Drupal blog. I came across the GeSHi Filter module, which lets Drupal sites take advantage of the apparently well-regarded GeSHi Generic Syntax Highlighter library that’s meant for just this purpose.

However, I ran into some roadblocks implementing it on my site. Here’s the short story of what I settled on after some trial and error.

My existing code snippets are in <code> blocks, and the initial GeSHi Filter settings applied badly to them. I made the decision to only use GeSHi on <blockcode> blocks, since I wasn’t using that tag yet and it wouldn’t conflict with the snippets already posted.

I most commonly write Bash/Zsh, Python, and AppleScript snippets on my blog. However, the Bash code I was using as part of my trial and error simply wasn’t highlighting; it was coming through as the default (and boring) plain text — but was at least boxed off from the rest of the blog post.

I thought that GeSHi wasn't correctly discovering that the code was written in UNIX shell syntax. I couldn’t find a way to specify the language for that blockcode tag, until I did some searching on the ’net. To change my blockquotes to choose a certain language — at least for the purposes of this Drupal module, if not for GeSHI in general — I needed to add the “lang=lang” style to the tag. For Bash, I could use “lang=bash,” for Python, “lang=python,” and for AppleScript, “lang=applescript.” That made sense.

However, my code was still not being syntax highlighted. I discovered that the Drupal module came with an initial set of languages enabled. The others were all turned off, but that could be changed in the module settings. Without turning them on, even properly-tagged <blockcode> sections did not get the benefit of syntax highlighting.

I changed the GeSHi Filter options to enable some of the languages that were initially disabled, and then disabled the ones I didn’t anticipate using. This allowed me to add Bash and AppleScript syntax highlighting support, as both had been turned off by default. After that, I saw the results I’d hoped for: a syntax-highlighted code snippet.

It took some work, but now that it’s done, I should be all set.

Check for SSL certificate expiration of Radmind client certificates

You can find out if an SSL certificate has expired with the command below. I’ve found it useful to be able to check for expired certificates in my use of Radmind, where you can uniquely identify clients to the server with them.

$ openssl x509 -in /path/to/cert.pem -noout -checkend 0

I mention this command primarily because I reviewed the the OpenSSL x509 man page (“man x509”) that comes with Mac OS X Leopard, and it didn’t show the “checkend” option for the command. That was odd, because that option was just what I needed.

I did, however, find it documented in the usage statement-style help for the command:

$ openssl x509 --help

In that usage statement, the “checkend” option is described (with little punctuation) as a way to “check whether the cert expires in the next arg seconds [sic] exit 1 if so, 0 if not.” So, using zero seconds shows you if the certificate has already expired, while an integer greater than zero will show if it will expire in the future. No matter how many seconds you check against, you must examine the results from the exit code (the “$?” shell variable) to see if the certificate is or has expired.

I find this is tremendously useful knowledge when dealing with certificates in Radmind, where an expired certificate can mean the failure of a client to connect to the Radmind server. It could be beneficial in other circumstances, of course — but I don't have those circumstances.

Taking this further, you could check for certificate expiration on a Radmind server — if your certificates are stored in the Radmind special directory for each hostname of a managed client. (Substitute one of your own managed clients’ hostnames for “hostname” in the path below.)

$ openssl x509 -in /var/radmind/special/hostname/private/var/radmind/cert/cert.pem -noout -checkend 0

Since you can do it for one client certificate, you could also loop through all of the certificates on a Radmind server. In this example, I’ll continue to use the path of /var/radmind even though, on Mac OS X, I’d generally prefer to specify the full /private/var/radmind; your Radmind server may not be on Mac OS X even if your clients are. Also, you may need to modify the “depth” parameter on your search to accommodate the paths on your server. Finally, I’ll change the “checkend” parameter to 604800, for seven days (60*60*24*7=604800). That produces something along the lines of:

for CERT in `find /var/radmind/special -name cert.pem -print -type f -depth 6`;
do
    openssl x509 -in "/private/var/radmind/$CERT" -noout -checkend 604800
         RESULT=$?
    case "$RESULT" in
     0)
          echo "$CERT: okay"
          ;;
     *)
          echo "$CERT: expiring"
          ;;
esac
done

Change the last line to “done | grep expiring” if you only want to see the expiring certificates.

It’s great to get just the CN of the certificate in these circumstances, since it’s likely you’ll want to act on just those that need attention. One way to do this relatively cleanly is to use OpenSSL x509’s “subject” and “nameopt” options, and then parse the output. Below, I’ll use awk for that. (Again, substitute one of your own managed clients’ hostnames for “hostname” in the path below.)

$ openssl x509 -in /private/var/radmind/special/hostname/private/var/radmind/cert/cert.pem -noout -subject -nameopt sep_multiline | awk '/CN/ {split($1,elements,"=") ; print elements[2] ;}'

Beyond checking for expiration on the server, it may be valuable to do so in your Radmind client scripts, especially if you favor SSL connections. If you find an expired certificate, you can take some remedial action right away that might allow the client to communicate with the server.

I thought about this a while, and the easiest way I came up with — after having already developed more complex logic — was to simply rename or remove the expired certificate from its normal path. Then, allow the client to connect with another authorization level where the client certificate is unnecessary. (Use of a client certificate implies Ramind’s “-w2” authorization level, while a lesser level would mean you’re performing hostname/DNS rather than certificate verification.) This would probably mean you have multiple Radmind server processes running, each on its own port, to accept such incoming requests on the server.

An Epic Introduction to PyObjC and Cocoa

Irrational Exuberance has An Epic Introduction to PyObjC and Cocoa. Skimming over it, it really does seem to start at the ground level — and that’s what I need. I have high hopes it will help me figure out PyObjC.

[Via PyObjC Development mailing list.]

Syndicate content