I was trying to install Dulwich on a Mac OS X Lion system this week and ran into difficulty. I kept getting installation failures that included a missing “python.h” and, eventually, llvm-gcc-4.2 failed to compile the module.
I found the situation frustrating, partly because I pretty much own the top search hits about how to install Dulwich and Hg-Git on Mac OS X Lion, thanks to some earlier article.
It turns out that I had reinstalled Lion about two weeks ago, and had not reinstalled Xcode 4. So, I updated to Xcode 4.2 and this completely eliminated my problem. Presumably, it would also work for you — and for future me, since I’m likely to repeat this — even if Lion hadn’t been reinstalled in between.
Things that didn’t work included but were not limited to:
I had a need to read some settings from a Python script on Mac OS X recently. I wanted to be able to change selected parameters for the script — some of which could be site or implementation specific — without embedding them directly in the code. Since the script was Mac OS X-only, using a property list seemed like a good idea.
With customizable settings from a property list, a script could become useful and more customizable for a wider community — or even different internal audiences.
I thought reading preferences was going to take a lot of effort. I was, however, pleasantly surprised at how easily I was able to accomplish it.
Asking others who had been down this route before resulted in some links to Apple docs. I wanted a more concrete example of how it was done in Python, and I got a great one.
That example came from reading the source to munkilib from the open source project, Munki. Since Munki had its own preference file, it needed to read from it, and its example was very enlightening.
Frogor directed me to a specific spot in the Munki code. That spot demonstrated how to read a preferences file with CoreFoundation.
Munki also has its own internal defaults for preferences. The munkilib/munkicommon.py example showed how to implement default settings in their absence in a property list. That provides a fallback position so that you always have some value available. In Munki, setting the defaults was done within a function. It seemed that it would be more generic if those defaults were separated out of the preference-reading function.
My own example of how to read a plist is outlined below. This focuses on just what you need in order to read the preferences and provide default settings for a larger script. A single set of preferences can be shared between scripts, and each script can encode its set own defaults. While you can create your own keys and values, for the example below, I will use the keys “StringPreference,” “BooleanPreference,” “ArrayPreference,” and “DictionaryPreference.”
The script won’t do anything yet, until we create a main() or other functions to call the get_preferences function. Since you’d be reading preferences as part of a larger script, this is fine for now. However, with what’s written already, you can test out reading preferences interactively with the shell and Python interpreter.
That’s it, the expected results were returned. Notice that the output for StringPreference and BooleanPreference are taken from the property list, while the empty array and dictionary come from the default_preferences in the script.
Update: Greg pointed out that there is a certain degree of danger using #!/usr/bin/env python as the shebang line in the script. Should a system have a non-Apple installation of Python, then /usr/bin/env python might return that. Another Python is probably less likely to be able to bridge CoreFoundation, and thus wouldn’t be able to use the CFPreferencesCopyAppValue call to read preferences.
I think the overall shebang danger is small because few Mac OS X systems will have an alternative Python installed, but clearly your chances of that increase in certain situations. To eliminate this risk, do what Munki does and insert the #!/usr/bin/python shebang instead.
Hg-Git is the Mercurial extension to use if you want to connect to local or remote Git repositories. I exclusively use Mercurial and Hg-Git for all of my Github transactions, so I can personally vouch that it works.
Now that Hg-Git has been updated to better support Mercurial 1.9, let’s see if we can get an Hg toolchain working on Lion. Since I did that on Snow Leopard a few days ago, Hg-Git has made it into PyPI. The installation instructions this time are a bit more streamlined, because we can now use easy_install to get Hg-Git and its dependencies.
To get the toolchain set up, we’ll need Xcode. The Xcode suite includes tools we’ll need to make Python easy_install work, along with Subversion (a prerequisite for Hgsubversion, which I’ll talk about in a later article) and other useful tools.
The Xcode installation is a multi-step install process. Both current download methods — the developer download through connect.apple.com (if you have a paid Mac Developer Account) and the Mac App Store — give you an “Install Xcode” application. That application runs a second, real installer that you have to finish before you actually have the Xcode tools available in a ready-to-use state. This is very similar to the situation for Mac OS X Lion, so you may be developing a sense of familiarity with the situation.
To install Mercurial:
To add Hg-Git to Mercurial on Lion:
That’s it! Once Mercurial 1.9 plus Hg-Git 0.3.1 or later are installed and you’ve enabled Hg-Git in your ~/.hgrc, you are ready to use Mercurial with local and remote Git repositories.
Here is my current installation method for Mercurial 1.9 and Hg-Git on Mac OS X Snow Leopard. I use Hg-Git on a sporadic basis to work with projects on Github. That activity is frequent enough that it’s really helpful to be able to get Hg-Git set up and working quickly.
I had a rough time with Hg-Git after I upgraded to Mercurial 1.9 a few weeks ago. It was notable because that was the first time I can remember having incompatibilities between this stack of software. Since then, Hg-Git has changes available that allow it to install and work with newer versions of Mercurial, including version 1.9, and the most recent Dulwich release. These changes weren’t as readily available at the time I upgraded, so I thought it was worth writing this to remind me how to put the right pieces together.
I don’t yet have a working installation method for all of this software on Mac OS X Lion. I ran into a stumbling block getting Dulwich installed there, and Dulwich is a necessity for Hg-Git. Update: The situation changed a few days later, and I have instructions for Mac OS X Lion available now.
This may have been a problem isolated to the particular Lion system I was using. I haven’t had the chance to duplicate it elsewhere or troubleshoot the problem on the system where I encountered it.
Update: Since I wrote this, Hg-Git 0.3.1 became available through PyPI, which means you can install it via easy_install. You no longer need to clone the Hg-Git repository and run the setup process manually. Instead, you can run the following single command to get Hg-Git 0.3.1 and its dependencies (Dulwich 0.8.0 or later):
This simplifies the instructions quite a bit, but I leave the original steps above for reference.
There may be times when you want to obtain the number of the latest available version — not just the latest installed version — of a software package through automated means. If the vendor or project provides a syndication feed (either RSS or Atom) that describes new releases, then you may be able to parse that data and get the newest release from it.
As an example, let’s examine the RSS feed for Group Logic’s ExtremeZ-IP. Other developers provide RSS/Atom feeds for their releases, but the EZIP feed is a good one to start a demonstration with because it is generally structured well.
We can break apart the EZIP feed with the Universal Feed Parser module for Python, which you must obtain separately.
Update: Because of Mark Pilgrim situation (also described here), the Universal Feed Parser Web site is no longer available. There is an alternative source for the Universal Feed Parser at Google Code, and I have cloned the Universal Feed Parser repository to Bitbucket from there.
As you can see, the “ExtremeZ-IP Latest Releases” feed is automatically recognized as RSS 2.0. I prefer to use HTTPS for fetching these feeds whenever possible, so if the developer has an HTTP feed, I try to see if it also works with HTTPS.
Next, let’s find out where the version numbers are kept in the feed. It looks like they are in the entry title, based on reading the feed in Safari RSS. I can confirm that with the Universal Feed Parser. We’ll want to examine the title of every feed item so we can better handle both current and future entries from the feed. There are more entries to the EZIP than I will print out.
We get Unicode strings as output from the Universal Feed Parser. That’s why the quoted strings are preceded with a “u” character.
I’d like to strip out the version string from the title element in each entry. I’m going to do so by splitting on whitespace and getting the last group of characters from the string. (This doesn’t account for text like “Hot Fix,” as seen in the EZIP feed, but it is still a good enough starting point for my purposes.)
By stripping out the build number after “x” in the version string, you potentially lose some data. In the EZIP feed, there are entries where two consecutive version numbers are the same except for the build number after the “x.” However, depending on your needs, it may still be useful to eliminate that part of the version string, so we’ll do that next.
We really only need the most current or “top” item in the feed, since that should give us the newest release number. The newest version in this particular feed should be in the first entry. That’s “entries[0]” below, because we’re using Python and it zero references the first item in lists.
There, we now have the version number of the most current release, 7.1.1, for the product. We have drawn it straight from the developer’s syndicated feed, so it is as current as the developer makes it.
How could that be useful? The output can be compared against other data, like the currently-installed version. The comparison, in turn, could be made part of a monitoring workflow, so you could get alerts if you fall behind.
If we didn’t strip the build number after the “x,” we would be left with a complex version number. Some Python tools, like distutils, will not currently handle the trailing characters in the version number well.
I have found that you can improve upon distutils’ StrictVersion/LooseVersion version number handling by switching to parse_version in pkg_rsources (“from pkg_resources import parse_version as V”). More coverage of that topic appears in PEP 386. If you are comparing the original version strings from the EZIP feed with similarly complex output from elsewhere, then I would probably use the pkg_resources module.
I came across this hint about display properties on StackOverflow and thought it was worthwhile to write down for later. If you want to get the screen or Desktop resolution of a Mac via Python, you can do so with PyObjC.
First, let’s get the information about the main screen:
If you want just the horizontal and vertical resolution from that blob of data, you can pull the width and height out:
This might be useful in situations where you don’t have any of the “hundred of portable libs in Python that give you access to that information” — such as in your stock Mac OS X Python installation. To clarify: I’m in no way meaning to belittle that there are portable libraries that would let you do the same thing, but you also have to program for your audience and its constraints. One of the reasons I appreciate Python over some scripting languages is that you get so much capability in the Standard Library. However, on Mac OS X, you don’t get modules like pygame by default (yet … and maybe never) while you do get PyObjC.
Jesper Noehr explains why {l,r}strip are considered harmful for removing extensions from filenames with Python. I think he’s absolutely right on that score, and I would agree. The lstrip() and rstrip() methods shouldn’t be used for this purpose.
However, like the only commenter on that post, I’d also recommend os.path.splitext() as the proper tool for the extension-removing job.
Let’s take some example filenames you might come across on Mac OS X Snow Leopard:
If we had a list of filenames (or file paths) like this — perhaps created by os.walk() or some other generator-based process — we couldn’t easily use Jesper’s recommended solution. The replace() string method would give us a much harder time dealing with the range of filenames and extensions in that list. In the case where you don’t know the filename extensions in advance, replace() breaks down. The replace() method would have to be looped with many possible filename extensions.
What we need is a way to split filename from extension, even if we don’t know the extension beforehand. The os.path.splitext() alternative does just that, returning a tuple. Here, I’ll import the os module and then use a list comprehension to run os.path.splitext() through every filename in the list above.
It becomes a simple matter to get just the filename from the tuple, as I do here by modifying the list comprehension to just get the zeroth item from it:
Note that several interesting conditions are handled by os.path.splitext():
Mac OS X Snow Leopard does not include Core Graphics bindings (CGBindings) for 64-bit Python.
The SWIG-based Python CGBindings originally shipped with Mac OS X 10.3, which bundled Python 2.3. Since that time, these bindings — specific to the system’s bundled framework build of Python — had allowed access to Core Graphics objects and commands from within scripts.
They were one of the reasons I decided to use Python in the first place. I thought they would be fun to learn and use, particularly with the then-new PDF Services feature of Mac OS X. The Core Graphics bindings also provided much, much more power than the command line sips tool and had an advantage over other alternatives by being bundled with the operating system. I thought they offered the possibility of growing with Mac OS X’s graphics hardware acceleration. I even found a way to use them to create better screenshots with drop shadows, a task where I’d previously employed Ambrosia’s Snapz Pro X.
Here’s an example of what you’ll see on Snow Leopard if you try to “import CoreGraphics” in 64-bit Python:
With 32-bit Python on Snow Leopard:
While the CGBindings are still available to 32-bit Python in Snow Leopard, you must use PyObjC to replace their functionality for 64-bit Python. Since 64-bit Python is the default in Snow Leopard, it makes sense to transition from the bindings to PyObjC as soon as possible. This means there is some porting work for scripts that used the Core Graphics bindings. I guess I’m glad I didn’t do as much with them as I’d planned.
I see this change as something of a loss. (Is this what Carbon developers are experiencing? Hm.) The Core Graphics bindings were relatively easy to use and felt reasonably Pythonic, even if the documentation was almost nonexistent. PyObjC feels more foreign to me when I attempt to use it — even though it’s clearly the future.
The default installation of Python on Mac OS X Snow Leopard is version 2.6.1. According to the man page for Python on Snow Leopard, Python 2.6 executes as a 64-bit application by default.
If, for some reason, you need to run it as a 32-bit application, this can be changed at the command line:
The preference can be set in either the User or Local filesystem domain in Mac OS X, following the normal precedence rules. To unset it, presumably you would change the boolean to “no” — or perhaps even delete the “Prefer-32-Bit” key.
There is also an environment variable that can override this preference.
A question came up on the AppleScript-Users mailing list and I wanted to make note if it, because I came up with a quick Python-based way to answer it. Here, I reiterate that answer on how to get the number of pages from a PDF.
This Python 2.5 sequence (shown as run interactively from the Terminal on Leopard — not as a script) works for me using both a local file and a file from an AFP-mounted volume. It uses the Quartz 2D bindings for Python, which are part of the Python install on Mac OS X (since 10.3?), so it is not pure Python.
Substitute your own path for the pdf_filename variable’s contents, and you can try it yourself. (Above, $ is the shell prompt, and >>> is the Python interpreter’s command prompt, so don’t copy/paste those.)
Since I was doing this in Terminal, I could also type “pdf_filename = ‘” then drag and drop a file from the Finder into Terminal window to insert its path, complete the quoting, and pressed Return. (Saves some time and potential for mistyping.)
You could wrap this sequence into one longish command line and run it from AppleScript with “do shell script.” Or you could save it as a Python script file and again run it with “do shell script.” Or, you could skip AppleScript and just use Python, which I prefer over AppleScript. (Python seemed very natural to me with my minor AppleScript background, and I’m pretty sure I’ve written more of it than AppleScript by this point.)
This is derived from “Example 2: Splitting a PDF File” in the Using Python with Quartz 2D on Mac OS X document at the Apple Developer Connection.
That ADC example also shows how you could specify a PDF file’s path at the command line (with sys.argv), rather than hardcoding it as I did for my example.
There may be a shorter way to do this since I’m no expert on Quartz 2D. (I just appreciate that the bindings are there for a scripting language I happen to like a lot.) I honestly don’t know what’s happening when the provider and pdf objects are being set, but I really don’t need to know for this.