There may be times when you want to obtain the number of the latest available version — not just the latest installed version — of a software package through automated means. If the vendor or project provides a syndication feed (either RSS or Atom) that describes new releases, then you may be able to parse that data and get the newest release from it.
As an example, let’s examine the RSS feed for Group Logic’s ExtremeZ-IP. Other developers provide RSS/Atom feeds for their releases, but the EZIP feed is a good one to start a demonstration with because it is generally structured well.
We can break apart the EZIP feed with the Universal Feed Parser module for Python, which you must obtain separately.
Update: Because of Mark Pilgrim situation (also described here), the Universal Feed Parser Web site is no longer available. There is an alternative source for the Universal Feed Parser at Google Code, and I have cloned the Universal Feed Parser repository to Bitbucket from there.
As you can see, the “ExtremeZ-IP Latest Releases” feed is automatically recognized as RSS 2.0. I prefer to use HTTPS for fetching these feeds whenever possible, so if the developer has an HTTP feed, I try to see if it also works with HTTPS.
Next, let’s ﬁnd out where the version numbers are kept in the feed. It looks like they are in the entry title, based on reading the feed in Safari RSS. I can conﬁrm that with the Universal Feed Parser. We’ll want to examine the title of every feed item so we can better handle both current and future entries from the feed. There are more entries to the EZIP than I will print out.
We get Unicode strings as output from the Universal Feed Parser. That’s why the quoted strings are preceded with a “u” character.
I’d like to strip out the version string from the title element in each entry. I’m going to do so by splitting on whitespace and getting the last group of characters from the string. (This doesn’t account for text like “Hot Fix,” as seen in the EZIP feed, but it is still a good enough starting point for my purposes.)
By stripping out the build number after “x” in the version string, you potentially lose some data. In the EZIP feed, there are entries where two consecutive version numbers are the same except for the build number after the “x.” However, depending on your needs, it may still be useful to eliminate that part of the version string, so we’ll do that next.
We really only need the most current or “top” item in the feed, since that should give us the newest release number. The newest version in this particular feed should be in the ﬁrst entry. That’s “entries” below, because we’re using Python and it zero references the ﬁrst item in lists.
There, we now have the version number of the most current release, 7.1.1, for the product. We have drawn it straight from the developer’s syndicated feed, so it is as current as the developer makes it.
How could that be useful? The output can be compared against other data, like the currently-installed version. The comparison, in turn, could be made part of a monitoring workﬂow, so you could get alerts if you fall behind.
If we didn’t strip the build number after the “x,” we would be left with a complex version number. Some Python tools, like distutils, will not currently handle the trailing characters in the version number well.
I have found that you can improve upon distutils’ StrictVersion/LooseVersion version number handling by switching to parse_version in pkg_rsources (“from pkg_resources import parse_version as V”). More coverage of that topic appears in PEP 386. If you are comparing the original version strings from the EZIP feed with similarly complex output from elsewhere, then I would probably use the pkg_resources module.