Web

Parse a vendor RSS feed to get the latest available product version

There may be times when you want to obtain the number of the latest available version — not just the latest installed version — of a software package through automated means. If the vendor or project provides a syndication feed (either RSS or Atom) that describes new releases, then you may be able to parse that data and get the newest release from it.

As an example, let’s examine the RSS feed for Group Logic’s ExtremeZ-IP. Other developers provide RSS/Atom feeds for their releases, but the EZIP feed is a good one to start a demonstration with because it is generally structured well.

We can break apart the EZIP feed with the Universal Feed Parser module for Python, which you must obtain separately.

Update: Because of Mark Pilgrim situation (also described here), the Universal Feed Parser Web site is no longer available. There is an alternative source for the Universal Feed Parser at Google Code, and I have cloned the Universal Feed Parser repository to Bitbucket from there.

import feedparser
ezip_feed = feedparser.parse(‘https://www.grouplogic.com/ezipreleases.xml’)
ezip_feed[‘feed’][‘title’]
u‘ExtremeZ-IP Latest Releases’
ezip_feed.version
‘rss20’

As you can see, the “ExtremeZ-IP Latest Releases” feed is automatically recognized as RSS 2.0. I prefer to use HTTPS for fetching these feeds whenever possible, so if the developer has an HTTP feed, I try to see if it also works with HTTPS.

Next, let’s find out where the version numbers are kept in the feed. It looks like they are in the entry title, based on reading the feed in Safari RSS. I can confirm that with the Universal Feed Parser. We’ll want to examine the title of every feed item so we can better handle both current and future entries from the feed. There are more entries to the EZIP than I will print out.

for entry in ezip_feed.entries:
    entry[‘title’]
u‘ExtremeZ-IP File and Print Server - Version 7.1.1x94’
u‘ExtremeZ-IP File and Print Server - Version 7.1x14’
u‘ExtremeZ-IP File and Print Server - Version 7.0x41’

We get Unicode strings as output from the Universal Feed Parser. That’s why the quoted strings are preceded with a “u” character.

I’d like to strip out the version string from the title element in each entry. I’m going to do so by splitting on whitespace and getting the last group of characters from the string. (This doesn’t account for text like “Hot Fix,” as seen in the EZIP feed, but it is still a good enough starting point for my purposes.)

for entry in ezip_feed.entries:
    entry[‘title’].split()[-1]
u‘7.1.1x94’
u‘7.1x14’
u‘7.0x41’

By stripping out the build number after “x” in the version string, you potentially lose some data. In the EZIP feed, there are entries where two consecutive version numbers are the same except for the build number after the “x.” However, depending on your needs, it may still be useful to eliminate that part of the version string, so we’ll do that next.

for entry in ezip_feed.entries:
    entry[‘title’].split()[-1].partition(‘x’)[0]
u‘7.1.1’
u‘7.1’
u‘7.0’

We really only need the most current or “top” item in the feed, since that should give us the newest release number. The newest version in this particular feed should be in the first entry. That’s “entries[0]” below, because we’re using Python and it zero references the first item in lists.

ezip_current_release = ezip_feed.entries[0].title
ezip_current_release_version = ezip_current_release.split()[-1].split()[-1]
ezip_current_release_version_stripped = ezip_current_release_version.partition(‘x’)[0]
u‘7.1.1’

There, we now have the version number of the most current release, 7.1.1, for the product. We have drawn it straight from the developer’s syndicated feed, so it is as current as the developer makes it.

How could that be useful? The output can be compared against other data, like the currently-installed version. The comparison, in turn, could be made part of a monitoring workflow, so you could get alerts if you fall behind.

If we didn’t strip the build number after the “x,” we would be left with a complex version number. Some Python tools, like distutils, will not currently handle the trailing characters in the version number well.

I have found that you can improve upon distutils’ StrictVersion/LooseVersion version number handling by switching to parse_version in pkg_rsources (“from pkg_resources import parse_version as V”). More coverage of that topic appears in PEP 386. If you are comparing the original version strings from the EZIP feed with similarly complex output from elsewhere, then I would probably use the pkg_resources module.

On winning and my low user ID at Drupal.org

I stumbled into winning a copy of Beginning Drupal 7 by Todd Tomlinson from Apress and the nice hosts of the Drupal Easy podcast.

The announcement on the podcast had me laughing out loud. (Luckily, I was home with a sick child who was sleeping upstairs.) I hope Mike and Andrew don’t mind that I transcribed the bit where they talked about me (starting at 46:24 into the podcast):

Mike: Anyway, [laughs] so we picked the winners earlier today, and they have not gotten back to me yet, but I’m going to say, I’m going to feel pretty good that they will get back to me so I’m going to announce their names, anyway. Jeremy Reichman, who — get this — his user ID on Drupal.org is 2039. Now, that doesn’t seem that small, but I think we’re up over five hundred thousand right now. Is that correct?
Andrew: Yeah, I believe so. It’s a pretty high number now. My number is 98,079.
Mike: Right. So this guy’s way better than you.
Andrew: Yes.
Mike: Way better. But he’s been a user for over nine — nine and half years — on Drupal.org.
Andrew: Wow.
Mike: So I’m not sure what he needs a Beginning Drupal 7 book for but, you know, I would guess it’s probably for his child that he’s had between the time that he started iwth Drupal and now.
Andrew: Although the could still be using Drupal 2.
Mike: Yeah, that’s true.

Indeed, my user ID is 2039. Indeed, I have been a member on Drupal.org for “9 years 37 weeks,” according to my profile. That’s a long time! The low user ID certainly doesn’t make my account better than someone else’s, but it is kind-of fun to have now that Drupal has become so popular.

One possibility they did not raise in that announcement was that I could just have been lurking on Drupal.org for five or six years — biding my time with my increasingly-creaky UserLand Manila Web site — before I decided to take the plunge with Drupal. This may be why my Certified To Rock score is a mere 3.

But perhaps I will share Beginning Drupal 7 with my children when the book arrives. This is not a bad idea, and I already have precedent for it. Our new arrival needs some technical reading material — and I’m going to have to upgrade to Drupal 7 sometime.

Enable the Drupal Token Filter module to insert the current year into a copyright block

In the course of a conversation with Jesse today, I launched into action to fix the copyright statement in my site’s footer block. It still listed the copyright dates as “1999-2009” when the year had clearly rolled over into 2010 some time ago. Ahem.

Drupal 6, to my knowledge, doesn’t have a built-in way to insert a variable into the body of a text field in a node or block. However, it does have the very-powerful Token API, of which I’m a big fan. Tokens let you insert variable data. It’s just not always easy to figure out how a mere mortal — rather than a module developer — can take advantage of tokens in Drupal. (Variable insertion was a key feature of the UserLand Manila CMS I started using in the late 90’s, and I miss that sort of capability.)

In Drupal 5, I had previously used the Copyright module. Copyright module integrated with the Token API so you could use tokens in the block it created. The module was never upgraded for Drupal 6, so my footer copyright block became static text.

I realized that an Input Filter that worked with the Token API would probably let me do what I want, and I found a module that provided that missing link.

Here’s what I did so I could insert a constantly-advancing date variable into my copyright footer block:

  1. Downloaded Token Filter module.
  2. Installed Token Filter module.
  3. Enabled Token Filter module via the Admin Menu: Site Building > Modules > List.
  4. Added Token Filter to my standard Input Filter used across the site.
  5. Changed “2009” in my copyright block to “[site-date-yyyy]” which didn’t work, and then read the Token Filter docs to find out that should have been “2012” which then worked.

Now, my copyright block should always show the current year in the date range, like this: 1999-2012.

Hope someone enjoyed this crash report

As you may know, I absolutely live to submit crash and bug reports to vendors.

Normally, these reports are staid affairs involving copious amounts of detail, exquisite reproduction steps, and both expected and actual results. I’m sure that these kinds of reports get a certain amount of attention from software developers. I say that with confidence because at least one person in that position has told me my bug reports are his favorite. Sure, it was a few years ago now, but I’m sure I’ve still got some mojo.

Occasionally, I get the opportunity to take some creative license with these submissions. For example, there was that one involving the hockey game. The standard was raised after I saw my current all-time favorite crash report.

So, when I had a particularly frustrating crash in Safari, I decided I needed to raise my game. After all, I’ve probably submitted 33 boring crash reports on Safari already. Here’s what I came up with this time:

macosx-workstation-snowleopard-10.6-10F569-TpdVXr-20100915-105506.png

For those of you reading with Lynx, the text of the submission was:

Safari was launched by me. It had tabs. The set of tabs evolved. They were created to help me. There were so many of them there may have been copies. And I had a plan for them.

I hope someone enjoyed this crash report. And, for all I know, they fixed the problem in Safari 5.0.3.

Run Drupal cron with Drush at Site5

After starting to use Drush, I wanted to switch my Drupal cron jobs over to it. I’d previously been running these jobs the standard way, loading a URL for each of my Drupal sites with curl. That curl command was run by the cron; Site5 allows editing of your crontab directly or through its Backstage Web interface.

I started with the basics. Drush was installed in my home directory, with an alias under ~/bin in my Site5 account. I replaced my curl command one-for-one with the following:

$ /home/your_site5_account/bin/drush \
—quiet \
—root=/home/your_site5_account/public_html/your_drupal_site/ \
cron

This didn’t work out well. I was getting a huge number of mail messages, indicating problems with the cron jobs. The messages typically contained “tput: No value for $TERM and no -T specified.” Needless to say, this was rather frustrating, so after some trial and error, I modified the command as follows, and that has been working better for me.

$ /usr/local/bin/php \
-d memory_limit=64M \
/home/your_site5_account/drush/drush.php \
—quiet \
—root=/home/your_site5_account/public_html/your_drupal_site/ \
cron

The biggest change is that I specify PHP, rather than Drush, directly. This was done so that I could increase the PHP memory limit for the cron job significantly, to 64M, while running Drush. I’m sure that this increase was needed partly due to the number of modules I have installed on my main site (which has an even larger default PHP memory limit in its php.ini). My research indicated that I needed to do this for Drush since it doesn’t access the Drupal sites in the same way loading a page does.

The other noticeable change is that I provide the path to the drush.php file rather than pointing to the Drush alias.

McAfee DAT update 5958 as trending topic on Twitter

The McAfee DAT update 5958 was issued on April 21, 2010, and created quite a situation. Heretofore, I will remember what transpired as “the events of April 21.”

I think that someday, examining what happened would make an interesting case study in crisis management. A lot of the incident unfolded on the Internet — and on Twitter, specifically. The company even became a trending topic, as seen in this screenshot I took after lunchtime (I think around 2 PM Eastern time, although I only saved later):

The 5958 DAT was available on McAfee’s publicly-accessible HTTP and FTP download repositories until at least 1 PM Eastern, when I was checking on them.

The Windows and Mac anti-malware products from McAfee share DAT updates, which provide virus definitions. I was able to update VirusScan for Mac OS X to 5958 with no ill effects in the midst of the developing situation. (The problem only appears to have affected Windows XP systems.) Later, when McAfee had posted a newer update as version 5959, I was also able to download that.

Based on reports I saw on Twitter and the Web, McAfee was overwhelmed by this — particularly its call center and its Web-based customer forums. This allowed a lot of speculation and misinformation — along with humor — to break out.

I’ve saved this undoctored screen shot for a while. I figure I’ll end with it, even though it’s unrelated to the events of April 21.

A case for Web hosts to offer search as a service

Apache Solr provides a Web service front end for the Apache Lucene indexing and search engine library. Both Solr and Lucene (upon which Solr depends) are Java-based, which has implications for shared Web hosting.

Drupal is an open source CMS, and I happen to use it on a shared Web hosting provider as of this writing. Drupal is gaining support for Apache Solr through a module that has had a lot of input from Acquia (the “Red Hat” of Drupal).

Dries Buytaert of Acquia has some interesting perspective on search for the Web and CMSes in some recent articles on his site. Specifically, he talks about Acquia Search, a Solr-based search service that is being offered to Drupal sites on the Acquia Network. He discusses the advantages afforded by good search capabilities for both visitors to a Drupal Web site and for site administrators.

I’ve used Acquia Search (in beta), and it has been great. It’s very fast compared to the core Drupal Search module. The ability to perform faceted searches, word stemming, spell checking, and more is all tremendous. (You can see it in action in the search field in the site sidebar, as long as my Acquia Network subscription from the beta lasts.)

But Acquia Search part of a larger service offering — the Acquia Network — which ultimately makes it too expensive for me on my personal sites. It’s priced out of reach for me — more costly for one year than two years of Web hosting, domain registrations, and separate e-mail hosting for my domains are today. I think it’s clear that Acquia is aiming at a different market, and that’s fine.

My idle thought, however, is that search by itself is a compelling feature even for small Web sites like mine. It’s as compelling as hosting files, like HTML or PHP or images, or serving databases, like MySQL and PostgreSQL.

If it’s important as Dries notes in his posts — that the search market is so large and growing, and of such universal importance — then great search is a compelling feature to have for many levels of sites. After serving the files and serving the database, it may be the next big service that a Web hosting provider could offer. And today, Web hosting offers a range of pricing (and service levels) to meet various needs.

I could see advertising that for some monthly fee, a Web host offers 55 GB of storage and 550 GB of monthly data transfer and unlimited MySQL databases — and oh, by the way, some reasonable level of indexing/search with Apache Solr and/or Sphinx or whatever. Although I hate to suggest it, search could even be an optional add-on, as many providers treat dedicated IP addresses or SSL or the like.

There may be an additional win, in that separate servers could be optimized for search to offload that processing from the Web server. It could even be something that a Web host contracts out or partners with another to provide — maybe even with a company like Acquia that’s already set up their infrastructure to scale on Amazon EC2.

Especially if other CMSes, such as WordPress, get Solr integration — as with this WordPress Solr plugin — then the case for Web hosts offering something like Solr search gets convincing.

OpenID delegation and Drupal accounts

I discovered — after I’d set up OpenID delegation (using the Drupal OpenID URL module and Sam Ruby’s instructions) — that each OpenID used with a Drupal site needs to be associated with a Drupal account.

Therefore, even though OpenID delegation may point to a previously-associated provider, such as Verisign Labs’ Personal Identity Portal, or PIP, it acts as its own identity. The delegating URL is a URL in its own right, so this makes some sense even if it is not convenient when you set up delegation after starting to use other OpenIDs.

I had to teach each of my Drupal accounts on various sites that I wanted to use my own URL in addition to any previously-associated OpenIDs.

Considering Frontier DSL

After the news hit about Time Warner Cable’s intent to charge different rates for tiers of monthly data transfer — and an enormous $1/GB fee for overages — it seems eminently sane to consider the competition.

In Rochester, that competition is Frontier DSL. For a long time, that basically meant there was no competition, I’m very sorry to say.

However, the changes to TWC’s fee structure may be so extreme that even that level of competition is good. While I don’t think our household monthly data transfer is excessive, I’m reasonably sure (based on what I’ve seen from the data I’ve collected from our broadband router) that we’ll blow right past the 5 GB/month tier and maybe the 10 GB/month one. We would have to — and by that I mean, I would have to, really — develop some more austere usage of the family Internet connection that we’re accustomed to. Thus, I’m examining the pro and con positions for Frontier’s high-speed Internet service.

Pro

With Frontier DSL, my family should:

  • Not have to deal with the stress of the upcoming 5/10/40/100 GB-per-month tiers from TWC, which will reportedly take effect in Rochester in November 2009
  • Get to send a clear message to TWC that metered Internet access is a terrible idea
  • Get the peace of mind that unlimited, unmetered Internet access provides — but only if Frontier’s existing 5 GB-per-month transfer cap is eliminated
  • Benefit from the refer-a-friend affiliate program — both parties get the $20 referral credit
  • Be able to combine the billing with our Plain Old Telephone Service (POTS).

Con

However, there are some drawbacks to Frontier DSL. My family would be concerned about:

  • A potential monthly DSL modem rental fee
  • A two-year commitment with a $200-300 (I’ve see both figures!) early termination fee
  • The 5 GB-per-month unenforced data transfer cap (but that cap may be dropped entirely in an effort to better compete with TWC)
  • Blocking port 25 — while there are workarounds, this is just aggravating
  • The unknown quantity of Frontier’s technical support, whereas TWC’s has been reasonably good over the years.

Anyway, while we’re mulling this over, the news is playing out on sites like StopTheCap and StopTWC! Meanwhile, I’m more than a little annoyed at the traditional news media avoiding some of the other angles surrounding this topic — the pricing change as a way to protect cable television revenues, the local monopoly (and how cable infrastructure compares to its telephone equivalent), the impact on increasingly Internet-dependent households during a recession, how this might change the habits of people (including employees working at home), and so on.

Linked on installers

John C. Welch’s article, On Installers, is linked from Daring Fireball today. He links to me — thank you very much John, for that and for the kind words about my signal-to-noise ratio (whatever my front page says on that score right now) — placing me one jump away from Daring Fireball.

I was a little worried about that until I checked my Web analytics account. Luckily, my link is third to last and at the end of a long article, or my Web host might be having words with me about traffic.

It was very good to be mentioned, and even be situated in auspicious company between Greg and Nigel. All of us are current or former Radmind admins, and as a group I think Radmind admins tend to know a bit about the foibles of vendors’ installers. Along its famous learning curve, Radmind teaches you a lot about the filesystem and about what’s going into it.

Anyway, for anyone completely new here, you can follow the Mac OS X system administration topic on its own — and skip others, like random Python, Mercurial, Western New York sports, Drupal, and personal chatter.

Syndicate content