Drupal

Updating Acquia Drupal versions in a Mercurial repository

After using Acquia Drupal for a while, I took advantage of a trial subscription to the Acquia Network. The network’s services showed me that I had files present in my install that the agent could not account for.

I suspected this was happening because of the way I manage my Acquia Drupal installation with Mercurial. So, I’ve modified my previous process (and updated my instructions) to extract the downloaded tar archive with the --recursive-unlink option. This option appears to successfully remove the contents of every directory before putting new files back into them.

$ tar --strip-path=1 --directory=acquia_drupal --recursive-unlink -zxvf acquia-drupal-1.2.12.5047.tar.gz
acquia-drupal-1.2.8/
acquia-drupal-1.2.8/robots.txt
...
acquia-drupal-1.2.8/INSTALL.txt

When the archive is extracted in this way, my repository’s working directory shows modified, unknown, and deleted files. This allows me to treat each category of files individually before I commit the changes for a Drupal update as a revision.

$ hg status

The modified files will be tracked normally because they’ve already been added to the Mercurial repository, so I don’t need to do anything special for them.

The unknown files are ones that are completely new, and have not appeared in the same position in a previous revision. They have yet to be tracked by Mercurial, so I have to add them to the repository. To add just those unknown files, then, I have to pick them out from the status listing:

$ hg status --unknown

In order to operate just on those files to add them to the repository, I run a for loop:

$ for FILEPATH in `hg status --unknown --no-status`
for> do
for>    hg add "$FILEPATH"
for> done

This changes the “?” status to “A,” because the files were successfully being tracked by Mercurial.
I use the “--no-status” flag on the “status” command so that just the file paths are printed; the actual status code is not, which is appropriate for the target of the “add” command in the loop.

I do the same basic steps with deleted files. These are files that were in the previous revisions but have been deleted by the --recursive-unlink option from the tar extraction and not replaced with the extraction of the new Acquia Drupal tar archive. If the deleted files had been replaced by the tar extraction, they would either be unchanged (which would not show up in the “status” output) or marked as modified.

To remove the files that are marked as deleted from the repository’s working directory:

$ for FILEPATH in `hg status --deleted --no-status`
for> do
for>    hg remove "$FILEPATH"
for> done

However, that may be the same as simply using the following, which I have to explore further:

$ hg remove --after

So, to follow all of these changes in the repository, I run the loop for the uknown files and the loop for the deleted files. The modified files are already tracked, so I don’t need to do anything additional for them. After that, a “commit” will record all of the changes — modifications, additions, and deletions — in the repo.

These commands are based on my current understanding of Mercurial, and they do work for me right now. There could certainly be another better way to do this in one fell swoop — or at least fewer steps. I would welcome that, so if you’re aware of a way, feel free to comment or contact me.

Update: I found that the “hg addremove” command cleanly replaces all of the shell loops I mentioned above. Therefore, I recommend using it instead of the “for” loops I described.

A case for Web hosts to offer search as a service

Apache Solr provides a Web service front end for the Apache Lucene indexing and search engine library. Both Solr and Lucene (upon which Solr depends) are Java-based, which has implications for shared Web hosting.

Drupal is an open source CMS, and I happen to use it on a shared Web hosting provider as of this writing. Drupal is gaining support for Apache Solr through a module that has had a lot of input from Acquia (the “Red Hat” of Drupal).

Dries Buytaert of Acquia has some interesting perspective on search for the Web and CMSes in some recent articles on his site. Specifically, he talks about Acquia Search, a Solr-based search service that is being offered to Drupal sites on the Acquia Network. He discusses the advantages afforded by good search capabilities for both visitors to a Drupal Web site and for site administrators.

I’ve used Acquia Search (in beta), and it has been great. It’s very fast compared to the core Drupal Search module. The ability to perform faceted searches, word stemming, spell checking, and more is all tremendous. (You can see it in action in the search field in the site sidebar, as long as my Acquia Network subscription from the beta lasts.)

But Acquia Search part of a larger service offering — the Acquia Network — which ultimately makes it too expensive for me on my personal sites. It’s priced out of reach for me — more costly for one year than two years of Web hosting, domain registrations, and separate e-mail hosting for my domains are today. I think it’s clear that Acquia is aiming at a different market, and that’s fine.

My idle thought, however, is that search by itself is a compelling feature even for small Web sites like mine. It’s as compelling as hosting files, like HTML or PHP or images, or serving databases, like MySQL and PostgreSQL.

If it’s important as Dries notes in his posts — that the search market is so large and growing, and of such universal importance — then great search is a compelling feature to have for many levels of sites. After serving the files and serving the database, it may be the next big service that a Web hosting provider could offer. And today, Web hosting offers a range of pricing (and service levels) to meet various needs.

I could see advertising that for some monthly fee, a Web host offers 55 GB of storage and 550 GB of monthly data transfer and unlimited MySQL databases — and oh, by the way, some reasonable level of indexing/search with Apache Solr and/or Sphinx or whatever. Although I hate to suggest it, search could even be an optional add-on, as many providers treat dedicated IP addresses or SSL or the like.

There may be an additional win, in that separate servers could be optimized for search to offload that processing from the Web server. It could even be something that a Web host contracts out or partners with another to provide — maybe even with a company like Acquia that’s already set up their infrastructure to scale on Amazon EC2.

Especially if other CMSes, such as WordPress, get Solr integration — as with this WordPress Solr plugin — then the case for Web hosts offering something like Solr search gets convincing.

The dummy domain and the domain pointer at Site5

I’ve been struggling with my Site5 Web hosting account for two years. In many respects, it has been great — good service at a price I was willing to pay. However, my biggest single aggravation has been that there is a primary domain name associated with the hosting account, and that primary domain could not be hosted in a subdirectory of the account cleanly. URL rewriting in an .htaccess file had been my workaround for a long time, but it never really did everything I wanted — and it was complicated.

The hosting account was up for renewal today. I had a decision to make: keep the account and the hassle, keep the account and find a solution to the hassle, or back up my data and move on.

I’m happy to say that I’m keeping the account and it appears that all of the frustration regarding the subdirectory has been eliminated.

The CEO of Site5 helped me out, after seeing my complaints on Twitter. He suggested that I request changing the primary domain to a dummy, nonexistent domain name. With that done, I could then create a domain pointer for my former primary domain (this one, actually), linking it to a subfolder of my account’s public_html directory.

I made the support request, which was fulfilled promptly. The change was made and it works.

Now, it looks like some issues I’d been having have cleared up. Namely, the Global Redirect module I use in Drupal correctly redirects from URLs like /node/300 to their human-friendly paths.

Check a new version of Acquia Drupal into a Mercurial repository

Here is a sequence of commands and output that show how I keep the Acquia Drupal open source content management system up to date with Mercurial, the open source distributed version control system.

In the example below, my Mercurial repositories for Drupal are located in the “drupal” subdirectory of my “repo” folder. Once I’ve moved into that directory, I download the Acquia Drupal distribution with curl and then extract it into my previously-created Mercurial working directory, “acquia_drupal,” using tar.

$ cd repo/drupal
$ curl -O http://acquia.com/files/downloads/acquia-drupal-1.2.0.3780.tar.gz
$ tar --strip-path=1 --directory=acquia_drupal --recursive-unlink -zxvf acquia-drupal-1.2.0.3780.tar.gz

(Update: I added the --recursive-unlink option after I noticed that the Acquia Network control panel keeps track of extra — possibly unneeded — files and folders you have in your install. The recursive unlink option seems to avoid having stray files from old versions of modules hanging around in your repository after you install updates.)

After extracting Acquia Drupal my Mercurial working directory, I get the status of the repository. It shows there are changes from the last version I checked in — and this includes new files, denoted by a “?” at the beginning of their line.

$ cd acquia_drupal
$ hg status
M profiles/acquia/acquia.profile
? modules/acquia/acquia_connector/README.txt
? modules/acquia/acquia_connector/acquia_agent/acquia.ico
? modules/acquia/acquia_connector/acquia_agent/acquia_agent.info
? modules/acquia/acquia_connector/acquia_agent/acquia_agent.install
? modules/acquia/acquia_connector/acquia_agent/acquia_agent.module
? modules/acquia/acquia_connector/acquia_agent/acquia_agent.pages.inc
? modules/acquia/acquia_connector/acquia_agent/acquia_agent_drupal_version.inc
? modules/acquia/acquia_connector/acquia_agent/acquia_agent_streams.inc
? modules/acquia/acquia_connector/acquia_spi/acquia_spi.info
? modules/acquia/acquia_connector/acquia_spi/acquia_spi.install
? modules/acquia/acquia_connector/acquia_spi/acquia_spi.module

Since there are new files, I have to add them so they’ll be tracked by the repository. I only need to add in the parent directory for any changed files, and any new files within it will also be added for tracking.

$ hg add modules/acquia/acquia_connector
adding modules/acquia/acquia_connector/README.txt
adding modules/acquia/acquia_connector/acquia_agent/acquia.ico
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent.info
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent.install
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent.module
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent.pages.inc
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent_drupal_version.inc
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent_streams.inc
adding modules/acquia/acquia_connector/acquia_spi/acquia_spi.info
adding modules/acquia/acquia_connector/acquia_spi/acquia_spi.install
adding modules/acquia/acquia_connector/acquia_spi/acquia_spi.module
$ hg status
M profiles/acquia/acquia.profile
A modules/acquia/acquia_connector/README.txt
A modules/acquia/acquia_connector/acquia_agent/acquia.ico
A modules/acquia/acquia_connector/acquia_agent/acquia_agent.info
A modules/acquia/acquia_connector/acquia_agent/acquia_agent.install
A modules/acquia/acquia_connector/acquia_agent/acquia_agent.module
A modules/acquia/acquia_connector/acquia_agent/acquia_agent.pages.inc
A modules/acquia/acquia_connector/acquia_agent/acquia_agent_drupal_version.inc
A modules/acquia/acquia_connector/acquia_agent/acquia_agent_streams.inc
A modules/acquia/acquia_connector/acquia_spi/acquia_spi.info
A modules/acquia/acquia_connector/acquia_spi/acquia_spi.install
A modules/acquia/acquia_connector/acquia_spi/acquia_spi.module

Excellent; the new files have been added. After this, I just need to accommodate the deleted files that no longer need to be tracked (created when using the “--recursive-unlink” option on tar). For that, see my newer instructions.

Now that the right files are being tracked, I need to commit the changes — modified, added, and deleted files — to the repository. This will create a new revision in the repository’s history, which I’ll tag with the text “Acquia Drupal 1.2.0.”

$ hg commit -m "Acquia Drupal 1.2.0 imported."
$ hg tag "Acquia Drupal 1.2.0"
$ hg tip
changeset:   10:423c84439928
tag:         tip
user:        Jeremy Reichman <jaharmi@jaharmi.com>
date:        Wed Jan 14 21:07:23 2009 -0600
summary:     Added tag Acquia Drupal 1.2.0 for changeset 0df92d3d243d

Once this revision is checked in, I can use it to propagate changes to other repositories. I keep the main Acquia Drupal distribution in its own repository, and then use the “hg fetch” command to pull its changes into one where I track contributed modules. That second repository is then pulled into a third repository which stores just the changes for my production Web site. The use of three repositories in this way modularizes and isolates the updates.

Getting the settings right for the Drupal GeSHi Filter module

I wanted to find a way to do syntax highlighting of code snippets on my Drupal blog. I came across the GeSHi Filter module, which lets Drupal sites take advantage of the apparently well-regarded GeSHi Generic Syntax Highlighter library that’s meant for just this purpose.

However, I ran into some roadblocks implementing it on my site. Here’s the short story of what I settled on after some trial and error.

My existing code snippets are in <code> blocks, and the initial GeSHi Filter settings applied badly to them. I made the decision to only use GeSHi on <blockcode> blocks, since I wasn’t using that tag yet and it wouldn’t conflict with the snippets already posted.

I most commonly write Bash/Zsh, Python, and AppleScript snippets on my blog. However, the Bash code I was using as part of my trial and error simply wasn’t highlighting; it was coming through as the default (and boring) plain text — but was at least boxed off from the rest of the blog post.

I thought that GeSHi wasn't correctly discovering that the code was written in UNIX shell syntax. I couldn’t find a way to specify the language for that blockcode tag, until I did some searching on the ’net. To change my blockquotes to choose a certain language — at least for the purposes of this Drupal module, if not for GeSHI in general — I needed to add the “lang=lang” style to the tag. For Bash, I could use “lang=bash,” for Python, “lang=python,” and for AppleScript, “lang=applescript.” That made sense.

However, my code was still not being syntax highlighted. I discovered that the Drupal module came with an initial set of languages enabled. The others were all turned off, but that could be changed in the module settings. Without turning them on, even properly-tagged <blockcode> sections did not get the benefit of syntax highlighting.

I changed the GeSHi Filter options to enable some of the languages that were initially disabled, and then disabled the ones I didn’t anticipate using. This allowed me to add Bash and AppleScript syntax highlighting support, as both had been turned off by default. After that, I saw the results I’d hoped for: a syntax-highlighted code snippet.

It took some work, but now that it’s done, I should be all set.

List changed files in a Mercurial repository with a custom output style

While trying to troubleshoot what I’d done to mess up the Mercurial repositories managing my Drupal installations last weekend, I really would have liked a way to see what files had changes in specific revisions. Each revision to a Mercurial repository affects some files, of course, but it seems awfully hard to figure what files changed in that check-in.

I have since found a way to do that by customizing the output of Mercurial. To customize output, you can create templates on the command line (with --template) or for more powerful reformatting, create an output style file.

I struggled for a while to figure out how to use style files, and eventually came up with something that works for me so far.

Since I’ve installed Mercurial from Lee Cantey’s standard binary package for Mac OS X Leopard, I created the file “map-cmdline.changedfiles” at the “/Library/Python/2.5/site-packages/mercurial/templates” path. (Where you put the file may vary depending on where Mercurial is installed, and I’m sorry but I don’t know where it gets installed on other systems.) The contents of “map-cmdline.changedfiles” are below, along with my possibly inept description of what each line is doing:

# Get all of the files in the selected revision
# and stringify them, whatever that means
# but do not 'tabindent' or wrap them to 68/76 columns
# Without first setting changeset to the list of files
# you won't get output from subsequent lines
changeset = '{files|stringify}'
# List modified files, one per line
# preceded by M to mimic `hg status`
file = 'M {file}\n'
last_file = 'M {file}\n'
# List added files, one per line
# preceded by A to mimic `hg status`
file_add = 'A {file_add}\n'
last_file_add = 'A {file_add}\n'
# List deleted files, one per line
# preceded by ! to mimic `hg status`
file_del = '! {file_del}\n'
last_file_del = '! {file_del}\n'

I don’t know why the “map-cmdline.” portion of the filename is there, but as long as I have it, I can call the style file from the command line with what follows the period. So, I can call the style with “--style changedfiles” — and that tiny bit of voodoo seems reasonable enough to me. (The other styles in the directory above, many of which end in “.tmpl” extensions, seem related to the Mercurial Web server, hgweb. I tried, but I couldn’t use their names at the command line, with or without their extensions. Plus, their contents looked HTML-ish.)

With the “map-cmdline.changedfiles” style file saved in that location, I can call Mercurial’s “log” command:

$ hg log --style changedfiles -r tip

… which gives me a list of the files changed in the “tip” (or latest revision) of the repository. I could substitute in any revision identifier for “tip.”

I haven’t actually seen the “file_add” and “file_del” keywords in action; every time I’ve used this style file in the manner described, I’ve only seen files marked as “M” — even if I’m looking at a revision where new files were first checked into the repo. I’m confused by that, but I’m not going to let it sour my day at this point.

There might have been an easier way to do this but I didn’t find one last weekend. It took me some time to figure even this bit out, and I hope writing this post saves someone new to Mercurial from future frustration.

Drupal Administration Menu module streamlines admin tasks

I just installed the Drupal Administration Menu contributed module, and I really like it. When you are logged in and have administrative privileges, it provides a small menu bar with drop-down menus and submenus at the top of every page in your site. It provides quick access to every admin function I’ve needed, and really streamlines the process of making changes to the site.

Here’s a screen shot of what it looks like:

Drupal Administration Menu module screen shot

In that respect, I rank it up there with the Taxonomy Manager module, which also provides a major productivity boost.

Update: Apparently I’m not alone; Administration Menu is among the modules included with the Acquia Drupal distribution. I think that lends quite a bit of credibility to this module … if my own recommendation didn’t provide a tipping point for you.

Drupal 5.10 and MarsEdit

Now that I’ve upgraded to Drupal 5.10, I’ve discovered I can no longer post from MarsEdit. I get this error upon submitting a post via XML-RPC, via Drupal’s core Blog API module:

You either tried to edit somebody else’s blog or you don’t have permission to edit your own blog.

Annoying things like this happen all too often with Drupal. I’ve had enough problems with the core BlogAPI and (practically required) Pathauto modules that I should really consider whether the Drupal platform is worth it. It feels like updates for modules are never really tested before they are put out, and so problems like this slip through.

I haven’t figured out what the problem is yet, but hopefully I’ll get this fixed soon. I hate posting through Web interfaces if I can avoid it.

And, just to be clear, all I did was update to the current Drupal 5.10 via my normal routine. I kept my “settings.php” file and my “files” and “sites/all/modules” directories, but everything else is from Drupal 5.10. I ran the “update.php” script after the upgrade. No settings were changed in MarsEdit, which worked before the upgrade.

Update 9/20: I looked in the “Access control” page of my sites’ Drupal Administration section. The “administer content with blog api” had been turned off for my user role. I re-enabled it tentatively, and now posts from MarsEdit appear to work again. I’m not sure why the “administer content with blog api” permission would have been disabled in the Drupal 5.10 update, but that’s what appeared to happen to me.

Is that site running Drupal, curl edition

The Lullabot Web site has a clever way to help answer the question of “Is that site running Drupal?” Angie Byron mentions that the HTTP “Expires” header returned by Drupal corresponds to a specific default date. Look for that date in the HTTP headers, and you can make a reasonable guess that a site is a Drupal site — or at least one that hasn’t modified one of the core files.

Some commenters posted notes about how to do the same thing with wget and then curl. I expanded on the curl instructions to make them a little more robust (especially in the case of redirects or URL rewriting), and here’s the result:

$ curl -fsIL http://jaharmi.com/ 2>&1 | grep -q -m 1 "Expires: Sun, 19 Nov 1978 05:00:00 GMT" && echo "Yes, this appears to be a Drupal site." || echo "No, this does not appear to be a Drupal site."
Yes, this appears to be a Drupal site.

Google Analytics plunge due to module change?

It looks like some recent change to the Google Analytics module for Drupal 5 broke my link to the Analytics service. Well, I’m being overbroad there. It really was working, but apparently only tracking when my administration account and users from a certain other user role (think: group) were logged in.

Well, obviously, that could lead to a significant traffic drop — if for no other reason than anonymous users weren’t being tracked — and that’s what I was seeing throughout my Analytics data. I only check Analytics every once in a while, and it’s currently only a curiosity to me, but it was still confusing as to why I saw all the graphs plunge starting March 7.

My best guess is that one of my recent module update binges loaded a new Google Analytics module which changed something, and only two user roles were set up to be tracked. (The Drupal 5 update process leaves something to be desired. Sure, I should have read the "read me" to find out about the change beforehand, but when you have so many modules to update, you just want to get it done. Wish it had warned me that something needed my attention.) Now that I’ve changed that setting and other user roles — including anonymous — will be tracked again, we’ll see what happens.

Syndicate content