Web

A case for Web hosts to offer search as a service

Apache Solr provides a Web service front end for the Apache Lucene indexing and search engine library. Both Solr and Lucene (upon which Solr depends) are Java-based, which has implications for shared Web hosting.

Drupal is an open source CMS, and I happen to use it on a shared Web hosting provider as of this writing. Drupal is gaining support for Apache Solr through a module that has had a lot of input from Acquia (the “Red Hat” of Drupal).

Dries Buytaert of Acquia has some interesting perspective on search for the Web and CMSes in some recent articles on his site. Specifically, he talks about Acquia Search, a Solr-based search service that is being offered to Drupal sites on the Acquia Network. He discusses the advantages afforded by good search capabilities for both visitors to a Drupal Web site and for site administrators.

I’ve used Acquia Search (in beta), and it has been great. It’s very fast compared to the core Drupal Search module. The ability to perform faceted searches, word stemming, spell checking, and more is all tremendous. (You can see it in action in the search field in the site sidebar, as long as my Acquia Network subscription from the beta lasts.)

But Acquia Search is part of a larger service offering, the Acquia Network, which ultimately makes it too expensive for me on my personal sites. It’s priced out of reach for me: one year costs more than two years of Web hosting, domain registrations, and separate e-mail hosting for my domains do today. I think it’s clear that Acquia is aiming at a different market, and that’s fine.

My idle thought, however, is that search by itself is a compelling feature even for small Web sites like mine. It’s as compelling as hosting files, like HTML or PHP or images, or serving databases, like MySQL and PostgreSQL.

If it’s as important as Dries notes in his posts (that the search market is so large and growing, and of such universal importance), then great search is a compelling feature to have for many levels of sites. After serving the files and serving the database, it may be the next big service that a Web hosting provider could offer. And today, Web hosting offers a range of pricing (and service levels) to meet various needs.

I could see advertising that for some monthly fee, a Web host offers 55 GB of storage and 550 GB of monthly data transfer and unlimited MySQL databases — and oh, by the way, some reasonable level of indexing/search with Apache Solr and/or Sphinx or whatever. Although I hate to suggest it, search could even be an optional add-on, as many providers treat dedicated IP addresses or SSL or the like.

There may be an additional win, in that separate servers could be optimized for search to offload that processing from the Web server. It could even be something that a Web host contracts out or partners with another to provide — maybe even with a company like Acquia that’s already set up their infrastructure to scale on Amazon EC2.

Especially if other CMSes, such as WordPress, get Solr integration — as with this WordPress Solr plugin — then the case for Web hosts offering something like Solr search gets convincing.

OpenID delegation and Drupal accounts

I discovered — after I’d set up OpenID delegation (using the Drupal OpenID URL module and Sam Ruby’s instructions) — that each OpenID used with a Drupal site needs to be associated with a Drupal account.

Therefore, even though OpenID delegation may point to a previously-associated provider, such as Verisign Labs’ Personal Identity Portal, or PIP, it acts as its own identity. The delegating URL is a URL in its own right, so this makes some sense even if it is not convenient when you set up delegation after starting to use other OpenIDs.

I had to teach each of my Drupal accounts on various sites that I wanted to use my own URL in addition to any previously-associated OpenIDs.

Considering Frontier DSL

After the news hit about Time Warner Cable’s intent to charge different rates for tiers of monthly data transfer — and an enormous $1/GB fee for overages — it seems eminently sane to consider the competition.

In Rochester, that competition is Frontier DSL. For a long time, that basically meant there was no competition, I’m very sorry to say.

However, the changes to TWC’s fee structure may be so extreme that even that level of competition is good. While I don’t think our household monthly data transfer is excessive, I’m reasonably sure (based on what I’ve seen from the data I’ve collected from our broadband router) that we’ll blow right past the 5 GB/month tier and maybe the 10 GB/month one. We would have to — and by that I mean, I would have to, really — develop some more austere usage of the family Internet connection that we’re accustomed to. Thus, I’m examining the pro and con positions for Frontier’s high-speed Internet service.

Pro

With Frontier DSL, my family should:

  • Not have to deal with the stress of the upcoming 5/10/40/100 GB-per-month tiers from TWC, which will reportedly take effect in Rochester in November 2009
  • Get to send a clear message to TWC that metered Internet access is a terrible idea
  • Get the peace of mind that unlimited, unmetered Internet access provides — but only if Frontier’s existing 5 GB-per-month transfer cap is eliminated
  • Benefit from the refer-a-friend affiliate program — both parties get the $20 referral credit
  • Be able to combine the billing with our Plain Old Telephone Service (POTS).

Con

However, there are some drawbacks to Frontier DSL. My family would be concerned about:

  • A potential monthly DSL modem rental fee
  • A two-year commitment with a $200-300 (I’ve seen both figures!) early termination fee
  • The 5 GB-per-month unenforced data transfer cap (but that cap may be dropped entirely in an effort to better compete with TWC)
  • Blocking port 25 — while there are workarounds, this is just aggravating
  • The unknown quantity of Frontier’s technical support, whereas TWC’s has been reasonably good over the years.
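To put the reported overage rate in perspective, here is a rough shell sketch of the arithmetic. The function name overage_fee and the 12 GB month are made up for illustration; only the $1/GB rate and the 5/10/40/100 GB tiers come from the reports mentioned above. The base price of each tier is left out, so this shows the overage portion only.

```shell
# overage_fee USED_GB CAP_GB
# Prints the whole-dollar overage for one month at the reported $1/GB rate.
overage_fee() {
    used_gb=$1
    cap_gb=$2
    rate_per_gb=1   # dollars per GB over the cap, as reported
    if [ "$used_gb" -gt "$cap_gb" ]; then
        echo $(( (used_gb - cap_gb) * rate_per_gb ))
    else
        echo 0
    fi
}

# Hypothetical month: 12 GB transferred, checked against each reported tier.
for cap in 5 10 40 100; do
    echo "cap ${cap} GB: overage \$$(overage_fee 12 "$cap")"
done
```

On the smallest tier, that hypothetical month would add $7 in overage on top of whatever the tier itself costs.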

Anyway, while we’re mulling this over, the news is playing out on sites like StopTheCap and StopTWC! Meanwhile, I’m more than a little annoyed at the traditional news media avoiding some of the other angles surrounding this topic — the pricing change as a way to protect cable television revenues, the local monopoly (and how cable infrastructure compares to its telephone equivalent), the impact on increasingly Internet-dependent households during a recession, how this might change the habits of people (including employees working at home), and so on.

Linked on installers

John C. Welch’s article, On Installers, is linked from Daring Fireball today. He links to me — thank you very much John, for that and for the kind words about my signal-to-noise ratio (whatever my front page says on that score right now) — placing me one jump away from Daring Fireball.

I was a little worried about that until I checked my Web analytics account. Luckily, my link is third to last and at the end of a long article, or my Web host might be having words with me about traffic.

It was very good to be mentioned, and even be situated in auspicious company between Greg and Nigel. All of us are current or former Radmind admins, and as a group I think Radmind admins tend to know a bit about the foibles of vendors’ installers. Along its famous learning curve, Radmind teaches you a lot about the filesystem and about what’s going into it.

Anyway, for anyone completely new here, you can follow the Mac OS X system administration topic on its own — and skip others, like random Python, Mercurial, Western New York sports, Drupal, and personal chatter.

The dummy domain and the domain pointer at Site5

I’ve been struggling with my Site5 Web hosting account for two years. In many respects, it has been great — good service at a price I was willing to pay. However, my biggest single aggravation has been that there is a primary domain name associated with the hosting account, and that primary domain could not be hosted in a subdirectory of the account cleanly. URL rewriting in an .htaccess file had been my workaround for a long time, but it never really did everything I wanted — and it was complicated.

The hosting account was up for renewal today. I had a decision to make: keep the account and the hassle, keep the account and find a solution to the hassle, or back up my data and move on.

I’m happy to say that I’m keeping the account and it appears that all of the frustration regarding the subdirectory has been eliminated.

The CEO of Site5 helped me out, after seeing my complaints on Twitter. He suggested that I request changing the primary domain to a dummy, nonexistent domain name. With that done, I could then create a domain pointer for my former primary domain (this one, actually), linking it to a subfolder of my account’s public_html directory.

I made the support request, which was fulfilled promptly. The change was made and it works.

Now, it looks like some issues I’d been having have cleared up. Namely, the Global Redirect module I use in Drupal correctly redirects from URLs like /node/300 to their human-friendly paths.

Check a new version of Acquia Drupal into a Mercurial repository

Here is a sequence of commands and output that show how I keep the Acquia Drupal open source content management system up to date with Mercurial, the open source distributed version control system.

In the example below, my Mercurial repositories for Drupal are located in the “drupal” subdirectory of my “repo” folder. Once I’ve moved into that directory, I download the Acquia Drupal distribution with curl and then extract it into my previously-created Mercurial working directory, “acquia_drupal,” using tar.

$ cd repo/drupal
$ curl -O http://acquia.com/files/downloads/acquia-drupal-1.2.0.3780.tar.gz
$ tar --strip-path=1 --directory=acquia_drupal --recursive-unlink -zxvf acquia-drupal-1.2.0.3780.tar.gz

(Update: I added the --recursive-unlink option after I noticed that the Acquia Network control panel keeps track of extra — possibly unneeded — files and folders you have in your install. The recursive unlink option seems to avoid having stray files from old versions of modules hanging around in your repository after you install updates.)

After extracting Acquia Drupal into my Mercurial working directory, I get the status of the repository. It shows there are changes from the last version I checked in, including new files, which are denoted by a “?” at the beginning of their line.

$ cd acquia_drupal
$ hg status
M profiles/acquia/acquia.profile
? modules/acquia/acquia_connector/README.txt
? modules/acquia/acquia_connector/acquia_agent/acquia.ico
? modules/acquia/acquia_connector/acquia_agent/acquia_agent.info
? modules/acquia/acquia_connector/acquia_agent/acquia_agent.install
? modules/acquia/acquia_connector/acquia_agent/acquia_agent.module
? modules/acquia/acquia_connector/acquia_agent/acquia_agent.pages.inc
? modules/acquia/acquia_connector/acquia_agent/acquia_agent_drupal_version.inc
? modules/acquia/acquia_connector/acquia_agent/acquia_agent_streams.inc
? modules/acquia/acquia_connector/acquia_spi/acquia_spi.info
? modules/acquia/acquia_connector/acquia_spi/acquia_spi.install
? modules/acquia/acquia_connector/acquia_spi/acquia_spi.module

Since there are new files, I have to add them so they’ll be tracked by the repository. I only need to add the parent directory of the new files, and any new files within it will also be added for tracking.

$ hg add modules/acquia/acquia_connector
adding modules/acquia/acquia_connector/README.txt
adding modules/acquia/acquia_connector/acquia_agent/acquia.ico
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent.info
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent.install
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent.module
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent.pages.inc
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent_drupal_version.inc
adding modules/acquia/acquia_connector/acquia_agent/acquia_agent_streams.inc
adding modules/acquia/acquia_connector/acquia_spi/acquia_spi.info
adding modules/acquia/acquia_connector/acquia_spi/acquia_spi.install
adding modules/acquia/acquia_connector/acquia_spi/acquia_spi.module
$ hg status
M profiles/acquia/acquia.profile
A modules/acquia/acquia_connector/README.txt
A modules/acquia/acquia_connector/acquia_agent/acquia.ico
A modules/acquia/acquia_connector/acquia_agent/acquia_agent.info
A modules/acquia/acquia_connector/acquia_agent/acquia_agent.install
A modules/acquia/acquia_connector/acquia_agent/acquia_agent.module
A modules/acquia/acquia_connector/acquia_agent/acquia_agent.pages.inc
A modules/acquia/acquia_connector/acquia_agent/acquia_agent_drupal_version.inc
A modules/acquia/acquia_connector/acquia_agent/acquia_agent_streams.inc
A modules/acquia/acquia_connector/acquia_spi/acquia_spi.info
A modules/acquia/acquia_connector/acquia_spi/acquia_spi.install
A modules/acquia/acquia_connector/acquia_spi/acquia_spi.module

Excellent; the new files have been added. After this, I just need to accommodate the deleted files that no longer need to be tracked (created when using the “--recursive-unlink” option on tar). For that, see my newer instructions.

Now that the right files are being tracked, I need to commit the changes — modified, added, and deleted files — to the repository. This will create a new revision in the repository’s history, which I’ll tag with the text “Acquia Drupal 1.2.0.”

$ hg commit -m "Acquia Drupal 1.2.0 imported."
$ hg tag "Acquia Drupal 1.2.0"
$ hg tip
changeset:   10:423c84439928
tag:         tip
user:        Jeremy Reichman <jaharmi@jaharmi.com>
date:        Wed Jan 14 21:07:23 2009 -0600
summary:     Added tag Acquia Drupal 1.2.0 for changeset 0df92d3d243d

Once this revision is checked in, I can use it to propagate changes to other repositories. I keep the main Acquia Drupal distribution in its own repository, and then use the “hg fetch” command to pull its changes into one where I track contributed modules. That second repository is then pulled into a third repository which stores just the changes for my production Web site. The use of three repositories in this way modularizes and isolates the updates.

Getting the settings right for the Drupal GeSHi Filter module

I wanted to find a way to do syntax highlighting of code snippets on my Drupal blog. I came across the GeSHi Filter module, which lets Drupal sites take advantage of the apparently well-regarded GeSHi Generic Syntax Highlighter library that’s meant for just this purpose.

However, I ran into some roadblocks implementing it on my site. Here’s the short story of what I settled on after some trial and error.

My existing code snippets are in <code> blocks, and the initial GeSHi Filter settings applied badly to them. I made the decision to only use GeSHi on <blockcode> blocks, since I wasn’t using that tag yet and it wouldn’t conflict with the snippets already posted.

I most commonly write Bash/Zsh, Python, and AppleScript snippets on my blog. However, the Bash code I was using as part of my trial and error simply wasn’t highlighting; it was coming through as the default (and boring) plain text — but was at least boxed off from the rest of the blog post.

I thought that GeSHi wasn’t correctly discovering that the code was written in UNIX shell syntax. I couldn’t find a way to specify the language for that blockcode tag, until I did some searching on the ’net. To make my blockcode blocks choose a certain language (at least for the purposes of this Drupal module, if not for GeSHi in general), I needed to add the “lang=lang” style to the tag. For Bash, I could use “lang=bash,” for Python, “lang=python,” and for AppleScript, “lang=applescript.” That made sense.

However, my code was still not being syntax highlighted. I discovered that the Drupal module came with an initial set of languages enabled. The others were all turned off, but that could be changed in the module settings. Without turning them on, even properly-tagged <blockcode> sections did not get the benefit of syntax highlighting.

I changed the GeSHi Filter options to enable some of the languages that were initially disabled, and then disabled the ones I didn’t anticipate using. This allowed me to add Bash and AppleScript syntax highlighting support, as both had been turned off by default. After that, I saw the results I’d hoped for: a syntax-highlighted code snippet.

It took some work, but now that it’s done, I should be all set.

List changed files in a Mercurial repository with a custom output style

While trying to troubleshoot what I’d done to mess up the Mercurial repositories managing my Drupal installations last weekend, I really would have liked a way to see which files had changed in specific revisions. Each revision to a Mercurial repository affects some files, of course, but it seems awfully hard to figure out which files changed in a given check-in.

I have since found a way to do that by customizing the output of Mercurial. To customize output, you can create templates on the command line (with --template) or, for more powerful reformatting, create an output style file.

I struggled for a while to figure out how to use style files, and eventually came up with something that works for me so far.

Since I’ve installed Mercurial from Lee Cantey’s standard binary package for Mac OS X Leopard, I created the file “map-cmdline.changedfiles” at the “/Library/Python/2.5/site-packages/mercurial/templates” path. (Where you put the file may vary depending on where Mercurial is installed, and I’m sorry but I don’t know where it gets installed on other systems.) The contents of “map-cmdline.changedfiles” are below, along with my possibly inept description of what each line is doing:

# Get all of the files in the selected revision
# and stringify them, whatever that means
# but do not 'tabindent' or wrap them to 68/76 columns
# Without first setting changeset to the list of files
# you won't get output from subsequent lines
changeset = '{files|stringify}'
# List modified files, one per line
# preceded by M to mimic `hg status`
file = 'M {file}\n'
last_file = 'M {file}\n'
# List added files, one per line
# preceded by A to mimic `hg status`
file_add = 'A {file_add}\n'
last_file_add = 'A {file_add}\n'
# List deleted files, one per line
# preceded by ! to mimic `hg status`
file_del = '! {file_del}\n'
last_file_del = '! {file_del}\n'

I don’t know why the “map-cmdline.” portion of the filename is there, but as long as I have it, I can call the style file from the command line with what follows the period. So, I can call the style with “--style changedfiles” — and that tiny bit of voodoo seems reasonable enough to me. (The other styles in the directory above, many of which end in “.tmpl” extensions, seem related to the Mercurial Web server, hgweb. I tried, but I couldn’t use their names at the command line, with or without their extensions. Plus, their contents looked HTML-ish.)

With the “map-cmdline.changedfiles” style file saved in that location, I can call Mercurial’s “log” command:

$ hg log --style changedfiles -r tip

… which gives me a list of the files changed in the “tip” (or latest revision) of the repository. I could substitute in any revision identifier for “tip.”

I haven’t actually seen the “file_add” and “file_del” keywords in action; every time I’ve used this style file in the manner described, I’ve only seen files marked as “M” — even if I’m looking at a revision where new files were first checked into the repo. I’m confused by that, but I’m not going to let it sour my day at this point.

There might have been an easier way to do this but I didn’t find one last weekend. It took me some time to figure even this bit out, and I hope writing this post saves someone new to Mercurial from future frustration.

Untar archive contents directly into a target folder

In my Mercurial-based workflow for updating Drupal sites, there is a sequence of commands I need whenever a new version of Drupal comes out. I have a hard time remembering the options for “tar” in this sequence — and my original source for the instructions differs from what I need to do on my Web host — so I need to help my memory. The tar command, as constructed below, places its output into the specified destination directory.

Here it is, with tar’s “--strip-path=1” and “-C” options:

$ cd path/to/repository/parent/directory
$ curl -O http://ftp.drupal.org/files/projects/drupal-5.12.tar.gz
$ tar --strip-path=1 -C drupal_source -zxv -f drupal-5.12.tar.gz
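To see what the two options do without touching a real site, the sketch below builds a tiny archive and unpacks it the same way. It spells the first option “--strip-components”, which is my understanding of the name newer GNU tar releases (and bsdtar) use in place of “--strip-path”. The archive name and its contents are invented; only the “drupal_source” directory name comes from the command above.

```shell
set -e

work=$(mktemp -d)
cd "$work"

# Build a small archive whose contents sit inside a versioned top-level folder,
# the way a release tarball is usually laid out.
mkdir -p example-1.0/modules drupal_source
echo "hello" > example-1.0/modules/readme.txt
tar -czf example-1.0.tar.gz example-1.0

# Drop the leading "example-1.0/" path element and extract into drupal_source.
tar --strip-components=1 -C drupal_source -xzf example-1.0.tar.gz

# The file lands at drupal_source/modules/readme.txt, with no example-1.0 folder.
find drupal_source -type f
```

Without the strip option, the files would end up one level too deep, inside a versioned folder in the working directory.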

Update your Fastmail.fm address book from Apple Address Book

I am continually coming across useful features in my Fastmail.fm account that I have previously overlooked. For example, I now know that I can import contacts into the account’s address book. As with many mail systems, this provides several benefits:

  • You can use your contacts to address messages (duh).
  • Your contacts’ e-mail addresses serve as a whitelist for junk mail filtering.

Since I enabled “aggressive” filtering on my Fastmail account, having that whitelist functionality is of interest to me.

The Fastmail developers have helpfully provided a way to upload multiple kinds of contact data, so I chose vCards. They were easy for me to obtain from Apple Address Book. Here’s the basic process I followed:

  1. Export existing contact cards to vCard format from Apple Address Book. You can save the vCards somewhere and delete them later; one trick I use is to save temporary data like this to /tmp so it’ll be deleted by the next time I reboot. Address Book will export multiple selected contacts to one vCard file, and Fastmail accepts this.
  2. Go to Options > Upload Addresses on the Options/Settings page in your Fastmail.fm account. (Half the fun with Fastmail is wading through their exceedingly busy user interface.)
  3. Click on the “Choose file” button next to the “Address book file” label.
  4. Select your exported vCard file in the open dialog and click “OK” to upload it.
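Before uploading, a quick sanity check is to count the cards in the exported file, since a multi-contact export is one file with one BEGIN:VCARD line per contact. This is only a sketch: the sample contacts are invented, and it assumes the export is plain ASCII or UTF-8 text, which may not hold for every Address Book export setting.

```shell
set -e

# Stand-in for a real export; in practice this would be the file saved from Address Book.
export_file=$(mktemp)
cat > "$export_file" <<'EOF'
BEGIN:VCARD
VERSION:3.0
N:Example;Alice;;;
FN:Alice Example
EMAIL;type=INTERNET:alice@example.com
END:VCARD
BEGIN:VCARD
VERSION:3.0
N:Sample;Bob;;;
FN:Bob Sample
EMAIL;type=INTERNET:bob@example.com
END:VCARD
EOF

# Each contact starts with a BEGIN:VCARD line, so counting those counts the cards.
card_count=$(grep -ci '^BEGIN:VCARD' "$export_file")
echo "cards in export: $card_count"
```

If the count does not match the number of contacts I selected, something went wrong with the export rather than with the upload.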

Fastmail has done a reasonable job of matching imported vCards to existing contacts, as long as they were previously imported from a vCard. I had some duplicates for those people who I’d added manually through the Fastmail interface before importing their vCards. Fastmail does state they try to match up contacts to reduce duplicates during the import process, but I still needed to do a little cleanup and, unfortunately, I did not come across a “merge contact” feature.

For what it’s worth, the import process is fast, but I probably brought in under 100 contacts total, so I wasn’t necessarily taxing it.

Not all of the vCard data seems to come through and be displayed in the Fastmail address book. That’s okay for me; I have it elsewhere. In particular, I didn’t see Web addresses beyond the first one show up. Other contact information, like e-mail addresses and phone numbers, seemed to display fine. (I haven’t done an export from FM to vCards to see about roundtrip fidelity yet.)

Once you have cards in your address book, you can update them later. If you’re going to bother importing them in the first place, you’re probably concerned about keeping them up to date. Since the Fastmail software does a decent job of avoiding duplicates (in my brief experience), this shouldn’t be hard to do.

The most difficult part is sorting out which cards have been changed over time. You could re-import all of your cards, certainly. Or, you could use Apple Address Book to help find the recent changes to your contacts — whether those changes were done in Address Book or synchronized via Sync Services (perhaps from Microsoft Entourage).

Here’s how:

  1. Create a Smart Group in Apple Address Book to show cards that have been updated in the last seven days. Pick a period of time that works for you; seven days works for me for the moment.
  2. Select the Smart Group.
  3. Select all of the cards which appear in the Smart Group.
  4. Export the selected cards to a vCard file.
  5. Import that vCard into Fastmail using the steps above. Your changes are now in your online address book.

While this is far more manual effort than I’d like, it’s not terrible. It’s something that I can envision doing every few months. I do most of my e-mail in a desktop application rather than on Fastmail’s Web site, so if my online address book there is a little out of date, it’s not a huge concern.
