Apache Solr provides a Web service front end for the Apache Lucene indexing and search engine library. Both Solr and Lucene (upon which Solr depends) are Java-based, which has implications for shared Web hosting.
Drupal is an open source CMS, and I happen to use it on a shared Web hosting provider as of this writing. Drupal is gaining support for Apache Solr through a module that has had a lot of input from Acquia (the “Red Hat” of Drupal).
Dries Buytaert of Acquia has some interesting perspective on search for the Web and CMSes in some recent articles on his site. Specifically, he talks about Acquia Search, a Solr-based search service that is being offered to Drupal sites on the Acquia Network. He discusses the advantages afforded by good search capabilities for both visitors to a Drupal Web site and for site administrators.
I’ve used Acquia Search (in beta), and it has been great. It’s very fast compared to the core Drupal Search module. The ability to perform faceted searches, word stemming, spell checking, and more is all tremendous. (You can see it in action in the search field in the site sidebar, as long as my Acquia Network subscription from the beta lasts.)
But Acquia Search part of a larger service offering — the Acquia Network — which ultimately makes it too expensive for me on my personal sites. It’s priced out of reach for me — more costly for one year than two years of Web hosting, domain registrations, and separate e-mail hosting for my domains are today. I think it’s clear that Acquia is aiming at a different market, and that’s fine.
My idle thought, however, is that search by itself is a compelling feature even for small Web sites like mine. It’s as compelling as hosting files, like HTML or PHP or images, or serving databases, like MySQL and PostgreSQL.
If it’s important as Dries notes in his posts — that the search market is so large and growing, and of such universal importance — then great search is a compelling feature to have for many levels of sites. After serving the files and serving the database, it may be the next big service that a Web hosting provider could offer. And today, Web hosting offers a range of pricing (and service levels) to meet various needs.
I could see advertising that for some monthly fee, a Web host offers 55 GB of storage and 550 GB of monthly data transfer and unlimited MySQL databases — and oh, by the way, some reasonable level of indexing/search with Apache Solr and/or Sphinx or whatever. Although I hate to suggest it, search could even be an optional add-on, as many providers treat dedicated IP addresses or SSL or the like.
There may be an additional win, in that separate servers could be optimized for search to offload that processing from the Web server. It could even be something that a Web host contracts out or partners with another to provide — maybe even with a company like Acquia that’s already set up their infrastructure to scale on Amazon EC2.
Especially if other CMSes, such as WordPress, get Solr integration — as with this WordPress Solr plugin — then the case for Web hosts offering something like Solr search gets convincing.
I discovered — after I’d set up OpenID delegation (using the Drupal OpenID URL module and Sam Ruby’s instructions) — that each OpenID used with a Drupal site needs to be associated with a Drupal account.
Therefore, even though OpenID delegation may point to a previously-associated provider, such as Verisign Labs’ Personal Identity Portal, or PIP, it acts as its own identity. The delegating URL is a URL in its own right, so this makes some sense even if it is not convenient when you set up delegation after starting to use other OpenIDs.
I had to teach each of my Drupal accounts on various sites that I wanted to use my own URL in addition to any previously-associated OpenIDs.
After the news hit about Time Warner Cable’s intent to charge different rates for tiers of monthly data transfer — and an enormous $1/GB fee for overages — it seems eminently sane to consider the competition.
In Rochester, that competition is Frontier DSL. For a long time, that basically meant there was no competition, I’m very sorry to say.
However, the changes to TWC’s fee structure may be so extreme that even that level of competition is good. While I don’t think our household monthly data transfer is excessive, I’m reasonably sure (based on what I’ve seen from the data I’ve collected from our broadband router) that we’ll blow right past the 5 GB/month tier and maybe the 10 GB/month one. We would have to — and by that I mean, I would have to, really — develop some more austere usage of the family Internet connection that we’re accustomed to. Thus, I’m examining the pro and con positions for Frontier’s high-speed Internet service.
Pro
With Frontier DSL, my family should:
Con
However, there are some drawbacks to Frontier DSL. My family would be concerned about:
Anyway, while we’re mulling this over, the news is playing out on sites like StopTheCap and StopTWC! Meanwhile, I’m more than a little annoyed at the traditional news media avoiding some of the other angles surrounding this topic — the pricing change as a way to protect cable television revenues, the local monopoly (and how cable infrastructure compares to its telephone equivalent), the impact on increasingly Internet-dependent households during a recession, how this might change the habits of people (including employees working at home), and so on.
John C. Welch’s article, On Installers, is linked from Daring Fireball today. He links to me — thank you very much John, for that and for the kind words about my signal-to-noise ratio (whatever my front page says on that score right now) — placing me one jump away from Daring Fireball.
I was a little worried about that until I checked my Web analytics account. Luckily, my link is third to last and at the end of a long article, or my Web host might be having words with me about traffic.
It was very good to be mentioned, and even be situated in auspicious company between Greg and Nigel. All of us are current or former Radmind admins, and as a group I think Radmind admins tend to know a bit about the foibles of vendors’ installers. Along its famous learning curve, Radmind teaches you a lot about the filesystem and about what’s going into it.
Anyway, for anyone completely new here, you can follow the Mac OS X system administration topic on its own — and skip others, like random Python, Mercurial, Western New York sports, Drupal, and personal chatter.
I’ve been struggling with my Site5 Web hosting account for two years. In many respects, it has been great — good service at a price I was willing to pay. However, my biggest single aggravation has been that there is a primary domain name associated with the hosting account, and that primary domain could not be hosted in a subdirectory of the account cleanly. URL rewriting in an .htaccess file had been my workaround for a long time, but it never really did everything I wanted — and it was complicated.
The hosting account was up for renewal today. I had a decision to make: keep the account and the hassle, keep the account and find a solution to the hassle, or back up my data and move on.
I’m happy to say that I’m keeping the account and it appears that all of the frustration regarding the subdirectory has been eliminated.
The CEO of Site5 helped me out, after seeing my complaints on Twitter. He suggested that I request changing the primary domain to a dummy, nonexistent domain name. With that done, I could then create a domain pointer for my former primary domain (this one, actually), linking it to a subfolder of my account’s public_html directory.
I made the support request, which was fulfilled promptly. The change was made and it works.
Now, it looks like some issues I’d been having have cleared up. Namely, the Global Redirect module I use in Drupal correctly redirects from URLs like /node/300 to their human-friendly paths.
Here is a sequence of commands and output that show how I keep the Acquia Drupal open source content management system up to date with Mercurial, the open source distributed version control system.
In the example below, my Mercurial repositories for Drupal are located in the “drupal” subdirectory of my “repo” folder. Once I’ve moved into that directory, I download the Acquia Drupal distribution with curl and then extract it into my previously-created Mercurial working directory, “acquia_drupal,” using tar.
(Update: I added the --recursive-unlink option after I noticed that the Acquia Network control panel keeps track of extra — possibly unneeded — files and folders you have in your install. The recursive unlink option seems to avoid having stray files from old versions of modules hanging around in your repository after you install updates.)
After extracting Acquia Drupal my Mercurial working directory, I get the status of the repository. It shows there are changes from the last version I checked in — and this includes new files, denoted by a “?” at the beginning of their line.
Since there are new files, I have to add them so they’ll be tracked by the repository. I only need to add in the parent directory for any changed files, and any new files within it will also be added for tracking.
Excellent; the new files have been added. After this, I just need to accommodate the deleted files that no longer need to be tracked (created when using the “--recursive-unlink” option on tar). For that, see my newer instructions.
Now that the right files are being tracked, I need to commit the changes — modified, added, and deleted files — to the repository. This will create a new revision in the repository’s history, which I’ll tag with the text “Acquia Drupal 1.2.0.”
Once this revision is checked in, I can use it to propagate changes to other repositories. I keep the main Acquia Drupal distribution in its own repository, and then use the “hg fetch” command to pull its changes into one where I track contributed modules. That second repository is then pulled into a third repository which stores just the changes for my production Web site. The use of three repositories in this way modularizes and isolates the updates.
I wanted to find a way to do syntax highlighting of code snippets on my Drupal blog. I came across the GeSHi Filter module, which lets Drupal sites take advantage of the apparently well-regarded GeSHi Generic Syntax Highlighter library that’s meant for just this purpose.
However, I ran into some roadblocks implementing it on my site. Here’s the short story of what I settled on after some trial and error.
My existing code snippets are in <code> blocks, and the initial GeSHi Filter settings applied badly to them. I made the decision to only use GeSHi on <blockcode> blocks, since I wasn’t using that tag yet and it wouldn’t conflict with the snippets already posted.
I most commonly write Bash/Zsh, Python, and AppleScript snippets on my blog. However, the Bash code I was using as part of my trial and error simply wasn’t highlighting; it was coming through as the default (and boring) plain text — but was at least boxed off from the rest of the blog post.
I thought that GeSHi wasn't correctly discovering that the code was written in UNIX shell syntax. I couldn’t find a way to specify the language for that blockcode tag, until I did some searching on the ’net. To change my blockquotes to choose a certain language — at least for the purposes of this Drupal module, if not for GeSHI in general — I needed to add the “lang=lang” style to the tag. For Bash, I could use “lang=bash,” for Python, “lang=python,” and for AppleScript, “lang=applescript.” That made sense.
However, my code was still not being syntax highlighted. I discovered that the Drupal module came with an initial set of languages enabled. The others were all turned off, but that could be changed in the module settings. Without turning them on, even properly-tagged <blockcode> sections did not get the benefit of syntax highlighting.
I changed the GeSHi Filter options to enable some of the languages that were initially disabled, and then disabled the ones I didn’t anticipate using. This allowed me to add Bash and AppleScript syntax highlighting support, as both had been turned off by default. After that, I saw the results I’d hoped for: a syntax-highlighted code snippet.
It took some work, but now that it’s done, I should be all set.
While trying to troubleshoot what I’d done to mess up the Mercurial repositories managing my Drupal installations last weekend, I really would have liked a way to see what files had changes in specific revisions. Each revision to a Mercurial repository affects some files, of course, but it seems awfully hard to figure what files changed in that check-in.
I have since found a way to do that by customizing the output of Mercurial. To customize output, you can create templates on the command line (with --template) or for more powerful reformatting, create an output style file.
I struggled for a while to figure out how to use style files, and eventually came up with something that works for me so far.
Since I’ve installed Mercurial from Lee Cantey’s standard binary package for Mac OS X Leopard, I created the file “map-cmdline.changedfiles” at the “/Library/Python/2.5/site-packages/mercurial/templates” path. (Where you put the file may vary depending on where Mercurial is installed, and I’m sorry but I don’t know where it gets installed on other systems.) The contents of “map-cmdline.changedfiles” are below, along with my possibly inept description of what each line is doing:
# Get all of the files in the selected revision
# and stringify them, whatever that means
# but do not 'tabindent' or wrap them to 68/76 columns
# Without first setting changeset to the list of files
# you won't get output from subsequent lines
changeset = '{files|stringify}'
# List modified files, one per line
# preceded by M to mimic `hg status`
file = 'M {file}\n'
last_file = 'M {file}\n'
# List added files, one per line
# preceded by A to mimic `hg status`
file_add = 'A {file_add}\n'
last_file_add = 'A {file_add}\n'
# List deleted files, one per line
# preceded by ! to mimic `hg status`
file_del = '! {file_del}\n'
last_file_del = '! {file_del}\n'
I don’t know why the “map-cmdline.” portion of the filename is there, but as long as I have it, I can call the style file from the command line with what follows the period. So, I can call the style with “--style changedfiles” — and that tiny bit of voodoo seems reasonable enough to me. (The other styles in the directory above, many of which end in “.tmpl” extensions, seem related to the Mercurial Web server, hgweb. I tried, but I couldn’t use their names at the command line, with or without their extensions. Plus, their contents looked HTML-ish.)
With the “map-cmdline.changedfiles” style file saved in that location, I can call Mercurial’s “log” command:
$ hg log --style changedfiles -r tip
… which gives me a list of the files changed in the “tip” (or latest revision) of the repository. I could substitute in any revision identifier for “tip.”
I haven’t actually seen the “file_add” and “file_del” keywords in action; every time I’ve used this style file in the manner described, I’ve only seen files marked as “M” — even if I’m looking at a revision where new files were first checked into the repo. I’m confused by that, but I’m not going to let it sour my day at this point.
There might have been an easier way to do this but I didn’t find one last weekend. It took me some time to figure even this bit out, and I hope writing this post saves someone new to Mercurial from future frustration.
In my Mercurial-based workflow for updating Drupal sites, there is a sequence of commands I need whenever a new version of Drupal comes out. I have a hard time remembering the options for “tar” in this sequence — and my original source for the instructions differs from what I need to do on my Web host — so I need to help my memory. The tar command, as constructed below, places its output into the specified destination directory.
Here it is, with tar’s “--strip-path=1” and “-C” options:
$ cd path/to/repository/parent/directory
$ curl -O http://ftp.drupal.org/files/projects/drupal-5.12.tar.gz
$ tar --strip-path=1 -C drupal_source -zxv -f drupal-5.12.tar.gz
I am continually coming across useful features in my Fastmail.fm account that I have previously overlooked. For example, I now know that I can import contacts into the account’s address book. As with many mail systems, this provides several benefits:
Since I enabled “aggressive” filtering on my Fastmail account, having that whitelist functionality is of interest to me.
The Fastmail developers have helpfully provided a way to upload multiple kinds of contact data, so I chose vCards. They were easy for me to obtain from Apple Address Book. Here’s the basic process I followed:
Fastmail has done a reasonable job of matching imported vCards to existing contacts, as long as they were previously imported from a vCard. I had some duplicates for those people who I'd added manually through the Fastmail interface before importing their vCards. Fastmail does state they try to match up contacts to reduce duplicates during the import process, but I still needed to do a little cleanup — and, unfortunately, there is no “merge contact” feature I came across.
For what it’s worth, the import process is fast, but I probably brought in under 100 contacts total, so I wasn’t necessarily taxing it.
Not all of the vCard data seems to come through and be displayed in the Fastmail address book. That's okay for me; I have it elsewhere. In particular, I didn't see Web addresses beyond the first one show up. Other contact information, like e-mail addresses and phone numbers, seemed to display fine. (I haven’t done an export from FM to vCards to see about roundtrip fidelity yet.)
Once you have cards in your address book, you can update them later. If you’re going to bother importing them in the first place, you’re probably concerned about keeping them up to date. Since the Fastmail software does a decent job of avoiding duplicates (in my brief experience), this shouldn’t be hard to do.
The most difficult part is sorting out which cards have been changed over time. You could re-import all of your cards, certainly. Or, you could use Apple Address Book to help find the recent changes to your contacts — whether those changes were done in Address Book or synchronized via Sync Services (perhaps from Microsoft Entourage).
Here’s how:
While this is far more manual effort than I’d like, it’s not terrible. It’s something that I can envision doing every few months. I do most of my e-mail in a desktop application rather than on Fastmail’s Web site, so if my online address book there is a little out of date, it’s not a huge concern.