VPN

Remove filename extensions with os.path.splitext instead of the {lr}strip or replace methods in Python

Jesper Noehr explains why {l,r}strip are considered harmful for removing extensions from filenames with Python. I think he’s absolutely right on that score, and I would agree. The lstrip() and rstrip() methods shouldn’t be used for this purpose.

However, like the only commenter on that post, I’d also recommend os.path.splitext() as the proper tool for the extension-removing job.

Let’s take some example filenames you might come across on Mac OS X Snow Leopard:

>>> a = [ ‘Document.pages’, ‘Property List.plist’, ‘Application.app’, ‘Word.docx’, VPN (Cisco VPN).networkConnect’, ‘Dated Spreadsheet 2010.02.17.xlsx’]

If we had a list of filenames (or file paths) like this — perhaps created by os.walk() or some other generator-based process — we couldn’t easily use Jesper’s recommended solution. The replace() string method would give us a much harder time dealing with the range of filenames and extensions in that list. In the case where you don’t know the filename extensions in advance, replace() breaks down. The replace() method would have to be looped with many possible filename extensions.

What we need is a way to split filename from extension, even if we don’t know the extension beforehand. The os.path.splitext() alternative does just that, returning a tuple. Here, I’ll import the os module and then use a list comprehension to run os.path.splitext() through every filename in the list above.

>>> import os
>>> [os.path.splitext(x) for x in a]
[(‘Document’, ‘.pages’), (‘Property List’, ‘.plist’), (‘Application’, ‘.app’), (‘Word’, ‘.docx’), (VPN (Cisco VPN)’, ‘.networkConnect’), (‘Dated Spreadsheet 2010.02.17’, ‘.xlsx’)]

It becomes a simple matter to get just the filename from the tuple, as I do here by modifying the list comprehension to just get the zeroth item from it:

>>> [os.path.splitext(x)[0] for x in a]
[‘Document’, ‘Property List’, ‘Application’, ‘Word’, VPN (Cisco VPN)’, ‘Dated Spreadsheet 2010.02.17’]

Note that several interesting conditions are handled by os.path.splitext():

  • long filename extensions, including “.pages” and “.networkConnect”
  • spaces in filenames, as in “Property List.plist”
  • parentheses, as in “VPN (Cisco VPN).networkConnect”
  • periods within the filename, as seen in “Dated Spreadsheet 2010.02.17.xlsx”
Syndicate content