Fun With Title Case
Posted by marshall Sun, 25 May 2008 03:54:00 GMT
A few days ago, John Gruber posted on his excellent Daring Fireball site a Perl script that he uses for converting text to title case. Shortly thereafter, Dan Benjamin of RailsMachine voiced my immediate thought: "We need this rewritten in Ruby."
Ruby includes a capitalize method for String objects, but it simply uppercases the first letter of the whole string and downcases everything else -- not helpful at all. Rails adds a titleize method that gets a bit closer, but it's one of the "non-clever" functions that Gruber mentions: it doesn't downcase small words like "of", it mishandles words with embedded caps (e.g. "iTunes"), and it mangles contractions and possessives (e.g. "can't", "AT&T's"). Gruber's script, by contrast, works correctly with almost any input.
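To make the capitalize limitation concrete, here is a small plain-Ruby sketch (no Rails required). The helper name naive_title is made up for illustration; it just applies capitalize to each whitespace-separated word, which is roughly the kind of "non-clever" approach described above, not Rails' actual titleize implementation.

```ruby
# capitalize only touches the first character of the whole string
# and downcases the rest.
puts "the lord of the rings".capitalize  # The lord of the rings
puts "iTunes".capitalize                 # Itunes
puts "AT&T's".capitalize                 # At&t's

# Hypothetical naive helper: capitalize every word independently.
# Small words get capitalized and embedded caps are still destroyed.
def naive_title(text)
  text.split(" ").map(&:capitalize).join(" ")
end

puts naive_title("the lord of the rings")  # The Lord Of The Rings
puts naive_title("q&a with steve jobs")    # Q&a With Steve Jobs
```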
Knowing that Ruby's text processing features are largely influenced by Perl, and seeing that the script in question wasn't actually that long, I figured I'd give it a try. It turned out to be quite a straightforward port; the biggest hurdle was learning enough Perl to determine what the script was doing (it's one of those languages that I had been intending to learn for a long time, but I lost the will once I came across Ruby). And once I'd gotten that far, it wasn't too difficult to work out a JavaScript version as well. Gruber very helpfully provided a list of edge cases for testing, so it was easy to tell when the new code was working properly.
With the Ruby and JavaScript versions finished, I probably should have stopped. But it was turning out to be a fun exercise, and I thought to myself, "what if I needed this in an iPhone app someday?" Well, that's more of a challenge. Gruber's script does all its work with regular expressions, and Cocoa does not have built-in regular expression support. Various people have come up with extensions that add it, such as RegexKit, but those require additional libraries, and I kind of wanted to keep it self-contained. Plus I figured it would be a more interesting problem to solve with just the built-in objects, since I'd already done two versions that depended on regular expressions.
The result is an Objective-C category (VCTitleCase) that extends all NSString objects with a titlecaseString method to complement the existing lowercaseString and uppercaseString methods. It's implemented using NSScanner, and because it already has to parse out each word, it does almost everything in one pass rather than doing multiple find-and-replace steps.
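The one-pass idea can be sketched in Ruby with StringScanner from the standard library, which plays a role loosely similar to NSScanner. This is a simplified illustration, not the VCTitleCase code: the method names and the small-word list are assumptions, the tokens are matched with small regex patterns (which the Objective-C version avoids), and it ignores punctuation handling and special cases such as "Q&A".

```ruby
require "strscan"

# Assumed small-word list, for illustration only.
SMALL_WORDS = %w[a an and as at but by en for if in of on or the to v via vs].freeze

# Walk the string once, deciding the case of each word as it is scanned.
def titlecase_sketch(text)
  scanner = StringScanner.new(text)
  out = String.new
  first = true
  until scanner.eos?
    # Copy runs of whitespace through unchanged.
    if (space = scanner.scan(/\s+/))
      out << space
      next
    end
    word = scanner.scan(/\S+/)
    last = scanner.rest.strip.empty?
    out << convert_word(word, first, last)
    first = false
  end
  out
end

def convert_word(word, first, last)
  # A capital after the first character ("iTunes", "AT&T's") means
  # the word is kept exactly as typed.
  return word if word[1..-1].to_s =~ /[A-Z]/
  lower = word.downcase
  # Small words stay lowercase unless they open or close the title.
  return lower if SMALL_WORDS.include?(lower) && !first && !last
  # Uppercase only the first letter, so "can't" becomes "Can't".
  lower.sub(/[[:alpha:]]/) { |c| c.upcase }
end

puts titlecase_sketch("the lord of the rings")        # The Lord of the Rings
puts titlecase_sketch("i can't believe it's iTunes")  # I Can't Believe It's iTunes
```

Because each word is classified at the moment it is scanned, there is no need for separate find-and-replace passes over the finished string; the trade-off is that every rule has to be expressible per word, with only "first" and "last" as context.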
All three versions are available on the Title Case Ports page.
Footnote: Of course, I was just one of many to respond to Benjamin's request: he later posted a list of 11 responses, with more in the comments. A couple of them took the next step and extended the Ruby String class to support title-casing. Unfortunately, quite a few seem to have missed the point -- or at least they didn't take the time to understand the problem. Many didn't properly handle contractions, small words at the end, or the special cases that Gruber's script took care of, such as "Q&A". It's particularly disappointing given that 1) the original script wasn't that long, 2) Gruber spelled out exactly what "clever" things he was trying to accomplish, and 3) a set of edge cases was provided for testing.
