March 06, 2007
A Tale of Two Feeds
It was the best of times, it was the worst of times.
Though Dickens may have been writing about the French Revolution when he opened his much-exalted A Tale of Two Cities, his proverbial words could just as easily apply to a very different revolution: the RSS revolution, and its still-nascent descendant, Atom [1].
Although the concept of a web feed has existed for a number of years now, I would argue that it has only recently begun to reach the proverbial tipping point amongst the unwashed, non-technical masses. The arrival of easy-to-use, no-nonsense aggregators and news readers such as Google Reader and Bloglines has turned what was once an incredibly arduous and arcane task (e.g., debugging malformed XML) into something even my parents would be comfortable using, assuming, of course, that my parents ever decide to venture out of the electronic dark ages (a questionable assumption at best). In what amounts to a few clicks of the mouse, one can now transform one's life from the utter tedium of manually checking individual blogs of friends, family, and the occasional Internet celebrity every couple of days to the super-efficient method of reading all of one's content in one convenient location on the web. These are indeed the best of times when it comes to convenience: the next logical step in the storied history of the Internet and blogging.
As Dickens astutely noted, however, with the best of times, so come the worst of times, at least for the developers who hope to enable this web feed revolution. Though Rohit's Realm has been using some form of syndication since at least 2003, the process of building and maintaining this feature was seriously hindered by vague or nonexistent standards and spotty (or, in some cases, erroneous) support from core Perl libraries (i.e., XML::RSS). Luckily, the RSS libraries have finally matured to the point where I am fairly confident that Rohit's Realm now consistently serves valid RSS feeds. The same cannot be said about the site's purported Atom 1.0 feed that I so breathlessly declared was functional last month.
Until yesterday, the Atom feed on this site was riddled with a number of problems, including:
- The feed was not version 1.0, but the deprecated 0.3 instead
- The feed claimed it was written in the UTF-8 character set, but really was in ISO-8859-1
- The feed utilized an obsolete XML namespace (i.e., http://purl.org/atom/ns# rather than http://www.w3.org/2005/Atom)
- Links were simply broken due to a bug in my code
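For anyone who wants to check their own feed for the first three problems, here is a minimal sketch. It is written in Python rather than the Perl this site runs on, and the function names (atom_version, is_really_utf8) are hypothetical helpers invented for illustration, not part of any library. It only inspects the root element's namespace and whether the bytes decode as UTF-8; it is not a substitute for a real feed validator.

```python
import xml.etree.ElementTree as ET

# The two namespaces named in the list above.
ATOM_03_NS = "http://purl.org/atom/ns#"
ATOM_10_NS = "http://www.w3.org/2005/Atom"


def atom_version(feed_bytes: bytes) -> str:
    """Guess the Atom version from the namespace of the root <feed> element.

    Hypothetical helper: it looks at the root tag only and ignores
    everything else in the document.
    """
    root = ET.fromstring(feed_bytes)
    # ElementTree writes qualified names as "{namespace}localname".
    if root.tag == f"{{{ATOM_10_NS}}}feed":
        return "1.0"
    if root.tag == f"{{{ATOM_03_NS}}}feed":
        return "0.3"
    return "unknown"


def is_really_utf8(feed_bytes: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8.

    A feed that declares UTF-8 but was written out as ISO-8859-1
    fails here as soon as it contains an accented character.
    """
    try:
        feed_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Run against the raw bytes of a feed, the first function would have reported "0.3" for the old feed here, and the second would have returned False on any entry containing an accented character.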
I spent the better part of yesterday fixing all these problems, only to discover that given the unstable nature of the Atom specification itself, the hopelessly inadequate state of the Perl libraries (i.e., XML::Atom), and various other idiotic issues, the only way to get a valid Atom feed would be to hack away. Hack away I did, and now, I think the feed is valid, though who really knows?
The bottom line is this: the XML::Atom library still does not fully support Atom 1.0, and various cool features (e.g., categories and contributors) cannot exist until it is ready. Moreover, for posterity's sake, some of the truly idiotic issues I had to hack around included:
- Converting all the text that is natively stored as ISO-8859-1 to UTF-8;
- Manually injecting a colon in the timezone offset of the timestamp generated by Perl's DateTime library to achieve RFC 3339 compliance; and
- Manually changing all high-bit characters (e.g., é, è, á) to their corresponding HTML entities (argh!).
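The three hacks above are small enough to sketch. What follows is an illustration in Python, not the Perl actually running on this site, and every function name in it is hypothetical. One deliberate swap: the last helper emits numeric character references (&#233;) instead of named HTML entities (&eacute;), on the assumption that the output lands in XML, where named HTML entities are not defined unless a DTD declares them.

```python
import re


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes stored as ISO-8859-1 into UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


def add_offset_colon(stamp: str) -> str:
    """Turn a trailing "-0800" style offset into "-08:00".

    RFC 3339 wants a colon in the numeric offset. Timestamps that
    already have one, or that end in "Z", pass through unchanged.
    """
    return re.sub(r"([+-]\d{2})(\d{2})$", r"\1:\2", stamp)


def escape_high_bit(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, add_offset_colon("2007-03-06T09:30:00-0800") gives "2007-03-06T09:30:00-08:00", and escape_high_bit("é") gives "&#233;". The third step is only needed when the encoding cannot be trusted end to end; a feed that is correctly declared and served as UTF-8 can carry é as is.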
Even now, I am not totally sure things will continue to work. XML::Atom::Entry sometimes marks my content as type="xhtml" and sometimes emits entity-escaped HTML instead. Why? Hell if I know. My only hope is that the core libraries will soon catch up to the specification and I can finally send RSS off to the guillotine, leaving a new and improved Atom feed to live happily ever after with Lucie. It is a far, far better thing that I do, than I have ever done. It is a far, far better rest that I go to, than I have ever known.
[1] To be precise, Atom and RSS share an analogous lineage in evolutionary terms, not homologous, but...just shut up, you nitpicking asshole.