alanwilliamson

Trouble in ROME

I have been using the Java RSS library ROME for a very long time now.  The original design goal was to make the logistics of reading and writing RSS feeds a black box, so that the developer could concentrate on the data and not worry about the format.  Sadly, the reality doesn't quite match the dream.  Do not get me wrong, ROME works well for 90% of feeds, but it's handling the other 10% that turns a good library into a great one.

Let me give you an example.  I am using ROME as the main parsing engine for an aggregator, and I am not using it for anything out of the ordinary.  However, the biggest failing with ROME is its inability to read non-conformant feeds.  With the mess that the RSS formats are in, it's not unusual to see badly or incorrectly formatted feeds.  Want an example?  You don't need to look any further than James Gosling's own feed.  Here he advertises it as RSS 0.91, but that specification says the <pubDate> element shouldn't be there.  And yet James has it in there.  As a result, ROME can't read all of James' content.

In the early days of the ROME list, the attitude was "tell the producer they are generating the wrong format".  Well yeah, I accept that to a certain degree, but there comes a point where that isn't possible.  ROME should be reading as much of the data as possible and not leaving anything out, irrespective of the format the feed claims to be.  That is what would make it tolerant of poorly formatted feeds.
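To make "read everything, whatever the feed claims to be" concrete, here is a minimal sketch of the idea.  It does not use ROME and is not how ROME works internally; it uses only the JDK's built-in DOM parser.  The class name LenientRssReader and the method readItems are made up for illustration.  It walks every <item> and keeps every child element it finds, without ever checking the declared version.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of ROME: a version-agnostic item reader.
class LenientRssReader {

    // Returns one map per <item>, keyed by child element name.
    // Every child element is kept, whatever RSS version the feed declares.
    // If an item repeats an element name, the last occurrence wins.
    static List<Map<String, String>> readItems(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<Map<String, String>> items = new ArrayList<>();
        NodeList itemNodes = doc.getElementsByTagName("item");
        for (int i = 0; i < itemNodes.getLength(); i++) {
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = itemNodes.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip whitespace and comments; keep every real element.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            items.add(fields);
        }
        return items;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: declares 0.91 yet carries a <pubDate> on the item.
        String feed = "<rss version=\"0.91\"><channel><title>Sample</title>"
                + "<item><title>Hello</title>"
                + "<pubDate>Mon, 01 Jan 2007 00:00:00 GMT</pubDate></item>"
                + "</channel></rss>";
        for (Map<String, String> item : readItems(feed)) {
            System.out.println(item);
        }
    }
}
```

The point of the sketch is only that nothing is thrown away at read time: the <pubDate> survives even though the feed declares itself as 0.91.  Deciding what is legal for a given format can then be left to the writing side.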

ROME should, however, strive to generate legal content that adheres to all the standards.  This would let ROME take the moral high ground by effectively becoming an RSS cleanser, without dropping any data.  In all fairness, ROME does provide a framework for reading dodgy formats, but it isn't enabled out of the box and takes a little bit of coding and fiddling to make it work.  My point is that this shouldn't be the case.  It should be reading as much as possible out of the box.  The very reason I am using ROME is so I don't have to worry about the finer points of each specification.  I don't want to parse RSS feeds; ROME is supposed to be doing that for me.  Let the library handle all of that, not me.

I will be publishing the extra classes you need to extend ROME to handle more liberal formats in a few days.

 
