Wednesday, 30 July 2008

[fw-webservices] Proposal suggested for a Zend_Feed_Reader

Hi all,

Being the ZF junkie that I am, I have been considering creating a proposal to address a few issues with Zend_Feed. Rather than let loose with the Proposal right away I figured I'd open the floor here (and on #zfdev where, for once :p, I am logged in at the moment), see what the community reception is, and steal...eh...borrow ideas where they make sense.

I'm currently towards the end of a second build of a simplified blog aggregation library based on Zend_Feed. Having done this twice I figured it was time to come up with a final solution so there's no third time ;).

In using Zend_Feed I've come to realise that it is largely an abstract API to PHP DOM. It imports, parses and presents a more natural API (akin to SimpleXML in a sense) for RSS and Atom feeds. However, Zend_Feed itself does not understand RSS or Atom - understanding these formats and their many versions is left entirely to the end programmer with a few exceptions. Zend_Feed_Reader's primary goal would be to take up the task of understanding and interpreting RSS and Atom.

This is a short email (from me, that means it's <3000 words ;)) throwing around some concepts for debate.

Current Potential Issues with Zend_Feed

Attempting to aggregate multiple feeds using different RSS and Atom standards requires a lot of extra work to correctly narrow all the available data down to common preferred points like: date, id, author, title, content, etc. It also means a lot of edge cases (malformed XML, non-standard RSS/Atom, etc.) may need to be tracked by users. One of the few interpretive measures is being able to access RSS "content:encoded" data using a content() method. I also note "dc" namespaces are likewise resolved without the namespace prefix. Neither is consistent (e.g. content:encoded is accessible using the namespace prefix as the method name).

The second issue is that Zend_Feed is only minimally aware of HTTP. When given a URL it fetches that URL using Zend_Http_Client. Responses are neither cached nor conditionally fetched meaning that once again users must handle these scenarios by themselves or risk wasting precious memory, CPU cycles and bandwidth processing unchanged feeds. Not a biggie since it's easily implemented, but I've seen few Zend_Feed based apps doing it which is really poor practice in the wild.
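
To make the conditional fetch idea concrete, here is a minimal sketch in plain PHP (7.1 or later assumed). Both function names and the shape of the `$cached` array are hypothetical and not part of Zend_Feed or Zend_Http_Client; the sketch only shows which request headers to send and how a 304 response might be handled.

```php
<?php
// Hypothetical helper: build conditional request headers from whatever
// validators were stored alongside the cached copy of the feed.
function buildConditionalHeaders(array $cached): array
{
    $headers = [];
    if (!empty($cached['etag'])) {
        $headers['If-None-Match'] = $cached['etag'];
    }
    if (!empty($cached['lastModified'])) {
        $headers['If-Modified-Since'] = $cached['lastModified'];
    }
    return $headers;
}

// Hypothetical helper: pick the feed body to parse from the response status.
// 304 means the cached copy is still current; 200 means a fresh body arrived.
function resolveFeedBody(int $status, ?string $body, array $cached): ?string
{
    if ($status === 304) {
        return $cached['body'] ?? null;
    }
    if ($status === 200) {
        return $body;
    }
    return null; // anything else is left for the caller to treat as an error
}
```

The returned headers would be passed to whichever HTTP client performs the request; a 304 then avoids downloading and re-parsing an unchanged feed.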

A third potential problem is the assumption that all feeds will follow identical namespace patterns based on the namespace prefix (an arbitrary string). There are two examples here.

One is that some namespaces are resolved automatically by Zend_Feed while others are left unresolved, which presents an inconsistent API. For example, an RSS 2.0 <item> element might hold <dc:creator> and <slash:comments> elements with the <dc:creator> accessed as "$entry->creator()" (the dc namespace is automatically resolved by assumption) but the other only as "$entry->{'slash:comments'}" (the slash namespace is not resolved). This can become pretty confusing and require a lot of trial and error programming. Why doesn't "$entry->{'dc:creator'}" also return a typed result like "$entry->{'slash:comments'}"?

The second is that namespaces are looked up based on the namespace prefix and not the namespace URI itself. Namespace prefixes may be standardised to a vast extent, but there's nothing illegal about using something different. If someone defined an alternate namespace prefix for "slash", existing source code based on using "$entry->{'slash:comments'}" would break. It's a much lesser point but some preemptive namespace registering would help here - there are only so many common extensions out there.
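
As a rough illustration of lookup by URI, the sketch below uses PHP's DOM extension directly rather than Zend_Feed. The sample feed is made up for the example and deliberately binds the Slash module to the prefix "s"; the query still works because the prefix is registered against the namespace URI on our side.

```php
<?php
// Made-up sample feed: the Slash namespace is bound to "s", not "slash".
$xml = '<rss version="2.0" xmlns:s="http://purl.org/rss/1.0/modules/slash/">'
     . '<channel><title>Example</title>'
     . '<item><title>First post</title><s:comments>7</s:comments></item>'
     . '</channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// Bind our own prefix to the namespace URI; whatever prefix the feed
// author picked no longer matters to the query below.
$xpath->registerNamespace('slash', 'http://purl.org/rss/1.0/modules/slash/');

$item = $xpath->query('/rss/channel/item')->item(0);
$comments = $xpath->evaluate('string(slash:comments)', $item);
```

Registering the handful of common extension namespaces this way up front would keep accessors stable no matter which prefixes a feed uses.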

Resolving Issues?

My own resolution ended up being an overhaul of how I used Zend_Feed. By adding a new interpretive layer it's possible to create an additional abstract API which understands RSS and Atom versions, can present commonly requested data (prioritising similar elements) through a single set of accessor methods, and is invulnerable to namespace prefix changes. It also makes internalising conditional fetches and caching simpler and standardised (whether these are worthwhile features is arguable, but I think so). Originally I had considered building on top of Zend_Feed, but I've, to some extent, ended up moving towards a parallel path where the base source is the DOMDocument Zend_Feed creates (allowing direct access to XPath and dynamic namespace registration) while maintaining access to Zend_Feed's other methods by proxy.
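
A single accessor that prioritises similar elements could look something like the sketch below. It is only an illustration built on DOMXPath (PHP 7.1 or later assumed): firstMatch() and entryAuthor() are hypothetical names, the two sample feeds are made up, and the priority order shown is one possible choice, not a settled design.

```php
<?php
// Hypothetical helper: return the first non-empty string produced by a list
// of XPath expressions, tried in priority order against one entry node.
function firstMatch(DOMXPath $xpath, DOMNode $context, array $queries): ?string
{
    foreach ($queries as $query) {
        $value = trim($xpath->evaluate('string(' . $query . ')', $context));
        if ($value !== '') {
            return $value;
        }
    }
    return null;
}

// Hypothetical accessor: one method for "author" across RSS and Atom.
function entryAuthor(string $xml): ?string
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
    $xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

    // Take the first entry, whichever format the feed uses.
    $entry = $xpath->query('//item | //atom:entry')->item(0);
    if ($entry === null) {
        return null;
    }
    return firstMatch($xpath, $entry, ['atom:author/atom:name', 'dc:creator', 'author']);
}

$rss = '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><item>'
     . '<title>RSS entry</title><dc:creator>Alice</dc:creator>'
     . '</item></channel></rss>';
$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><entry>'
      . '<title>Atom entry</title><author><name>Bob</name></author>'
      . '</entry></feed>';

$rssAuthor = entryAuthor($rss);   // taken from dc:creator
$atomAuthor = entryAuthor($atom); // taken from atom:author/atom:name
```

The calling code never needs to know which format or version it was handed, which is the whole point of the extra layer.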

The point of the new proposal isn't so much to supersede Zend_Feed as to extend its API and capabilities, reducing the level of custom programming Zend_Feed needs to be surrounded with before it's useful in batch processing of RSS/Atom feeds where knowledge of any one specific feed is highly unlikely. I namespace it as Zend_Feed_Reader since it is a separate class family not inheriting from Zend_Feed - rather it accepts Zend_Feed_Abstract instances through composition.

Your thoughts are welcome. Even if you write a longer email than me ;).

Best regards,
Paddy
 
Pádraic Brady

http://blog.astrumfutura.com
http://www.patternsforphp.com
OpenID Europe Foundation Member-Subscriber
