We are building a new website at work using Drupal 7, and one of the requirements is integration of work’s Facebook and Twitter streams. The easiest way to do this, of course, is with the Drupal Aggregator and the RSS/Atom feeds both Facebook and Twitter provide.
The problem is that while Twitter’s RSS/Atom feed is standards compliant, Facebook’s is not, by any stretch. The main issue for us is that Facebook provides relative URLs linking to posts rather than absolute URLs, as required by the RSS standard. Here is the relevant part:
RSS places restrictions on the first non-whitespace characters of the data in <link> and <url> elements. The data in these elements must begin with an IANA-registered URI scheme, such as http://, https://, news://, mailto: and ftp://. Prior to RSS 2.0, the specification only allowed http:// and ftp://, however, in practice other URI schemes were in use by content developers and supported by aggregators. Aggregators may have limits on the URI schemes they support. Content developers should not assume that all aggregators support all schemes.
Others have also had this problem: URLs that should be “http://www.facebook.com/USER/post/xxxxxxx” turn into “http://www.mysite.com/USER/post/xxxxxxx”, because the relative path ends up resolved against the aggregating site. Rather than resign myself to the fact that it’s just not going to work, I decided to spend some time actually making it work for me.
While I could probably submit a patch for the Drupal aggregator that deals with FB, it seems like FB should just obey the standard (since it’s a published standard, and there’s a free validator). Hell, using those two things a year ago, I wrote an RSS generator from scratch in an afternoon (knowing nearly nothing about RSS structure other than that it’s XML), and it validates perfectly.
Anyway, rather than supplying a patch, I’ve done two things. One, I sent a “suggestion” to Facebook to actually fix their RSS feed, and two, I wrote up a simple PHP script that takes the RSS feed, and runs a preg_replace against it to fix all the relative URLs (there are other issues with their RSS code, but the only one that I care about is the URL structure). I posted the code to the Drupal page linked above, but I thought I’d also share it here.
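For readers who want a starting point, here is a minimal sketch of that kind of script. It is not a verbatim copy of what I posted. The function name `fix_relative_links`, the regular expression, and the `PHP_SAPI` guard are assumptions made for illustration, and the pattern assumes the broken links start with “/USERNAME/” directly after a `>` or a quote character.

```php
<?php
// fixrss.php: a minimal sketch under the assumptions stated above.

// Placeholders; replace both before use.
$feedUrl  = 'FEEDURL';   // feed address, using http:// rather than feed://
$username = 'USERNAME';  // Facebook username as it appears in post URLs

/**
 * Prefix root-relative links to the given user's posts with the Facebook host.
 * Assumption: such links begin with "/USERNAME/" right after ">" or a quote.
 */
function fix_relative_links($xml, $username)
{
    $pattern = '#([>"\'])/' . preg_quote($username, '#') . '/#';
    $replacement = '${1}http://www.facebook.com/' . $username . '/';
    return preg_replace($pattern, $replacement, $xml);
}

// Only fetch and print when served by a web server, so the function
// above can also be included or tried out from the command line.
if (PHP_SAPI !== 'cli') {
    // Fetching a URL this way requires allow_url_fopen to be enabled.
    $xml = file_get_contents($feedUrl);
    if ($xml === false) {
        header('HTTP/1.1 502 Bad Gateway');
        exit('Could not fetch the feed.');
    }
    header('Content-Type: application/rss+xml; charset=utf-8');
    echo fix_relative_links($xml, $username);
}
```

Links that are already absolute are left alone, because the pattern only matches a path that begins immediately after `>` or a quote.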
Replace “FEEDURL” with the URL of your RSS feed (with http:// rather than feed://) and “USERNAME” with your FB username as found in the URLs. Save this file as “fixrss.php” somewhere on a server with PHP enabled (probably the same server you’re using for Drupal). Then, simply point your RSS aggregator at this PHP file, and you should get valid links from the aggregated feed that link back to the original FB posts.
I realize I could make this script a lot more generic by having it take input from GET variables, but that’s just asking to be used for nefarious purposes… so I’d rather leave everything hardcoded and simple. If I need to fix some other feed, then I’ll think about adding a GET variable for which feed I’m after, and still hardcode them. But that will have to wait until when/if that is needed.
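If it ever comes to that, the “GET variable plus hardcoded feeds” idea might look roughly like the sketch below. Everything in it is hypothetical: the `feed` parameter name, the `work` key, and the `lookup_feed` helper are made up for illustration. The point is only that the request picks a key from a fixed list and never supplies a URL itself.

```php
<?php
// Hypothetical whitelist: short keys mapped to hardcoded feed settings.
$feeds = array(
    'work' => array('url' => 'FEEDURL', 'username' => 'USERNAME'),
);

/**
 * Return the settings for a requested key, or null when the key
 * is missing, not a string, or not in the whitelist.
 */
function lookup_feed(array $feeds, $key)
{
    return (is_string($key) && isset($feeds[$key])) ? $feeds[$key] : null;
}

// Only handle a request when served by a web server.
if (PHP_SAPI !== 'cli') {
    $requested = isset($_GET['feed']) ? $_GET['feed'] : null;
    $feed = lookup_feed($feeds, $requested);
    if ($feed === null) {
        header('HTTP/1.1 404 Not Found');
        exit('Unknown feed.');
    }
    // From here, fetch $feed['url'] and run the same preg_replace fix on it.
}
```

A request such as `fixrss.php?feed=work` would select the hardcoded entry, while anything else gets a 404 instead of being fetched.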
Good luck! And if you want to really help, “suggest” that Facebook fix their crappy implementation of RSS/Atom. Try validating your own Facebook feed with the W3C Feed Validator.