This week, Boing Boing posted its entire 11-year archive (63,999 posts) in XML format. But Nicholas H. Tollervey from FluidDB wanted more. “XML is good, but having a searchable database of posts is better,” he writes on the FluidDB blog. So he ported Boing Boing’s XML archive into FluidDB.
“Because of FluidDB’s open nature anyone can now make use of boingboing’s data via a few simple and easy to construct RESTful calls to FluidDB,” he writes. In other words, FluidDB is hosting a Boing Boing API. For free.
The cool thing – apart from being able to use FluidDB to mine BB for interesting data – is that you can do this yourself with your own blog.
Here’s a bit more about how it works:
Furthermore, while I was cleaning/preparing the data for upload I made sure to extract every domain name and URL referenced in each post and annotate the publication date as computer friendly values rather than just a human readable date.
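To make that preparation step concrete, here is a minimal Python sketch of the kind of annotation described: pulling URLs and domain names out of a post body and splitting the publication date into numeric parts. It is not Tollervey's actual import script. The function name, the dictionary keys, the URL pattern and the date format are all assumptions made for illustration.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit

# Rough URL matcher for illustration only; a real import would likely
# parse the post HTML properly instead of using a regular expression.
URL_PATTERN = re.compile(r'https?://[^\s"\'<>]+')


def annotate(body, published):
    """Return machine-friendly annotations for one post.

    `body` is the post text or HTML. `published` is assumed to look like
    "2010-01-05 09:30:00"; the real archive may use a different format.
    """
    urls = sorted(set(URL_PATTERN.findall(body)))
    hosts = (urlsplit(u).hostname for u in urls)
    domains = sorted({h for h in hosts if h})
    when = datetime.strptime(published, "%Y-%m-%d %H:%M:%S")
    return {
        "urls": urls,
        "domains": domains,
        "year": when.year,
        "month": when.month,
        "day": when.day,
    }


# Hypothetical sample post, not taken from the archive.
sample_body = (
    '<a href="http://techcrunch.com/2010/01/05/story">TC</a> '
    "and http://www.example.org/page"
)
result = annotate(sample_body, "2010-01-05 09:30:00")
print(result["domains"], result["year"])
```

Storing the year as a number and the domains as a set of strings is what makes clauses such as a numeric comparison on the year or a "contains" test on the domains possible.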
An instant win is the ability to query data. For example, you’ll be able to search for all posts that link to techcrunch.com written in 2010 by Cory Doctorow. This is how to write the query in FluidDB’s super simple query language:
boingboing.net/domains contains "techcrunch.com" and
boingboing.net/year = 2010 and
boingboing.net/author = "Cory Doctorow"
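Since the post mentions RESTful calls, here is a small Python sketch of how the query above might be packed into a request URL. The endpoint address and the "query" parameter name are assumptions, not details given in the post, and the helper names are hypothetical. The code only builds the URL and sends nothing over the network.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed address of the objects endpoint; check the FluidDB
# documentation before relying on it.
FLUIDDB_OBJECTS_URL = "http://fluiddb.fluidinfo.com/objects"


def build_query(domain, year, author):
    """Compose the three-clause query shown above as a single string."""
    return (
        f'boingboing.net/domains contains "{domain}" and '
        f"boingboing.net/year = {year} and "
        f'boingboing.net/author = "{author}"'
    )


def build_url(query, base=FLUIDDB_OBJECTS_URL):
    """URL-encode the query and attach it as a 'query' parameter (assumed name)."""
    return f"{base}?{urlencode({'query': query})}"


query = build_query("techcrunch.com", 2010, "Cory Doctorow")
url = build_url(query)
print(url)
```

Encoding matters here because the query contains spaces, slashes, quotes and an equals sign, none of which can appear raw in a query string.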
Why would you want to do this?
This brings me to the killer point: accessing data from boingboing.net is good, but the facility to annotate, discover and re-use everyone’s data about boingboing.net posts is better. That’s why we sometimes say we’re trying to do to databases what Wikipedia did to encyclopaedias.