

Searching for Sadness in New York: Is the Foursquare API Living Up to Its Potential?

As explained in this blog post, Foursquare needed a way for its business staff to run reports based on its data without slowing down production servers and without learning technologies such as Scala and MongoDB. The company decided to make its data available to business staff through a Hadoop cluster hosted by Amazon Web Services. Foursquare’s data miners could then query it using Hive, which provides a SQL-like query language for Hadoop.

As a proof of concept, the company has produced a report on the rudest cities in the world, based on the number of tips that contain profanity. That's pretty cool (apart from the assumption that profanity use = rudeness), but it makes me realize just how underutilized geolocation APIs are.
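To make the idea concrete, here is a minimal Python sketch of counting profane tips per city. It is not Foursquare's actual method (their report ran as Hive queries over Hadoop); the word list, the `(city, text)` tuple shape, and the sample tips are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical word list; a real report would need a much larger one.
PROFANITY = {"damn", "hell", "crap"}


def contains_profanity(text, words=PROFANITY):
    """Return True if any token in the tip matches the word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in words for token in tokens)


def profane_tip_counts(tips):
    """Count tips containing profanity, grouped by city.

    `tips` is an iterable of (city, text) pairs (an assumed shape).
    """
    counts = Counter()
    for city, text in tips:
        if contains_profanity(text):
            counts[city] += 1
    return counts


# Made-up sample data, not real Foursquare tips.
sample_tips = [
    ("City A", "Damn good chips"),
    ("City A", "Service was crap"),
    ("City A", "Lovely staff"),
    ("City B", "Try the chowder"),
    ("City B", "Hell of a view"),
]

rude_counts = profane_tip_counts(sample_tips)
ranking = rude_counts.most_common()  # cities ordered by profane-tip count
```

A raw count favors cities with more tips overall, so dividing by each city's total number of tips would be a reasonable variation. And as noted above, a positive tip like "Damn good chips" still counts as rude here.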


Here are the results of Foursquare’s profanity-mining:

Foursquare rudest cities

And here’s how Foursquare’s data analysis system works:

Foursquare diagram

Some more practical applications, from a business standpoint, for data mining staff might include determining:

Which venues are fakes or duplicates (so we can delete them), what areas of the country are drawn to which kinds of venues (so we can help them promote themselves), and what are the demographics of our users in Belgium (so we can surface useful information)?

Of course, this sort of check-in data is solely in the hands of Foursquare’s internal users. But it makes me wonder whether you could pull together information like this through the Foursquare API if you built your own data warehouse for analysis.

I wonder what services like Fourwhere (which we covered here) could learn by caching all the data retrieved from various location APIs and running sentiment analysis on it. What could MisoTrendy (coverage) tell us about a venue based on long-term trend patterns? Is there something in Foursquare’s terms of service that prevents people from doing this? I guess we’re back to that old question: what would you do with the massive data sets produced by persistent location tracking?
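As a rough illustration of the "cache it and analyze it" idea, here is a small Python sketch that stores tips in a local SQLite table and scores them with a toy word-list sentiment measure. It makes no API calls: the records, the table layout, and the word lists are all hypothetical, and whether caching real API responses like this is permitted is exactly the terms-of-service question raised above.

```python
import sqlite3

# Toy sentiment lexicons (assumptions; real analysis would use a proper model).
POSITIVE = {"great", "love", "happy", "friendly"}
NEGATIVE = {"sad", "awful", "lonely", "rude"}


def sentiment_score(text):
    """Positive-word count minus negative-word count for one tip."""
    tokens = [word.strip(".,!?").lower() for word in text.split()]
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def build_warehouse(records):
    """Load (venue, city, text) records into an in-memory SQLite 'warehouse'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tips (venue TEXT, city TEXT, text TEXT, score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tips VALUES (?, ?, ?, ?)",
        [(venue, city, text, sentiment_score(text)) for venue, city, text in records],
    )
    conn.commit()
    return conn


def saddest_venues(conn, city):
    """Return (venue, average score) rows for a city, most negative first."""
    return conn.execute(
        "SELECT venue, AVG(score) AS avg_score FROM tips "
        "WHERE city = ? GROUP BY venue ORDER BY avg_score ASC, venue ASC",
        (city,),
    ).fetchall()


# Made-up records standing in for cached API results.
records = [
    ("Venue A", "New York", "Sad and lonely place"),
    ("Venue A", "New York", "Awful coffee"),
    ("Venue B", "New York", "Great bagels, friendly staff"),
    ("Venue C", "Elsewhere", "Love it"),
]

conn = build_warehouse(records)
results = saddest_venues(conn, "New York")
```

Adding a timestamp column to the same table would be one way to look at the long-term trend patterns mentioned above.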

This feels like it could be a first step toward accomplishing what was described in the opening lines of the Headmap Manifesto:

there are notes in boxes that are empty

every room has an accessible history

every place has emotional attachments you can open and save

you can search for sadness in new york




