RedMonk analyst Stephen O’Grady tackles the question “What Factors Justify the Use of Apache Hadoop?” O’Grady cites two of the most common criticisms of Hadoop: 1) most users don’t actually need to analyze big data, and 2) MapReduce is more complex than SQL. O’Grady concedes both criticisms but finds Hadoop useful anyway.
O’Grady acknowledges that volume isn’t the only factor in the complexity of a dataset. “Larger dataset sizes present unique computational challenges,” writes O’Grady. “But the structure, workload, accessibility and even location of the data may prove equally challenging.”
RedMonk uses Hadoop to analyze both structured and unstructured datasets. There are a number of other tools the firm could use to analyze the data, so why Hadoop? O’Grady responds that the datasets companies use aren’t big data yet, but they are growing rapidly.
O’Grady says that RedMonk uses BigSheets and Hive to work with Hadoop, which lets the firm avoid writing queries in Java.
Cloudera recently published an announcement about how the company Tynt is using Cloudera’s Hadoop distribution. Tynt is a web analytics company that processes over 20 billion viewer events per month – over 20,000 events per second. Prior to adopting Hadoop, Tynt was adding multiple MySQL databases per week to deal with the data.
Another example of a company that’s using Hadoop is Twitter, whose use of Hadoop we have covered previously. Twitter needs to use clusters for its data: the amount of data it stores every day is too great to be reliably written to a traditional hard drive. Twitter has also found that SQL isn’t efficient enough to do analytics at the scale the company needs.
Like RedMonk, Twitter avoids writing Java queries. However, it uses Pig instead of Hive.
Twitter is working with 12 terabytes of new data per day, significantly more than RedMonk uses. Nonetheless, both companies are making good use of the technology.
How have you used Hadoop? Have you ever found that it was too big for a project that you tackled? If so, what did you end up using instead?