

New Startup Analyzes 100,000 Web Pages With a Snap of Your Fingers

Machine processing of large quantities of unstructured text to discover media mentions, relationships between entities and sentiment need not be priced out of the range of the everyday web lover or small business.

Tonight two Texas companies announced a collaboration that brings exactly that to market, at a disruptively low price. Web crawling service 80Legs and Natural Language Processing service Language Computer Corporation have combined their efforts to create Extractiv, a web crawling and semantic analysis service. I’ve already put it to use to perform some awesome bulk text analysis for my own work.


[Screenshot: Extractiv analysis output]

Above: Extractiv correctly identified the people, places and dates in my article today about Jay Adelson’s new job. It misidentified only one geek as an athlete, which is not bad. Picture this analysis spread over hundreds of thousands or millions of documents and you are, as they say, cooking with gas.

Testing the Tool

To test Extractiv, I gave the company a collection of more than 500 web domains for the top geolocation blogs online and asked its technology to sort for all appearances of the word “ESRI,” the name of the leading vendor in the geolocation market.

The resulting output included structured cells describing some person, place or thing, some type of relationship it had with the word ESRI and the URL where the words appeared together. It was thus sortable and ready for my analysis.

The task was partially completed before being rate limited due to my submitting so many links from the same domain. More than 125,000 pages were analyzed, 762 documents were found that included my keyword ESRI and about 400 relations were discovered (including duplicates). What kinds of patterns of relations will I discover by sorting all this data in a spreadsheet or otherwise? I can’t wait to find out.
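As a rough illustration of the kind of sorting described above, here is a minimal Python sketch. It assumes each output row can be treated as an (entity, relation, URL) triple; the field names and sample values are invented for illustration and are not Extractiv’s actual output format. It drops exact duplicates and then ranks entity and relation pairs by how many distinct pages mention them.

```python
from collections import Counter

# Hypothetical rows of (entity, relation, url). These values are made up
# for illustration; they are not real Extractiv output.
rows = [
    ("Example Person", "works_for", "http://example.com/a"),
    ("Example City", "located_near", "http://example.com/b"),
    ("Example Person", "works_for", "http://example.com/a"),  # exact duplicate
    ("Example Person", "works_for", "http://example.com/c"),
]


def summarize(rows):
    """Drop exact duplicate rows, then count pages per (entity, relation)."""
    unique_rows = sorted(set(rows))
    counts = Counter(
        (entity, relation) for entity, relation, _url in unique_rows
    )
    # most_common() orders pairs from most to least frequent.
    return unique_rows, counts.most_common()


unique_rows, ranked = summarize(rows)
for (entity, relation), pages in ranked:
    print(f"{entity}\t{relation}\t{pages}")
```

The same grouping could be done in a spreadsheet with a pivot table; the script form is just easier to rerun as more pages are analyzed.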

That work took the machine about an hour and would have cost me less than $1, after a $99 monthly subscription fee. The next level of subscription, at a base rate of $250 per month, would have finished the job faster by running more simultaneous processes.

The machine isn’t perfect, but it looks very impressive for having just launched this evening. Would I use Extractiv for my bulk text analysis again in the future? Of course I would. In fact, I intend to start thinking immediately about what text I’d like analyzed next.

This sort of service represents an incredible vision of the future: commodity-level, DIY analysis of bulk data drawn from user-generated or other content, sortable for pattern detection and soon, Extractiv says, for sentiment analysis.

The People Behind the Technology

80Legs is led by CEO Shion Deysarkar, a former oil industry computer scientist turned social network data hacking entrepreneur whom we profiled this spring (Thoughts From the Man Who Would Sell The World, Nicely). Deysarkar and 80Legs CTO Toan Duong describe themselves online as employed by Creeris Ventures, a Houston venture capital firm with a diverse portfolio including grid computing, jet airplanes and litigation.

The Extractiv collaborators from Language Computer Corporation include John Lehmann, CEO at LCC since September and Project Manager at Extractiv, as well as computer scientists Dr. Finley Lacatusu and John Williams.

