Ends the Scourge of Data Scraping

While attending Structure in San Francisco recently, I was briefed by a company about their solution. During the event I probably did a dozen briefings, but this one really stood out to me. What they're doing is in fact pretty simple but, like all good solutions, the simplicity at the front end hides a high degree of complexity at the back end.

So what is it? Essentially, it is a data tool that allows users to treat the entire web as one big database. We all implicitly know that the web is the biggest source of data on the planet, but we have always struggled to find manageable ways to harness that data. Customizations, specificities and nuances (all the things that make the web the highly flexible place that it is) also make it exceptionally hard to find a common structure across multiple data sources on the web.

This is where the service comes in: it allows users to harness the data by removing the complexity that differences in data structure create. Data can be extracted from multiple websites, normalized and combined to create a single dynamic database, which can then be manipulated to create visualizations and other outputs.

Traditionally this sort of process required specific screen-scraping technologies to be built. The problem with this approach is that it requires developer time, is specific to each site, and breaks every time a website's structure changes. With this service, data acquisition from the web becomes a business analyst's role rather than a developer's. The company has several distinct offerings:

  • Data Browser (io) – a browser with powerful additional features that allow users to transform the sites they visit into an actionable dataset
  • Cloud Data Platform – the cloud backend powers the federation and extraction, making it accessible via the io browser or the io API
  • Universal Access & Federation – Tools that ensure data published on the web is accessible – dynamic pages, forms & APIs are all easily consumed

In my briefing I was shown how the service can aggregate data from a number of different real estate websites, normalize that data and populate it into a common database, but the potential use cases are many. According to the company, it has already gained traction with companies such as international recruiting company Robert Half, Bloomberg and Hewlett Packard, all of which use it to aggregate data from the web and present it to users in an actionable format.
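To make the normalization step concrete, here is a minimal Python sketch of the general idea: records from two differently structured real estate sources are mapped into one common columnar schema. This is not the company's implementation or API. The site names, field names (`addr`, `price_k`, and so on) and sample values are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """Common schema that every source is normalized into (hypothetical)."""
    source: str
    address: str
    price_usd: int
    bedrooms: int


def from_site_a(row: dict) -> Listing:
    # Hypothetical site A: price is a string such as "$450,000",
    # bedroom count is a string.
    price = int(row["price"].replace("$", "").replace(",", ""))
    return Listing("site_a", row["addr"].strip(), price, int(row["beds"]))


def from_site_b(row: dict) -> Listing:
    # Hypothetical site B: price is a number in thousands of dollars,
    # bedroom count is nested under "rooms".
    price = int(round(row["price_k"] * 1000))
    return Listing("site_b", row["location"].strip(), price, int(row["rooms"]["bed"]))


def combine(site_a_rows: list, site_b_rows: list) -> list:
    """Normalize both sources and return one table sorted by price."""
    listings = [from_site_a(r) for r in site_a_rows]
    listings += [from_site_b(r) for r in site_b_rows]
    return sorted(listings, key=lambda listing: listing.price_usd)


if __name__ == "__main__":
    # Made-up sample rows, one per hypothetical source.
    rows_a = [{"addr": " 12 Oak St ", "price": "$450,000", "beds": "3"}]
    rows_b = [{"location": "9 Elm Ave", "price_k": 389.5, "rooms": {"bed": 2}}]
    for listing in combine(rows_a, rows_b):
        print(listing)
```

A hand-written mapper like `from_site_a` is exactly the per-site code that breaks when a site changes its layout, which is the maintenance burden the service claims to remove.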

There is a certain poetry to a solution that, unlike the hand waving and rhetoric around the death of structured data, takes a mass of data from the web and delivers it up in a traditional columnar format. I really like what the company is doing and am certainly looking forward to seeing how they develop the commercial applications for their product.

  • Alex Salkever |

    Ben – I tried using it several times and found it to be slow, buggy, and was never able to get results out. I’m curious as to what they recommend for people using it on laptops. I’d love to see what I can do to make it work better but it was unusable for me. Others have confirmed my assessment.

  • Update on my previous comment: I have continued to use it and it has improved immensely. It is much faster and easier to use and has added a bunch of features which I really love (table extraction, for example). The support team has also been awesome. In fact, I use it all the time and really enjoy it.
