Web data extraction

The web offers a wide variety of HTML pages, many of them completely unstructured and chaotic. Our HTML data extraction software tackles this problem in a generic way.

This tool displays the information we extract from a given web page. It uses the same HTML parsing, extraction and classification technology that powers our Sitesearch service and is also used by some of our customers. Try it, and if you think we got something wrong, contact support so we can fix it.

Examples of the displayed fields are the detected language, the extracted article date, the extracted main text and an optional image.
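For illustration only, here is a minimal Python sketch of pulling similar fields out of a page with the standard-library `html.parser`. It is not our extraction technology and does no real detection: it assumes the page declares its language in `<html lang>`, its date in an `article:published_time` meta tag and its image in an `og:image` meta tag, and it approximates the main text by keeping only paragraphs of at least `min_words` words. The names `FieldExtractor`, `extract_fields` and `min_words` are hypothetical.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Hypothetical sketch: reads declared hints, does not detect anything."""

    def __init__(self) -> None:
        super().__init__()
        self.language = None   # from <html lang="...">
        self.date = None       # from <meta property="article:published_time">
        self.image = None      # from <meta property="og:image">
        self.paragraphs = []   # whitespace-normalized text of each <p>
        self._in_p = False
        self._buf = []

    def _flush(self) -> None:
        # Store the paragraph collected so far, if any, and reset the buffer.
        if self._in_p:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "html" and a.get("lang"):
            self.language = a["lang"]
        elif tag == "meta":
            prop = a.get("property")
            if prop == "article:published_time":
                self.date = a.get("content")
            elif prop == "og:image":
                self.image = a.get("content")
        elif tag == "p":
            self._flush()  # a new <p> implicitly ends an unclosed one
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

    def close(self) -> None:
        super().close()
        self._flush()  # keep a trailing paragraph that was never closed


def extract_fields(html: str, min_words: int = 8) -> dict:
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    # Crude main-text heuristic: keep only paragraphs with enough words,
    # which tends to drop short navigation links and footer snippets.
    main = [p for p in parser.paragraphs if len(p.split()) >= min_words]
    return {
        "language": parser.language,
        "date": parser.date,
        "main_text": "\n".join(main),
        "image": parser.image,  # optional: None when the page declares none
    }
```

Pages that declare nothing leave these fields empty, which is exactly where a real extractor has to analyze the text and layout itself instead of trusting metadata.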

It can also extract currencies and prices from webshop product pages, as well as international phone numbers, e-mail addresses and acronyms embedded in the text. It can even determine whether a given HTML page is an article, a wiki page, a homepage, a forum thread, a webshop product page or one of several other page types.
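As a rough illustration of the entity extraction described above, here is a regex-based Python sketch. The patterns, the set of supported currencies and the function name `extract_entities` are assumptions made for this example; they are far cruder than a production extractor and are not a description of how our software does it. Page-type classification is not covered here.

```python
import re

# Hypothetical, deliberately simple patterns; real-world text needs far more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# International numbers written with a leading "+" and a country code,
# followed by 2 to 5 digit groups optionally separated by a space, dot or hyphen.
PHONE = re.compile(r"\+\d{1,3}(?:[ .-]?\d{1,4}){2,5}")
# A currency symbol or ISO 4217 code followed by an amount such as 19,95 or 5.00.
PRICE = re.compile(r"(€|\$|£|EUR|USD|GBP)\s?(\d+(?:[.,]\d{2})?)")
# Runs of two or more capital letters, a crude stand-in for acronym detection.
ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


def extract_entities(text: str) -> dict:
    return {
        "emails": EMAIL.findall(text),
        "phones": PHONE.findall(text),      # no capturing groups: full matches
        "prices": PRICE.findall(text),      # list of (currency, amount) tuples
        "acronyms": ACRONYM.findall(text),
    }


if __name__ == "__main__":
    sample = "Mail info@example.com or call +44 20 7946 0958. Only € 19,95 per HTML page, incl. VAT."
    print(extract_entities(sample))
```

Patterns like these break down quickly on real pages (national phone formats, prices split across HTML elements, capitalized headings mistaken for acronyms), which is why a generic extractor needs context beyond a single regular expression.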

Contact us
Please feel free to contact us now! Call 0(031) 50 85 36 620, send an e-mail or go to our contact page.