As an Apache Nutch committer, Openindex is an expert in web crawling. We can help you set up a Nutch web crawler service that fits your needs, or we can do the crawling for you and provide you with the data you need. In either case we can set up a single machine or a Hadoop cluster, depending on the scale of the crawl.
Feed your search engine
A web crawler is often used to deliver data to a search engine. We can help you set up a crawler that feeds your search engine and tackles the difficulties that come along: content extraction, crawler traps, duplicates, etc.
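To give an idea of what handling duplicates involves, here is a minimal sketch in Python. It is an illustration only, not how Nutch or Openindex implement deduplication: it assumes the text has already been extracted from each page, and it only catches exact duplicates after whitespace and case normalization. The function names and URLs are hypothetical.

```python
import hashlib
import re
from typing import Dict, List

_WHITESPACE = re.compile(r"\s+")


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = _WHITESPACE.sub(" ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(documents: Dict[str, str]) -> List[str]:
    """Return the URLs to keep: the first URL seen for each fingerprint."""
    seen = set()
    kept = []
    for url, text in documents.items():
        fingerprint = content_fingerprint(text)
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept.append(url)
    return kept


# Hypothetical crawl output: the same page reached through two URLs.
docs = {
    "http://example.com/a": "Hello   World",
    "http://example.com/a?session=1": "hello world\n",
    "http://example.com/b": "Other",
}
print(drop_duplicates(docs))
```

Pages that differ only slightly, such as a changing date or advertisement, would need a near-duplicate technique rather than an exact hash.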
Using Apache Solr/Lucene as the search engine, we can offer a custom, crawler-based open source search solution for your website, intranet or document management system.
We can provide you with a crawler that collects data from the internet. For example, we can provide you with a list of domains that use a certain CMS, or that contain certain words, content or a certain widget. Such data sets can be very useful for research or sales leads, for example.
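As an illustration of how a list of domains using a certain CMS could be derived from crawled pages, here is a minimal sketch in Python. It assumes the home page HTML of each domain is already available and relies on the `generator` meta tag that many CMSs emit. That tag is easily removed or faked, so a real detector would combine several signals. The function names and domains are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class GeneratorMetaParser(HTMLParser):
    """Remembers the content of the first <meta name="generator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generator: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "generator":
            self.generator = attr_map.get("content", "").strip() or None


def detect_generator(html: str) -> Optional[str]:
    """Return the generator string of a page, or None if it has none."""
    parser = GeneratorMetaParser()
    parser.feed(html)
    return parser.generator


def domains_using(pages: Dict[str, str], cms_name: str) -> List[str]:
    """Return the sorted domains whose generator tag mentions cms_name."""
    wanted = cms_name.lower()
    return sorted(
        domain
        for domain, html in pages.items()
        if wanted in (detect_generator(html) or "").lower()
    )


# Hypothetical crawl output: domain -> home page HTML.
pages = {
    "a.example": '<head><meta name="generator" content="WordPress 6.4"></head>',
    "b.example": '<head><meta name="Generator" content="Drupal 10" /></head>',
    "c.example": "<head><title>No generator tag</title></head>",
}
print(domains_using(pages, "wordpress"))
```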
We can provide you with a scraper that collects specific data from specific websites. This is a great solution if you would like to obtain, for example, all product descriptions from one or more webshops on a regular basis.
Data as a service
Openindex is happy to do the crawling or scraping for you. In this case, we provide you with the data you need, in the format you like, on a regular basis or as a one-time delivery.