Artificial Intelligence is becoming increasingly important for adding meaning to big data.
At Openindex we apply machine learning as a means to streamline and improve the quality of (crawled) data. Advanced algorithms can extract relevant information from massive amounts of data, or recognize very specific items.
AI can be applied with different goals. One goal is to use intelligent algorithms to filter the data, resulting in less computational and storage overhead. Another is to classify elements of data, yielding extra information about the topic, structure and/or function of a document. Last but not least, AI can recognize textual patterns to perform semantic analysis: text is no longer treated as "just" a set of words, but as words with meaning, expressing abstract concepts and referring to objects and topics in the real world.
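To make the classification idea concrete, here is a minimal sketch of a keyword-based document classifier. The categories and keyword lists are invented for illustration; real systems use statistical models trained on labeled data rather than hand-picked word lists.

```python
# Hypothetical sketch: label a document's topic by counting category keywords.
# Categories and keyword sets are made up for illustration only.
KEYWORDS = {
    "sports": {"match", "goal", "team", "league"},
    "finance": {"stock", "market", "shares", "profit"},
}

def classify(text):
    """Return the category whose keywords occur most often in the text."""
    words = text.lower().split()
    scores = {cat: sum(w in kws for w in words) for cat, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

Even this toy version shows the payoff: each document gains a piece of extra information (its topic label) that was not explicitly present in the raw text.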
The sky is the limit
In practice intelligent software can be deployed to filter unwanted content, extract keywords, build a semantic index, detect spider traps, and much, much more. The sky is the limit!
See it in action
Available right now is our data extraction tool, which uses machine learning algorithms to classify web page types: overview or hub pages versus forum threads, articles, or product pages. It also uses machine learning to decide which text extraction algorithm should be used.
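One simple signal that can separate hub pages from content pages is link density. The sketch below is a hypothetical heuristic, not Openindex's actual classifier, and the threshold is an illustrative guess rather than a tuned parameter.

```python
def classify_page(num_links, text_chars):
    """Label a page as a link-heavy 'hub' or a text-heavy 'article'.

    Heuristic sketch: hub/overview pages tend to have many links per
    character of body text; articles tend toward the reverse. The 0.02
    threshold is illustrative only, not a tuned value.
    """
    if text_chars == 0:
        return "hub"  # no body text at all: treat as a pure link page
    link_density = num_links / text_chars
    return "hub" if link_density > 0.02 else "article"
```

In practice such hand-written rules are replaced by a trained model that weighs many features at once, but the underlying intuition is the same.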
Our spider trap detection tool is also online! It helps us remove rubbish from the search engine index and keeps the databases smaller than they would otherwise be.
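Spider traps often reveal themselves through pathological URLs, for example paths that grow without bound or repeat the same segments as a crawler follows a relative-link loop. A minimal sketch of that idea (the thresholds are illustrative assumptions, not the values our tool uses):

```python
from urllib.parse import urlparse

def looks_like_trap(url, max_depth=8, max_repeats=2):
    """Flag URLs showing classic spider-trap symptoms.

    Two simple symptoms are checked: excessive path depth, and the same
    path segment repeated many times (e.g. /a/b/a/b/a/b/ produced by a
    relative-link loop). Thresholds are illustrative, not tuned values.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    return any(segments.count(s) > max_repeats for s in segments)
```

Filtering such URLs before they are fetched saves crawl time as well as index space.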