Sitesearch developers FAQ

If you have a question about OI Site Search, please take a look at our frequently asked questions below. If you can't find your question, please contact us!

General questions

  • Sure, this is no problem. We can deal with millions of URLs, but it will take a while for us to download them. We can make special arrangements to increase the download speed to 50 pages per second or even more, but this depends on your server capacity. Please contact us if your site is very large.
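
    To get a feeling for what "a while" means, the sketch below estimates the download time at a constant rate. The 50 pages per second comes from the answer above; the URL counts and the function name are made up for this example, and real crawls rarely run at a perfectly constant speed.

```python
from datetime import timedelta


def crawl_duration(url_count: int, pages_per_second: float) -> timedelta:
    """Time needed to download url_count pages at a constant rate."""
    return timedelta(seconds=url_count / pages_per_second)


print(crawl_duration(1_000_000, 50))   # 5:33:20
print(crawl_duration(10_000_000, 50))  # 2 days, 7:33:20
```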

  • We minimize downtime by operating two large clusters of machines that serve the search index. About half the servers can be shut down without disrupting the service. In case of trouble in one data center, we can simply shut it down without downtime.

  • Yes, if you allow our crawler access to your site. You can secure the search page by configuring leases that are signed with an HMAC-SHA1 shared secret. By providing an expiration time, you make sure the user can no longer access the search after the lease has expired. See the lease and HMAC configuration options for more information.
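
    As a rough illustration of the idea, the sketch below signs an expiration time with HMAC-SHA1 and verifies it later. The message layout (the expiry as a decimal Unix timestamp), the hex encoding, the secret and the function names are assumptions made for this example; the actual lease format is defined by the lease and HMAC configuration options.

```python
import hashlib
import hmac


def sign_lease(secret: bytes, expires_at: int) -> str:
    """Return a hex HMAC-SHA1 signature over the expiration time (Unix seconds)."""
    message = str(expires_at).encode("ascii")
    return hmac.new(secret, message, hashlib.sha1).hexdigest()


def lease_is_valid(secret: bytes, expires_at: int, signature: str, now: int) -> bool:
    """A lease is valid if the signature matches and it has not expired yet."""
    expected = sign_lease(secret, expires_at)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature) and now < expires_at


secret = b"hypothetical-shared-secret"  # placeholder, not a real key
signature = sign_lease(secret, expires_at=1_700_000_600)
print(lease_is_valid(secret, 1_700_000_600, signature, now=1_700_000_000))  # True
print(lease_is_valid(secret, 1_700_000_600, signature, now=1_700_000_601))  # False
```

    Because the expiration time is part of the signed message, a user cannot extend a lease by editing the timestamp without also knowing the shared secret.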

Crawler questions

  • Our crawlers can detect content in over 50 different languages with high precision. We can detect content in the following languages: Afrikaans, العربية, Български, বাংলা, Česky, Dansk, Deutsch, Ελληνικά, English, Español, Eesti, فارسی, Suomi, Français, Frysk, ગુજરાતી, עברית, हिन्दी, Hrvatski, Magyar, Bahasa Indonesia, Íslenska, Italiano, 日本語, ಕನ್ನಡ, 한국어, Lietuvių, Latviešu, Македонски, മലയാളം, मराठी, नेपाली, Nederlands, Norsk (bokmål), ਪੰਜਾਬੀ, Polski, Português, Română, Русский, Slovenčina, Slovenščina, Soomaaliga, Shqip, Svenska, Kiswahili, தமிழ், తెలుగు, ไทย, Tagalog, Türkçe, Українська, اردو, Tiếng Việt, 中文 and 國語. If your language is not listed and you need to have pages detected for that language, please contact us.

  • Yes, we can now partially extract relevant microdata items such as an image, the breadcrumb, a product's price, user downloads, comments and more. With microdata support in place, we can show your users interesting information in the search results, such as the price of a web shop product, a review rating or events. We recommend that websites implement the Schema.org vocabulary.

  • We crawl important pages of your website every 30 to 120 minutes. Please contact us if you want us to crawl a specific URL even more frequently.

  • No, not yet.

  • Yes, we fully support canonical URLs. You can either use the link element in the HTML head:
    <link rel="canonical" href="http://www.oi.com/"/>
    or specify the canonical URL in the HTTP header in case of media files such as PDF files:
    Link: <http://www.example.com/>; rel="canonical"

  • Four to six days after the first time we discovered it. If a page changes between visits, we'll reduce the interval and crawl it more frequently; if it hasn't changed, we'll crawl it less frequently.
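
    The general idea of adapting the interval can be sketched as follows. The exact scheduling rules are not documented here: the halving and 1.5x growth factors, both bounds and the function name are invented purely for illustration.

```python
from datetime import timedelta

MIN_INTERVAL = timedelta(minutes=30)  # assumed lower bound for this sketch
MAX_INTERVAL = timedelta(days=6)      # assumed upper bound for this sketch


def next_interval(current: timedelta, page_changed: bool) -> timedelta:
    """Halve the interval after a change, otherwise grow it by half."""
    proposed = current / 2 if page_changed else current * 1.5
    # Keep the result within the assumed bounds.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))


print(next_interval(timedelta(days=4), page_changed=True))   # 2 days, 0:00:00
print(next_interval(timedelta(days=4), page_changed=False))  # 6 days, 0:00:00
```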

  • Yes. We make an effort to extract dates in most common formats and languages from your pages. This date is used to calculate the age of a document relative to other documents, and can be displayed in the result snippet. Some very exotic formats are not supported. We also read dates from one of the properties of the creative work microdata schema. The OpenGraph article:published_time and article:modified_time properties are supported as well. Make sure you use the correct date format as described. Including correct timezone information is encouraged.
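
    A minimal sketch of producing such a tag with timezone information, assuming ISO 8601 timestamps (the format the Open Graph protocol uses for its datetime values). The helper name and the example date are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone


def published_time_meta(published: datetime) -> str:
    """Build an OpenGraph article:published_time tag with an explicit UTC offset."""
    if published.tzinfo is None:
        raise ValueError("include timezone information in the published date")
    return '<meta property="article:published_time" content="%s"/>' % published.isoformat()


utc_plus_one = timezone(timedelta(hours=1))  # fixed +01:00 offset for the example
print(published_time_meta(datetime(2014, 1, 15, 9, 30, tzinfo=utc_plus_one)))
# <meta property="article:published_time" content="2014-01-15T09:30:00+01:00"/>
```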

  • You can use robots.txt to deny our crawler access to some pages or directories. We also obey robots meta tags in the HTML, with which you can tell us not to index (NOINDEX) or not to follow (NOFOLLOW) a specific page. The NOFOLLOW and NOINDEX values can be combined. We recommend putting a NOINDEX robots meta tag on overview pages, directory listings, news archives and similar types of pages.
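
    If you want to check what a set of robots.txt rules allows before publishing them, Python's standard library ships a parser. In this sketch "ExampleBot" and the paths are placeholders; replace the user agent with the name of the crawler you want to restrict.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules: one directory and one page are denied to ExampleBot.
robots_txt = """\
User-agent: ExampleBot
Disallow: /archive/
Disallow: /private.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("ExampleBot", "http://www.example.com/archive/2013/"))  # False
print(parser.can_fetch("ExampleBot", "http://www.example.com/products.html"))  # True
```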

  • This works the same as preventing a page from being indexed. If you disallow the page or directory in your robots.txt or via the robots meta tag, we will remove the page from the index the next time we visit it. If you want a page to be removed as soon as possible, you can contact us so we can schedule removal of the page. We're working on an API with which you can submit a list of URLs that are scheduled to be crawled as soon as possible.

  • We support web pages (HTML and XHTML) and PDF documents. Support for file types such as spreadsheets and other office documents can be added on request. Support for multimedia such as audio, video and images is not planned.

  • Yes, we have partial support for image extraction. We can extract a page's image from the og:image tag or from a microdata image item. We're also working on improving fully automatic extraction of a relevant image from the main content. This feature is enabled by default. See the showImage API call for more information.

  • If, for some reason, a title tag is missing, we try to use an h1 or h2 element instead. If that still doesn't work, we can use a file name or even the first few words from the content.
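
    The fallback order can be sketched as a simple chain. This is an illustration only: the function name, the preference of h1 over h2, the word limit and the way the file name is taken from the URL are assumptions, not a description of the actual implementation.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse


def pick_title(title: Optional[str], h1: Optional[str], h2: Optional[str],
               url: str, content: str, max_words: int = 8) -> str:
    """Return the first usable candidate: title, h1, h2, file name, first words."""
    for candidate in (title, h1, h2):
        if candidate and candidate.strip():
            return candidate.strip()
    # No usable heading: fall back to the last path segment of the URL.
    file_name = posixpath.basename(urlparse(url).path)
    if file_name:
        return file_name
    # Last resort: the first few words of the page content.
    return " ".join(content.split()[:max_words])


print(pick_title(None, "Annual report", None, "http://www.example.com/report.pdf", ""))
# Annual report
print(pick_title(None, None, None, "http://www.example.com/docs/report.pdf", ""))
# report.pdf
```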

  • We calculate the likelihood of a given page being child friendly and support detection in both English and Dutch. Detection of pornographic material is quite accurate. Please contact us if you need additional languages to be supported.

  • Yes, we provide a common mechanism with which webmasters can enable us to crawl their AJAX URLs. There is a getting started guide on Google to help you make your AJAX web application crawlable.

Search engine questions

  • Our search index can properly analyze and deal with a variety of languages. The following languages are supported: Deutsch, English, Español, 日本語, Français, Nederlands, Português and 中文. If your language is not listed and you need to have pages detected for that language, please contact us.

  • We support compounds for most Dutch words. Support for other languages such as German can be added.

  • Yes. Users can use the + (plus) and - (minus) signs to indicate that the following term must be present in or absent from the documents. The following query: the +quick -brown fox will return documents with quick, without brown and maybe with fox.
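
    To make the semantics of the + and - signs concrete, here is a small sketch that splits a query into mandatory, excluded and optional terms and checks a document against it. It only illustrates the matching rule; the function names are made up, and real ranking (where optional terms influence the score) is not modelled.

```python
from typing import Dict, List, Set


def parse_query(query: str) -> Dict[str, List[str]]:
    """Split a query into mandatory (+), excluded (-) and optional terms."""
    parsed: Dict[str, List[str]] = {"mandatory": [], "excluded": [], "optional": []}
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            parsed["mandatory"].append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            parsed["excluded"].append(token[1:])
        else:
            parsed["optional"].append(token)
    return parsed


def matches(document_terms: Set[str], parsed: Dict[str, List[str]]) -> bool:
    """A document matches if it has every mandatory term and no excluded term."""
    has_all = all(term in document_terms for term in parsed["mandatory"])
    has_none = not any(term in document_terms for term in parsed["excluded"])
    return has_all and has_none


query = parse_query("+quick -brown fox")
print(query)  # {'mandatory': ['quick'], 'excluded': ['brown'], 'optional': ['fox']}
print(matches({"quick", "red", "fox"}, query))    # True
print(matches({"quick", "brown", "fox"}, query))  # False
```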

  • No, not at this time, but we are working on a proper location-aware system. For now, we can only accurately detect which country your user is currently in.

  • We don't treat the spelling differences in any special way. Differences between the languages can be corrected as spelling errors, depending on the spelling used on your website and the search terms used by your users. We have not yet planned American/British spelling normalisation.

  • Yes! We provide a search suggestion autocomplete widget. This widget uses previous search requests for suggestions so it takes some time for it to populate with enough useful suggestions.

  • Yes! We provide a related query widget. This widget shows related queries for the current search query.

  • No, not at this time.

  • Yes. You can set a list of languages to be preferred in the search results or let your user's current geographical location or browser preferences decide. See the openindex.lang.pref configuration option.

  • No. The search results are sorted by relevance only.

  • Yes. See the openindex.result.icons configuration option. It allows a flexible configuration. If you want results in some language or from some host to have a different icon you can override the file type directives by adding new directives for your field on top of them. The example below will display a shopping cart icon for results from shop.example.org:

    <"host:shop.example.org" : "http://www.example.org/img/icons/cart.png">

  • Yes. You can instruct the search engine to prefer a given file type. See the openindex.type.pref configuration option.

  • Yes. You can instruct the search engine to prefer a given category. See the openindex.cat.pref configuration option.

  • Yes. The metadata tag with the name cat is added to the index and can be used for faceting and filtering. You can, for example, use it to define specific sections or categories of your website.

    <meta name="cat" value="section of your website"/>

  • No, not at this time. Please contact us if you want to provide your users with the ability to execute fielded queries and do not want to rely on faceting to do the job.

  • Yes. You can preset a filter for all available faceting widgets. See for example the openindex.type.restrict configuration option. Users will still be able to select another value for the restricted field, so it's usually best to hide the restricted widget.

  • Search results are deduplicated by default.

  • Yes, you can use our search service on TLS/SSL secured pages. All search queries, shown images and clicked outlinks will use secure communication. No configuration option is required; the engine detects when it is loaded on a secure page. Do not forget to load our JavaScript engine via TLS as well, otherwise modern browsers may block mixed content. See also the openindex.tls configuration option.

  • We will only ever return a fixed number of results for a query, even if there are many more results. Results further than 12 pages from the start are not very relevant.

  • Yes, sure. See the openindex.pager.rows configuration option. We still return a maximum of 12 pages, regardless of the number of results per page.
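
    The effect of the 12 page limit on the number of reachable results is simple arithmetic, sketched below. The function names and the example values for openindex.pager.rows are made up for this illustration.

```python
MAX_PAGES = 12  # page limit stated above


def max_reachable_results(rows_per_page: int, total_hits: int) -> int:
    """Number of results a user can page through for one query."""
    return min(total_hits, MAX_PAGES * rows_per_page)


def page_count(rows_per_page: int, total_hits: int) -> int:
    """Number of pages needed for the hits, capped at MAX_PAGES."""
    pages = -(-total_hits // rows_per_page)  # ceiling division
    return min(pages, MAX_PAGES)


print(max_reachable_results(10, 5000), page_count(10, 5000))  # 120 12
print(max_reachable_results(25, 40), page_count(25, 40))      # 40 2
```

    Raising the number of rows per page therefore raises the total number of reachable results, while the number of pages stays capped.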

  • There are four distinct states the engine can be in, not counting the initial state before any request is executed. The following four states are possible:

    - normal result set and no spell checker suggestion
    - result set but with spell checker suggestion
    - no results but an available spell checker suggestion is automatically executed
    - no results and no spell checker suggestion
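
    The four states follow from two facts about a response: whether there are results and whether the spell checker offers a suggestion. The sketch below models that decision; the enum and function names are hypothetical and not part of the engine's API.

```python
from enum import Enum
from typing import Optional


class SearchState(Enum):
    RESULTS = "normal result set and no spell checker suggestion"
    RESULTS_WITH_SUGGESTION = "result set but with spell checker suggestion"
    SUGGESTION_EXECUTED = "no results, suggestion is automatically executed"
    NO_RESULTS = "no results and no spell checker suggestion"


def classify(result_count: int, suggestion: Optional[str]) -> SearchState:
    """Map a response to one of the four states listed above."""
    if result_count > 0:
        return SearchState.RESULTS_WITH_SUGGESTION if suggestion else SearchState.RESULTS
    return SearchState.SUGGESTION_EXECUTED if suggestion else SearchState.NO_RESULTS


print(classify(0, "quick").name)  # SUGGESTION_EXECUTED
print(classify(25, None).name)    # RESULTS
```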

  • Yes. You can instruct the search engine which safe search policy to use. See the openindex.safe configuration option.

  • Yes. Using external functions, you can hook in your own custom JavaScript code and place your ad somewhere in the search results page. Each result is located in a list item element with a numbered ID, so you can look up the position you want and inject your ad in the list.

  • Yes. You can instruct the search engine to call a custom JavaScript function before and/or after a search request. Please see the openindex.preCallback and openindex.postCallback configuration options.

  • Yes. Callbacks are available for the keyDown and the keyUp events on the main input field. Please see the openindex.text.keydownCallback and openindex.text.keyupCallback configuration options.

  • Yes, you can.

Contact us
Please feel free to contact us now! Call 0(031) 50 85 36 620, send an e-mail or go to our contact page.