netEstate has years of experience programming crawlers and search engines.
List of search engines operated by netEstate
Our Imprint Crawler extracts data from the imprint of websites.
Our Job Crawler finds job ads on a website.
Our discontinued Web service textclassify.com let you train your own text classifiers and share them with others.
The crawler of our site search extracts metadata such as HTTP status code, file type, language, modification time, title, meta tags and the whole text content. Besides using this data for on-site search, you can use it to generate site maps or export it as XML.
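As an illustration of how crawled metadata can be turned into a site map, here is a minimal Python sketch. It is not netEstate's implementation: the record layout (`url`, `status`, `modified`) and the function name `build_sitemap` are assumptions made for this example. The output follows the public sitemaps.org XML format.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(records):
    """Build a sitemap XML string from crawl records.

    Each record is a dict with the hypothetical keys
    'url', 'status' (HTTP status code) and 'modified' (ISO date or None).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for rec in records:
        # Only list pages that were fetched successfully.
        if rec["status"] != 200:
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = rec["url"]
        # The modification time is optional in the sitemap format.
        if rec.get("modified"):
            ET.SubElement(url, "lastmod").text = rec["modified"]
    return ET.tostring(urlset, encoding="unicode")


# Example input: two crawled pages, one of which no longer exists.
records = [
    {"url": "https://example.com/", "status": 200, "modified": "2024-01-15"},
    {"url": "https://example.com/old", "status": 404, "modified": None},
]
sitemap_xml = build_sitemap(records)
print(sitemap_xml)
```

Filtering on the HTTP status code keeps broken pages out of the site map. Other extracted fields, such as file type or language, could be used as additional filters in the same way.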
For our customers, we have written Web crawlers that extract structured data of various kinds. The data can be extracted from specific websites or the Web in general.
In the latter case, the data does not have a fixed format or position on the website or page. We try to separate information from noise using syntactic and semantic attributes. Cookies and forms are no obstacle for us.
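To give an idea of what a syntactic attribute can look like, the following Python sketch scores text blocks by word count and link density, a common heuristic for telling navigation and footer text from body text. It is an assumption-laden illustration, not the method netEstate uses: the class `BlockCollector`, the function `extract_content`, the set of block tags and both thresholds are invented for this example, and semantic attributes are not covered at all.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (an assumption for this sketch).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3"}


class BlockCollector(HTMLParser):
    """Collects (text, link_chars) pairs, one per text block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._text = []
        self._link_chars = 0
        self._in_link = 0

    def flush(self):
        # Normalise whitespace and store the block if it has any text.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        self._text.append(data)
        if self._in_link:
            # Count characters that sit inside a link.
            self._link_chars += len(data.strip())


def extract_content(html, min_words=8, max_link_density=0.3):
    """Return blocks that are long enough and not dominated by links."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    content = []
    for text, link_chars in parser.blocks:
        link_density = link_chars / len(text)
        if len(text.split()) >= min_words and link_density <= max_link_density:
            content.append(text)
    return content


page = """
<div><a href="/">Home</a> <a href="/jobs">Jobs</a> <a href="/contact">Contact</a></div>
<p>We are looking for a software developer with experience in web crawling and search engines.</p>
<div>Copyright 2024</div>
"""
content_blocks = extract_content(page)
print(content_blocks)
```

In this example the navigation bar is rejected because almost all of its text is link text, and the copyright line is rejected because it is too short. A real extractor would need more attributes than these two.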