Extraction of data from website imprints
The netEstate Imprint Crawler is able to find the imprint or contact page of a website and extract addresses, contact data and company names.
The Crawler can be used as a Web service or via manual batch processing by us, and is especially suitable for address verification. It respects robots.txt, the robots meta element and common German wording that prohibits the processing of contact data.
Emphasis currently is on Germany, Austria and Switzerland. The quality for other countries varies but can be optimized on demand.
Please note that, in order to prevent abuse, we never supply e-mail addresses, only hashes of e-mail addresses that can be used for comparison with existing data.
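As a minimal sketch of such a comparison: the field list on this page describes the hash as the SHA-1 of 'mailto:' followed by the mail address. The function names below are hypothetical. The hex encoding, the UTF-8 encoding and the absence of any case normalisation are assumptions that should be checked against real crawler output before relying on them.

```python
import hashlib


def mailto_sha1(address: str) -> str:
    """Return the SHA-1 of 'mailto:' + address as a hex string.

    Assumptions (not confirmed by the service description): the digest
    is hex encoded, the input is UTF-8, and the address is hashed as
    written, without lowercasing.
    """
    return hashlib.sha1(("mailto:" + address).encode("utf-8")).hexdigest()


def matches(known_address: str, returned_hash: str) -> bool:
    """Check whether an address you already hold matches a returned hash."""
    return mailto_sha1(known_address.strip()) == returned_hash.strip().lower()


if __name__ == "__main__":
    # Simulate a hash as the crawler might return it, then compare.
    returned = mailto_sha1("info@example.com")
    print(matches("info@example.com", returned))   # address is known
    print(matches("sales@example.com", returned))  # different address
```

Because only hashes are supplied, an address can be confirmed only if it is already present in your own data.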
Not available for private individuals.
0.015 EUR + VAT per API call or per crawled website. The price for additional API calls/websites halves at 20,000, 100,000 and 500,000 API calls/websites.
You buy a quota of calls for the Web API (one HTTP call per website, minimum purchase 6,000 calls) or send us a file with websites to crawl. In the latter case we send the results back as a CSV file (service fee 90 EUR + VAT, no minimum purchase).
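The price scale can be illustrated with a short calculation. This is only a sketch under one reading of the price list: the halved price is assumed to apply only to the calls beyond each threshold (marginal pricing), not to the whole quota. The function name is hypothetical, VAT and the batch service fee are left out, and a binding quote should come from netEstate.

```python
from decimal import Decimal

BASE_PRICE = Decimal("0.015")            # EUR per call/website, net of VAT
THRESHOLDS = (20_000, 100_000, 500_000)  # the price halves at each of these


def net_cost(calls: int) -> Decimal:
    """Net cost in EUR for `calls` API calls or crawled websites.

    Assumption: each halved price applies only to the calls above the
    corresponding threshold.
    """
    total = Decimal(0)
    price = BASE_PRICE
    lower = 0
    for upper in THRESHOLDS:
        if calls <= lower:
            break
        # Calls that fall into the current price tier.
        total += (min(calls, upper) - lower) * price
        price /= 2
        lower = upper
    if calls > lower:
        # Calls above the last threshold.
        total += (calls - lower) * price
    return total


if __name__ == "__main__":
    for n in (6_000, 20_000, 150_000):
        print(n, net_cost(n), "EUR + VAT")
```

Under this reading, the minimum Web API purchase of 6,000 calls comes to 90 EUR + VAT, and 150,000 calls come to 1,087.50 EUR + VAT.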
- February 2015: Enhanced address recognition
- March 2015: Recognition of natural persons as operators, dramatically enhanced recognition of contact persons
- September 2015: Enhanced recognition of VAT IDs
The crawler will extract zip code and town from ca. 77% of German company websites. Social links from the homepage are found on ca. 37% of company websites. Once zip code and town have been found, the probabilities for other data are:
- ca. 94% address
- ca. 87% phone
- ca. 84% VAT ID
- ca. 82% name (company or person)
- ca. 77% SHA-1 hash of ‘mailto:’ + mail address
- ca. 72% fax
- ca. 68% contact person
- ca. 36% trade register number and town of register court
- ca. 8% bank code
- ca. 8% BIC
- ca. 7% IBAN
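Since the percentages in this list are conditional on zip code and town having been found, the expected share across all German company websites is lower. The sketch below multiplies each figure by the ca. 77% base rate. It assumes that the other fields are only returned together with zip code and town, which the figures above do not state explicitly, so the results are rough estimates. The dictionary keys are illustrative labels only.

```python
P_ZIP_AND_TOWN = 0.77  # share of German company websites with zip code and town

# Conditional rates from the list above (given zip code and town were found).
CONDITIONAL = {
    "address": 0.94,
    "phone": 0.87,
    "vat_id": 0.84,
    "name": 0.82,
    "mail_hash": 0.77,
    "fax": 0.72,
    "contact_person": 0.68,
    "trade_register": 0.36,
    "bank_code": 0.08,
    "bic": 0.08,
    "iban": 0.07,
}


def overall_rate(field: str) -> float:
    """Estimated share of all German company websites yielding `field`."""
    return P_ZIP_AND_TOWN * CONDITIONAL[field]


if __name__ == "__main__":
    for field in CONDITIONAL:
        print(f"{field}: ca. {overall_rate(field):.0%}")
```

For example, an address would be expected for roughly 0.77 × 0.94 ≈ 72% of all German company websites.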
We have crawled the German Web with the Imprint Crawler and can offer you selections from the resulting database (see website database).
You can test the Imprint Crawler with this form:
The crawler may return several results and several contact persons within a result. All results are sorted by descending relevance. You can simply use the first result returned.
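A minimal sketch of this selection on the client side is shown below. It assumes the response has already been parsed into a list of dictionaries in the order returned. The response format and the field names `company` and `contact_persons` are purely hypothetical and are not taken from the API documentation.

```python
from typing import Any, Optional


def best_result(results: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the most relevant result, or None if nothing was found.

    Relies on the results being sorted by descending relevance, so the
    first entry is the best candidate.
    """
    return results[0] if results else None


def contact_persons(result: dict[str, Any]) -> list[str]:
    """Return all contact persons of one result (hypothetical field name)."""
    return list(result.get("contact_persons") or [])


if __name__ == "__main__":
    # Made-up data in the assumed shape, most relevant result first.
    parsed = [
        {"company": "Example GmbH", "contact_persons": ["A. Example", "B. Sample"]},
        {"company": "Example Holding AG", "contact_persons": []},
    ]
    top = best_result(parsed)
    if top is not None:
        print(top["company"], contact_persons(top))
```

Handling the empty case explicitly matters because not every website yields a result.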
To prevent excessive crawling, all crawled pages are cached by us for up to 24 hours.
Your contact person at netEstate:
Phone: +49 89 32197780