Extraction of data from website imprints

The netEstate Imprint Crawler is able to find the imprint or contact page of a website and extract addresses, contact data and company names.

The crawler can be used as a Web service or via manual batch processing by us, and is especially suitable for address verification. It respects robots.txt, the robots meta element and common German wording that objects to the processing of contact data.

The current focus is on Germany, Austria and Switzerland. Quality for other countries varies but can be optimized on demand.

Please note that, in order to prevent abuse, we never supply e-mail addresses, only hashes of e-mail addresses that can be used for comparison with existing data.

Pricing

Not available for private individuals.

0.015 EUR + VAT per API call or crawled website. The price for additional API calls/websites halves at 20,000, 100,000 and 500,000 API calls/websites.
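The halving rule can be illustrated with a short Python sketch. It assumes one particular reading of the rule: each threshold halves the price only for the calls beyond it (marginal pricing), so the first 20,000 calls stay at the base price. The function name `net_cost` is hypothetical, and the figures are net of VAT; please ask us for a binding quote.

```python
from decimal import Decimal

BASE_PRICE = Decimal("0.015")             # EUR per call/website, net of VAT
THRESHOLDS = (20_000, 100_000, 500_000)   # price halves beyond each of these


def net_cost(calls: int) -> Decimal:
    """Net cost in EUR, assuming the halved price applies only to
    the calls beyond each threshold (an assumption, not a quote)."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    total = Decimal(0)
    price = BASE_PRICE
    lower = 0
    for upper in THRESHOLDS:
        if calls <= lower:
            break
        # Charge the calls that fall inside the current tier.
        total += (min(calls, upper) - lower) * price
        price /= 2
        lower = upper
    if calls > lower:
        # Calls beyond the last threshold.
        total += (calls - lower) * price
    return total
```

Under this reading, `net_cost(6000)` gives 90 EUR and `net_cost(100_000)` gives 900 EUR (300 EUR for the first 20,000 calls plus 600 EUR for the next 80,000).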

You either buy a quota of calls for the Web API (one HTTP call per website, minimum purchase 6,000 calls) or send us a file with websites to crawl. In the latter case, we send the results back as a CSV file (service fee 60 EUR + VAT, no minimum purchase).

Recent changes

  • October 2022: Improved handling of JavaScript-based sites and composite location names
  • February 2021: Recognition of academic titles. Improved recognition of contact persons
  • August 2020: Several improvements – especially concerning company names
  • May 2020: Crawling of the Swiss company ID (UID)
  • February 2020: Improved recognition of contact persons

Details

The crawler will extract zip code and town from ca. 77% of German company websites. Social links from the homepage are found on ca. 37% of company websites. Once zip code and town have been found, the probabilities for other data are:

  • ca. 94% address
  • ca. 87% phone
  • ca. 84% VAT ID
  • ca. 82% name (company or person)
  • ca. 77% SHA-1 hash of ‘mailto:’ + mail address
  • ca. 72% fax
  • ca. 68% contact person
  • ca. 36% trade register number and town of register court
  • ca. 8% bank code
  • ca. 8% BIC
  • ca. 7% IBAN
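Since the crawler returns a SHA-1 hash of ‘mailto:’ + mail address rather than the address itself, a comparison with existing data means hashing your own addresses the same way. The following Python sketch shows one way to do this. The UTF-8 encoding, the lowercase hexadecimal output and the absence of any normalization (such as lowercasing the address) are assumptions; the helper names are hypothetical.

```python
import hashlib


def mailto_sha1(address: str) -> str:
    """SHA-1 of 'mailto:' + address as a hex string.

    Assumes UTF-8 encoding and no normalization of the address.
    """
    return hashlib.sha1(("mailto:" + address).encode("utf-8")).hexdigest()


def find_matches(known_addresses: list[str], crawled_hash: str) -> list[str]:
    """Return the known addresses whose hash equals the crawled hash."""
    wanted = crawled_hash.strip().lower()
    return [a for a in known_addresses if mailto_sha1(a) == wanted]
```

If no address matches, it may be worth retrying with trimmed or lowercased variants of your own addresses, because the exact form in which the address appears on the website determines the hash.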

We have crawled the German Web with the Imprint Crawler and can offer you selections from the resulting database (see website database).

You can test the Imprint Crawler with the form on this page. It asks for the website to crawl (without http://), a security question and the result format. An option lets the crawler ignore the noindex robots meta element; robots.txt is always obeyed.
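Because the website is entered without http://, a list of URLs may need cleaning before it is submitted. The following Python sketch strips a leading scheme and any trailing slash. The function name `strip_scheme` is hypothetical, and keeping the rest of the URL (including any path) unchanged is an assumption about what the form accepts.

```python
import re

# Matches a leading URL scheme such as "http://" or "https://".
_SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def strip_scheme(url: str) -> str:
    """Remove surrounding whitespace, a leading scheme and trailing slashes."""
    return _SCHEME.sub("", url.strip()).rstrip("/")
```

For example, `strip_scheme("https://www.example.com/")` yields `www.example.com`.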

 

The crawler may return several results and several contact persons within a result. All results are sorted by descending relevance. You can simply use the first result returned.

To prevent excessive crawling, all crawled pages are cached by us for up to 24 hours.

Your contact person at netEstate:
Michael Brunnbauer
Phone: +49 89 32197780
E-Mail: info@netestate.de