Imprint Crawler

Imprint Crawler 2016-11-25T03:05:13+00:00

Extraction of data from website imprints

The netEstate Imprint Crawler is able to find the imprint or contact page of a website and extract addresses, contact data and company names.

The Crawler can be used as a Web service or through manual batch processing by us, and is especially suitable for address verification. It respects robots.txt, the robots meta element and common German wording that objects to the processing of contact data.

The current focus is on Germany, Austria and Switzerland. Quality for other countries varies but can be optimized on demand.

Please note that, to prevent abuse, we never supply e-mail addresses, only hashes of e-mail addresses that can be compared with existing data.
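As an illustration of how such a comparison could work on the customer side, the sketch below hashes a known e-mail address and compares it with a supplied hash. The Details section states that the hashed value is ‘mailto:’ + mail address with SHA-1; everything else here is an assumption: the hexadecimal encoding, the UTF-8 encoding, the trimming and lowercasing of the address, and the function names `email_hash` and `matches`.

```python
import hashlib


def email_hash(address: str) -> str:
    """Return the SHA-1 hex digest of 'mailto:' + address."""
    # Assumption: whitespace is trimmed and the address is lowercased
    # before hashing. The service's actual normalization is not documented here.
    normalized = address.strip().lower()
    return hashlib.sha1(("mailto:" + normalized).encode("utf-8")).hexdigest()


def matches(known_address: str, supplied_hash: str) -> bool:
    """Check whether a known address corresponds to a supplied hash."""
    # Assumption: the supplied hash is a hexadecimal string.
    return email_hash(known_address) == supplied_hash.strip().lower()
```

If hashes computed this way never match the supplied ones, the normalization or encoding assumptions are the first thing to check.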

Pricing

Not available for private individuals.

0.015 EUR + VAT per API call or crawled website. The price for additional API calls/websites halves at 20,000, 100,000 and 500,000 API calls/websites.

You either buy a quota of calls for the Web API (one HTTP call per website, minimum purchase 6,000 calls) or send us a file with websites to crawl. In the latter case we send the results back as a CSV file (service fee 90 EUR + VAT, no minimum purchase).
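The volume discount can be estimated with a short calculation. The sketch below assumes one particular reading of the pricing: the halved price applies only to the calls beyond each threshold (marginal tiers), not retroactively to the whole quota. The names `net_cost`, `BASE_PRICE` and `THRESHOLDS` are illustrative, and the result excludes VAT and the batch service fee.

```python
from decimal import Decimal

BASE_PRICE = Decimal("0.015")            # EUR per call, net of VAT
THRESHOLDS = (20_000, 100_000, 500_000)  # price halves after each of these


def net_cost(calls: int) -> Decimal:
    """Net cost in EUR, assuming each halved price applies only to the
    calls above the corresponding threshold (an interpretation)."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    total = Decimal("0")
    price = BASE_PRICE
    lower = 0
    for upper in THRESHOLDS:
        if calls <= lower:
            break
        # Calls that fall into the current tier are billed at its price.
        total += (min(calls, upper) - lower) * price
        price /= 2
        lower = upper
    if calls > lower:
        # Everything above the last threshold uses the final halved price.
        total += (calls - lower) * price
    return total
```

Under this reading, the minimum purchase of 6,000 calls comes to 90 EUR net and a quota of 100,000 calls to 900 EUR net.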

Recent changes

  • February 2015: Enhanced address recognition
  • March 2015: Recognition of natural persons as operators, dramatically enhanced recognition of contact persons
  • September 2015: Enhanced recognition of VAT IDs
  • March 2016: Optional JavaScript rendering, better handling of frames and iframes

Details

The crawler will extract zip code and town from ca. 77% of German company websites. Social links from the homepage are found on ca. 37% of company websites. Once zip code and town have been found, the probabilities for other data are:

  • ca. 94% address
  • ca. 87% phone
  • ca. 84% VAT ID
  • ca. 82% name (company or person)
  • ca. 77% SHA-1 hash of ‘mailto:’ + mail address
  • ca. 72% fax
  • ca. 68% contact person
  • ca. 36% trade register number and town of register court
  • ca. 8% bank code
  • ca. 8% BIC
  • ca. 7% IBAN
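Because the percentages in the list are conditional on zip code and town having been found, the share of all German company websites that yields both zip code/town and a given field is the product of the two rates. The sketch below works through this arithmetic with the approximate figures quoted above; the names `P_ZIP_TOWN`, `CONDITIONAL` and `joint_rate` are illustrative, and only a few fields are included.

```python
P_ZIP_TOWN = 0.77  # share of German company websites with zip code and town

# Approximate rates given that zip code and town were found.
CONDITIONAL = {
    "address": 0.94,
    "phone": 0.87,
    "vat_id": 0.84,
    "iban": 0.07,
}


def joint_rate(field: str) -> float:
    """Share of websites yielding zip code/town AND the given field."""
    return P_ZIP_TOWN * CONDITIONAL[field]
```

For example, an address can be expected alongside zip code and town for roughly 72% of German company websites, and an IBAN for roughly 5%.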

We have crawled the German Web with the Imprint Crawler and can offer you selections from the resulting database (see website database).

You can test the Imprint Crawler with a form that takes the website to crawl (without http://), the answer to a security question and the desired result format.

The crawler may return several results and several contact persons within a result. All results are sorted by descending relevance. You can simply use the first result returned.
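Since the results arrive sorted by descending relevance, picking the best one amounts to taking the first element. The sketch below assumes the response has already been parsed into a list of dictionaries; the helper name `best_result` and the field names `name` and `contact_persons` are hypothetical and do not describe the actual response format.

```python
from typing import Any, Optional


def best_result(results: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the most relevant result, or None if nothing was found.

    Relies on the results already being sorted by descending relevance.
    """
    return results[0] if results else None


# Hypothetical parsed response; the field names are invented for illustration.
example = [
    {"name": "Example GmbH", "contact_persons": ["A. Example", "B. Example"]},
    {"name": "Example Holding AG", "contact_persons": []},
]
top = best_result(example)
```

Handling the empty case explicitly matters because not every website has a recognizable imprint.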

To prevent excessive crawling, all crawled pages are cached by us for up to 24 hours.

Your contact person at netEstate:
Michael Brunnbauer
Phone: +49 89 32197780
E-Mail: info@netestate.de