Please help - tough challenge

martingl · November 29, 2020, 2:11pm

Hi all

ANYONE UP FOR A CHALLENGE???

www.dastelefonbuch.de uses some new technology to make it hard for Web Scraper to get the correct data. They insert random hidden tags into the data values, so scraped phone numbers etc. come out altered.

Here is an example result that I would like to scrape:

But I get wrong values.

Can anyone help with an example like this?

Thanks
Martin

leemeng · December 4, 2020, 9:20am

An interesting case, and it is definitely an anti-scraper measure. Respect to the site owner for knowing how scrapers work! They even apply it intermittently / randomly, which makes it hard to diagnose.

To crack it, I would just grab the HTML block for the phone numbers using Type: HTML and post-process it. This is quite trivial if you know some programming and regex. It is even possible to post-process the CSV manually if you have a text editor that supports regex, such as Notepad++ or jEdit. In my example, I used this regex: <span class="hide">.{1,6}</span>
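
For anyone who prefers a script over a text editor, here is a minimal Python sketch of that post-processing step. It assumes the scraped HTML wraps the decoy characters in <span class="hide"> elements, as the regex above implies. The sample markup and the function name clean_phone_html are made up for illustration, and the real markup on the site may well differ.

```python
import re

# Regex from the post: a hidden span wrapping 1 to 6 arbitrary characters.
HIDDEN_SPAN = re.compile(r'<span class="hide">.{1,6}</span>')
# Any remaining HTML tag (assumption: the visible digits sit in plain tags).
ANY_TAG = re.compile(r"<[^>]+>")


def clean_phone_html(html: str) -> str:
    """Drop the hidden decoy spans, then strip the remaining tags."""
    without_decoys = HIDDEN_SPAN.sub("", html)
    text = ANY_TAG.sub("", without_decoys)
    # Collapse any runs of whitespace left behind by the removed tags.
    return " ".join(text.split())


# Hypothetical sample, NOT copied from the real site.
sample = (
    '<span>030 </span><span class="hide">99</span>'
    '<span>1234</span><span class="hide">x7</span><span>567</span>'
)
print(clean_phone_html(sample))
```

The hidden spans have to be removed before the generic tag stripping, otherwise the decoy characters end up in the result. If the site changes the class name or hides the decoys via CSS rules instead, the first regex would need adjusting.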