Randomized Delay Ranges

I would like to be able to set a range for my delay and have Web Scraper randomly pick a value from that range for each page.

I'm having trouble with a certain website I scrape a lot. So far they have banned 10 of my proxies.

Right now I have my delay set at 7000 / 5000, but I would love to have it change randomly.
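For anyone who ends up scripting this outside the extension, here is a minimal sketch of the behavior being asked for: draw a fresh delay from a range before each page. This is not a Web Scraper setting. `random_delay_ms` and `crawl` are hypothetical names, `fetch` is whatever function loads a page, and using 5000 and 7000 as the lower and upper bounds is only an assumption borrowed from the numbers above.

```python
import random
import time


def random_delay_ms(min_ms: int, max_ms: int, rng: random.Random) -> int:
    """Return a delay in milliseconds drawn uniformly from [min_ms, max_ms]."""
    if min_ms > max_ms:
        raise ValueError("min_ms must not exceed max_ms")
    return rng.randint(min_ms, max_ms)


def crawl(urls, fetch, min_ms=5000, max_ms=7000, rng=None, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between pages.

    Returns the list of delays (in ms) that were applied, for inspection.
    """
    rng = rng or random.Random()
    delays = []
    for index, url in enumerate(urls):
        if index > 0:  # no need to wait before the very first page
            delay = random_delay_ms(min_ms, max_ms, rng)
            sleep(delay / 1000)  # time.sleep expects seconds
            delays.append(delay)
        fetch(url)
    return delays
```

Passing `rng` and `sleep` in as arguments is just a convenience so the pauses can be seeded or skipped while testing.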

LinkedIn or AngelList? Just a guess.

To simulate randomization, would you be able to add a new selector that looks for some attribute text that may or may not exist on a given page, and give that selector a delay of, say, 1000 ms? The text string to look for would depend on the web page.

The CSS selector might look for "003" anywhere in the ID attribute of any div, or for some HREF that ends in "a.htm", such as:

div[id*='003']
a[href$='a.htm']

This selector should be a sibling of other valid selectors, so that the scrape doesn't get broken if this text is not found in the attributes. The DIV above, if it exists, should contain short text, so as not to clutter up the results.

If you set several of these "contingent" text selectors with different delays, would that simulate some randomization?
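To make the idea concrete, here is a rough sketch that mimics the two selectors above with plain string checks and adds up the delay a page would get. It assumes that a selector which matches nothing adds no delay, which is untested. `Page`, `CONTINGENT_RULES`, `total_delay_ms`, the 5000 ms base and the 2000 ms second delay are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a scraped page: only the attributes we test."""
    div_ids: list = field(default_factory=list)
    hrefs: list = field(default_factory=list)


# Each rule pairs a predicate (mimicking a CSS attribute selector) with the
# delay in milliseconds that the matching "contingent" selector would carry.
CONTINGENT_RULES = [
    # div[id*='003'] : some div id contains "003"
    (lambda page: any("003" in div_id for div_id in page.div_ids), 1000),
    # a[href$='a.htm'] : some link href ends with "a.htm"
    (lambda page: any(href.endswith("a.htm") for href in page.hrefs), 2000),
]


def total_delay_ms(page, base_ms=5000, rules=CONTINGENT_RULES):
    """Base delay plus the delay of every contingent rule the page matches."""
    return base_ms + sum(delay for matches, delay in rules if matches(page))
```

With two rules a page can land on four different totals, so the delay does vary from page to page. It is still a fixed function of the page content rather than a true random draw.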

I haven't tested this. Just an idea that has been bugging me too. :slight_smile:

Edit: I am not sure whether a null result will still spend the same delay time. Adding a child under that "contingent" element (a link or "element" selector, say) might be what actually creates the delay.

Neither, it's a website that lists doctors & various medical entities.

I'm not sure that would work well for my use case.

All the pages follow the same template. The only way I can see this working is if I use regex to extend the delay on certain words, but they would certainly see the pattern.

Shoot me the URL. I have a few scrapers I'm playing with. Let me see how long I can go before they block my IP.

I did a video tutorial on scraping doctors/nurses from major hospitals. It was a throwback to an article written about robots.txt to source...

I saved all the outputs and made them available

Based on what you're posting, it's very basic - meant for beginners

(https://www.youtube.com/watch?v=yhG9Pk1ShvY)

Either way, enjoy.

There were earlier versions of WebScraper, made in collaboration with Jens Willmer, that had randomization. For some reason it was removed.

You can add random delay using Tampermonkey extension though.

I've even found someone asking for a delay on Stack Overflow here.

I've been thinking about this and you could probably simulate pseudo-randomness by using Element Click and its delay feature.

Say you're scraping New York City company info pages. All the addresses will have zip codes like 10xxx. You could set an Element Click with a 10-second delay that only applies when the zip code is 10001. The selector would be something like:

div.zipcode:contains('10001')