Request: Add 'Pause Scraping' option after N requests

I wish to scrape a site that uses a CDN gateway to ban IP addresses that make too many requests.

I can make up to 200 requests, and then I have to pause for 90 minutes before making another request. The site will ONLY reset my request counter if there is a complete 90-minute pause. If I set the request delay to 30 seconds, 60 seconds, or even 60 minutes, it will still ban me after 200 requests, because there was never a complete, no-contact 90-minute quiet period to reset my counter back to zero.


bump.

Need to be able to add a Sleep period after every batch of N pages scraped, or I get IP banned for 24 hours.

You can configure scheduled tasks to suspend the computer and reactivate it, or suspend it from another computer with PsShutdown (Sysinternals) and turn it back on via Wake-on-LAN. You only need to time the window for fewer than 200 requests; it isn't perfectly exact, but it works, and Web Scraper continues without problems. You could also use dedicated programs for this...
Or schedule when the internet connection is available, for example on the router, among other options...

Or use a proxy-rotator extension, such as BP Proxy Switcher, to automatically change the proxy every 60 seconds, etc.
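For readers scripting their own scraper rather than using an extension, proxy rotation can be sketched in a few lines. This is a generic illustration, not part of Web Scraper or BP Proxy Switcher; the proxy addresses are hypothetical placeholders, and the returned dict matches the shape the Python `requests` library accepts for its `proxies` parameter.

```python
import itertools

# Hypothetical proxy pool -- substitute your own addresses.
PROXIES = [
    "http://proxy1.example:8080",
    "http://proxy2.example:8080",
    "http://proxy3.example:8080",
]

# cycle() loops over the pool forever, one proxy per call to next().
_proxy_cycle = itertools.cycle(PROXIES)

def next_proxy_config():
    """Return the next proxy as a dict in the shape `requests` expects."""
    proxy = next(_proxy_cycle)
    return {"http": proxy, "https": proxy}
```

Each fetch would then pass `next_proxy_config()` along with the request, so consecutive requests leave from different IP addresses.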

You can also add an element with a delay to avoid the ban.

Another request could be the opposite: an option that prevents the computer from suspending while scraping is running :slight_smile:

OK, it would be more elegant if Web Scraper implemented a request queue manager.

That simply would not work, and it sounds like a very harebrained idea to try. Not to mention it would prevent me from using the computer / internet while scraping.

A more elegant solution would be to add a Sleep timer after every batch of, e.g., 200 page requests. Sleep is a built-in function in virtually every programming language. It would be trivial for the developer to implement.
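The requested behavior could be sketched as follows. This is a minimal illustration in Python, not the extension's actual code; the function name, the injectable `fetch` and `sleep` parameters, and the defaults (200 requests, 90-minute pause) are assumptions taken from the numbers in this thread.

```python
import time

def scrape_in_batches(urls, fetch, batch_size=200,
                      pause_seconds=90 * 60, sleep=time.sleep):
    """Fetch every URL, inserting a complete no-contact pause after
    each full batch so the server-side counter can reset."""
    results = []
    for i, url in enumerate(urls, start=1):
        results.append(fetch(url))
        # Pause only between batches, not after the final URL.
        if i % batch_size == 0 and i < len(urls):
            sleep(pause_seconds)
    return results
```

Passing `sleep` as a parameter keeps the pause logic testable: a test can record the pauses instead of actually waiting 90 minutes.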

If you have already divided the URLs in batches of 200, you could use the Cloud Scraper, which has a scheduler feature. You can upload a number of sitemaps and schedule them to avoid the timeout.
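Splitting the URL list into batches of 200 for separate sitemaps could look like this. A generic one-liner sketch; the function name and example URLs are hypothetical, and the actual upload/scheduling would happen in the Cloud Scraper UI.

```python
def chunk_urls(urls, size=200):
    """Split a URL list into consecutive batches of at most `size` URLs,
    one batch per sitemap."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]
```

Each resulting batch can then be pasted into its own sitemap and scheduled at least 90 minutes apart.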