Scraping delays only handled for first page

I am scraping multiple pages in a site where there is a delay to complete loading the fields I am scraping.

The sitemap involves iterating through a number of select fields to change the data - each change of select option loads a new page.

When running the scrape, I specify a longer delay for both the Request Interval and the Page Load Delay - setting both to 10000 ms to allow the data to finish loading so it can be scraped.

This works fine when loading the first page, but subsequent pages appear to wait only for the default 2000 ms (or actually less), which isn't long enough for the data to load, so many null values are recorded.

@dunxd Hi, are you able to share the targeted website?

I can't - the site holds monitoring data for our networks, so not publicly accessible nor appropriate for me to share.

There is a drop down for selecting each location, which loads a different version of the page when it is changed. The sitemap works perfectly for changing the page, and there are times when all the content loads very fast and the data is collected fine. However, sometimes the content doesn't finish loading before the page gets changed again. The delays definitely work for the first page load, but not for any of the subsequent ones. If I set the delay high (like 10 seconds) it is immediately noticeable.

It's possible that the second page is not waiting for the page load delay because a network status error occurred while loading the page. You can check for error messages by following these steps:

  1. Open chrome://extensions/ or go to manage extensions
  2. Enable “developer mode” at the top right
  3. Open Web Scraper's “background page”
  4. A new popup window should appear.
  5. Go to “Console” tab.
  6. Run your sitemap from your main window and wait for error messages.

It's hard to diagnose without the sitemap, code, or URL.

But if you're navigating between pages using Element Click, then Request Interval and Page Load Delay will not apply. They are only used for standard (a href) links, so it doesn't matter what values you set. For Element Click, the setting you have to tweak is the click delay.
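
If there are several Element Click selectors, it may be quicker to set the delay in the exported sitemap JSON and re-import it. The sketch below is only illustrative: it assumes the export marks these selectors with "type": "SelectorElementClick" and stores the click delay in a "delay" field in milliseconds, and the sitemap itself (ids, URL, CSS selectors) is entirely made up. Check the field names against your own export before relying on it.

```python
import copy
import json

# Hypothetical sitemap fragment. The ids, URL and CSS selectors are invented;
# the "type" and "delay" field names are assumed to match the exported format.
SITEMAP = {
    "_id": "network-monitoring",
    "startUrl": ["https://example.com/monitoring"],
    "selectors": [
        {
            "id": "location",
            "type": "SelectorElementClick",
            "parentSelectors": ["_root"],
            "selector": "div.results",
            "clickElementSelector": "select#location option",
            "multiple": True,
            "delay": 0,
        },
        {
            "id": "reading",
            "type": "SelectorText",
            "parentSelectors": ["location"],
            "selector": "td.value",
            "multiple": False,
        },
    ],
}


def set_click_delay(sitemap, delay_ms):
    """Return a copy of the sitemap with `delay` set on every Element Click selector."""
    updated = copy.deepcopy(sitemap)  # leave the caller's sitemap untouched
    for selector in updated.get("selectors", []):
        if selector.get("type") == "SelectorElementClick":
            selector["delay"] = delay_ms
    return updated


if __name__ == "__main__":
    # Print the adjusted sitemap so it can be pasted into the import dialog.
    print(json.dumps(set_click_delay(SITEMAP, 10000), indent=2))
```

Only the Element Click selector is changed; other selector types are left as they were.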


Thank you - that is exactly what I needed.

I hadn't noticed the Delay setting under the Element Click.

My scrape is working great now!!! Very happy to have learnt about this.