Timeout Error when using SelectorElementScroll

Hi,

I am trying to scrape Amazon reviews from "Amazon's Top Customer Reviewers". As a first step, I would like to scrape the URLs of the reviews for each reviewer.

Everything works fine unless I try to scrape more than approximately 500 review URLs per profile. After about 15 minutes of scrolling through the Amazon profile, I receive the following error and no data is scraped at all.

background_script.js:465 {"url":"https://www.amazon.com/gp/profile/amzn1.account.XXXXXEYLRJ5U5FZIYBLZKXYXXXX","parentSelector":"reviewer_urls_as_input","sitemapName":"b_review_urls_02","driver":"chrometab","error":"timeout: Job execution timeout","stack":"Error: timeout: Job execution timeout\n at chrome-extension://jnhgnonknehpejjnehehllkliplmbmhn/background_script.js:541:27","timestamp":1588587909,"level_name":"ERROR","message":"Job execution failed"}

I have already significantly increased the delay for the SelectorElementScroll, the page load delay, and the request interval. Is there any way to scrape the data and avoid this timeout error?

I would really appreciate your help!

Sitemap (the pastelink.net link just contains 3 links to Amazon profiles):

{"_id":"b_review_urls_02","startUrl":["https://pastelink.net/XXXX"],"selectors":[{"id":"reviewer_urls_as_input","type":"SelectorLink","parentSelectors":["_root"],"selector":".body-display a","multiple":true,"delay":0},

{"id":"scroll","type":"SelectorElementScroll","parentSelectors":["reviewer_urls_as_input"],"selector":"div.desktop:nth-of-type(-n+610) .a-size-base span span","multiple":true,"delay":"20000"},

{"id":"profile","type":"SelectorElement","parentSelectors":["reviewer_urls_as_input"],"selector":".profile-at-card div.a-row:nth-of-type(2)","multiple":true,"delay":0},

{"id":"name","type":"SelectorText","parentSelectors":["profile"],"selector":"span.a-profile-name","multiple":false,"regex":"","delay":0},{"id":"date","type":"SelectorText","parentSelectors":["profile"],"selector":"span.a-profile-descriptor","multiple":false,"regex":"","delay":0},

{"id":"review_url","type":"SelectorElementAttribute","parentSelectors":["profile"],"selector":".a-section > a","multiple":false,"extractAttribute":"href","delay":0}]}

Hi,

I have a similar issue with one of my sitemaps. My process runs for longer than 15 minutes on one URL and therefore returns no results after it is done.

Did you manage to solve your issue? If so, how did you do it?

Regards

Sizwe Mashao

As this issue only occurred for ~30 links, I did it manually.

In my case, the issue was that after scrolling through Amazon for some time, Amazon's servers responded quite slowly and the response time increased to several minutes.

Therefore, Web Scraper showed the timeout error.

Doing it manually: open the respective URL in Firefox, use any auto-scroll plug-in for your browser to load the entire page, and once the entire page has loaded, use Web Scraper to scrape it.
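
If you prefer not to install a plug-in, a small snippet pasted into the browser's DevTools console can do the same job. This is only a sketch; the 2000 ms pause is an assumption and may need to be increased given how slowly Amazon responds after prolonged scrolling:

// Paste into the DevTools console on the profile page.
// Scrolls to the bottom, waits, and repeats until the page height
// stops growing (i.e. no more reviews are being loaded).
(async () => {
  const pause = ms => new Promise(resolve => setTimeout(resolve, ms));
  let lastHeight = 0;
  while (document.body.scrollHeight > lastHeight) {
    lastHeight = document.body.scrollHeight;
    window.scrollTo(0, lastHeight);
    await pause(2000); // assumed delay; raise it if the server responds slowly
  }
  console.log('Done scrolling, page fully loaded.');
})();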

For a large number of links this is not optimal, but I could not find a better solution.

@jkma @smashao003 Hello. Yes, there might be a slight chance that the targeted website fails to load new elements in time; however, zooming the page out as far as you can will definitely help to improve the situation. It is also possible to limit the scroll-down to a particular element using the jQuery :nth-of-type() selector (see ":nth-of-type() - Selectors" in the jQuery API Documentation).
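
For example, the scroll selector in the sitemap above already caps the matched elements with div.desktop:nth-of-type(-n+610); lowering that cap keeps each scroll job shorter. A quick way to check how many elements a capped selector would actually target is to test it in the DevTools console (a sketch; the 500 cap is just an illustrative number):

// :nth-of-type(-n+500) keeps only the first 500 div.desktop siblings,
// so the scroll selector stops triggering new loads beyond that point.
const capped = document.querySelectorAll('div.desktop:nth-of-type(-n+500) .a-size-base span span');
console.log(`Scroll selector would target ${capped.length} elements`);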