Failing Amazon Scraping

I have been running scrapes with the Cloud version of Web Scraper very successfully for months now. Since 6th September 2018 I have scraped all FLIR-branded products daily from www.amazon.co.uk, www.amazon.de, www.amazon.es, www.amazon.fr and www.amazon.it.

Across these sites that comes to around 2,000 products scraped per day. I schedule the jobs to run about 15 minutes apart, starting at 02:00 every morning.

Almost exactly 2 weeks ago, the scheduled jobs started to fail badly across all 5 Amazon sites. In some cases a job fails completely; in others it scrapes only a small fraction of the FLIR products I KNOW are sold on these sites, and which I used to scrape successfully.

Successful scrapes from each site over the last 3 weeks (Day -1 is yesterday):

Day  .co.uk  .de  .es  .fr  .it
-1      517  467  452  340  428
-2      360    0    1    5    3
-3       27    0    2  128   15
-4      556  505  464  350  438
-5      281    2   98  349   34
-6      545   79   16    1  191
-7       18    1    1    0  249
-8       41   36    0    2    8
-9      453  121   44    9    1
-10     176    2   42    1   28
-11     528  461  408    8    3
-12     142    0  174  102   80
-13     535  510  465  162  281
-14     314    2   14    1  429
-15     521  473  464    3   24
-16     527  478  434  167  419
-17     533  487  457  144   26
-18     517  483  437  167  434
-19     550  486  447  173  297
-20     555  487  459  168  428
-21     535  506  452  167  433

As you can see, www.amazon.fr has always been a bit of an issue: although it has roughly the same number of pages of FLIR products, I was never able to tweak the scraper to collect more than around 170. Oddly, that has changed recently (around 340 to 350 on Days -1, -4 and -5).

When I saw that yesterday's scrape (Day -1) had failed badly, I deleted the files and re-ran the jobs manually. That manual rerun produced normal-looking figures, which are what row -1 shows. I also re-ran manually last Friday (Day -4), which is why that row looks normal too.

But between Days -14 and -2, the scheduled results have been very, very haphazard.
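Since a failed run shows up as a sharp drop against a site's usual count, failures can be flagged automatically rather than spotted by eye. Below is a minimal Python sketch using the figures from the table: each day's count is compared with that site's median count, and anything below half of it is reported. The 50% threshold and the name flag_failed_runs are hypothetical choices, not part of Web Scraper. The median is only a rough baseline; for .fr, where the normal level itself moved from about 170 to about 340, a fixed expected count per site may work better.

```python
from statistics import median

SITES = [".co.uk", ".de", ".es", ".fr", ".it"]

# Products scraped per site per day, copied from the table (Day -1 = yesterday).
COUNTS = {
    -1: [517, 467, 452, 340, 428],
    -2: [360, 0, 1, 5, 3],
    -3: [27, 0, 2, 128, 15],
    -4: [556, 505, 464, 350, 438],
    -5: [281, 2, 98, 349, 34],
    -6: [545, 79, 16, 1, 191],
    -7: [18, 1, 1, 0, 249],
    -8: [41, 36, 0, 2, 8],
    -9: [453, 121, 44, 9, 1],
    -10: [176, 2, 42, 1, 28],
    -11: [528, 461, 408, 8, 3],
    -12: [142, 0, 174, 102, 80],
    -13: [535, 510, 465, 162, 281],
    -14: [314, 2, 14, 1, 429],
    -15: [521, 473, 464, 3, 24],
    -16: [527, 478, 434, 167, 419],
    -17: [533, 487, 457, 144, 26],
    -18: [517, 483, 437, 167, 434],
    -19: [550, 486, 447, 173, 297],
    -20: [555, 487, 459, 168, 428],
    -21: [535, 506, 452, 167, 433],
}


def flag_failed_runs(counts, sites, threshold=0.5):
    """Return (day, site) pairs whose count is below threshold x that site's median."""
    flagged = []
    for i, site in enumerate(sites):
        # Median over all days for this site serves as the "normal" level.
        baseline = median(row[i] for row in counts.values())
        for day, row in counts.items():
            if row[i] < threshold * baseline:
                flagged.append((day, site))
    return flagged


if __name__ == "__main__":
    # Most recent day first.
    for day, site in sorted(flag_failed_runs(COUNTS, SITES), reverse=True):
        print(f"Day {day}: {site} looks like a failed run")
```

Run after each scheduled job, a check like this could trigger the manual rerun described above instead of relying on spotting the drop by hand.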

Has anyone got any idea why? The sitemap is broadly the same for each site; here is the www.amazon.co.uk one for reference:

{"_id":"amazon-flir-co-uk","startUrl":["https://www.amazon.co.uk/s/ref=sr_pg_1?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&keywords=FLIR&ie=UTF8&qid=1537784570","https://www.amazon.co.uk/s/ref=sr_pg_2?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=2&keywords=FLIR&ie=UTF8&qid=1537784245","https://www.amazon.co.uk/s/ref=sr_pg_3?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=3&keywords=FLIR&ie=UTF8&qid=1537784326","https://www.amazon.co.uk/s/ref=sr_pg_4?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=4&keywords=FLIR&ie=UTF8&qid=1537784335","https://www.amazon.co.uk/s/ref=sr_pg_5?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=5&keywords=FLIR&ie=UTF8&qid=1537784342","https://www.amazon.co.uk/s/ref=sr_pg_6?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=6&keywords=FLIR&ie=UTF8&qid=1537784504","https://www.amazon.co.uk/s/ref=sr_pg_7?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=7&keywords=FLIR&ie=UTF8&qid=1537784519","https://www.amazon.co.uk/s/ref=sr_pg_8?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=8&keywords=FLIR&ie=UTF8&qid=1537784524","https://www.amazon.co.uk/s/ref=sr_pg_9?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=9&keywords=FLIR&ie=UTF8&qid=1537784527","https://www.amazon.co.uk/s/ref=sr_pg_10?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=10&keywords=FLIR&ie=UTF8&qid=1537784532","https://www.amazon.co.uk/s/ref=sr_pg_11?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=11&keywords=FLIR&ie=UTF8&qid=1537784535","https://www.amazon.co.uk/s/ref=sr_pg_12?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=12&keywords=FLIR&ie=UTF8&qid=1537784539","https://www.amazon.co.uk/s/ref=sr_pg_13?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=13&keywords=FLIR&ie=UTF8&qid=1537784545","https://www.amazon.co.uk/s/ref=sr_pg_14?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=14&keywords=FLIR&ie=UTF8&qid=1537784549","https://www.amazon.co.uk/s/ref=sr_pg_15?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=15&keywords=FLIR&ie=UTF8&qid=1537784553","https://www.amazon.co.uk/s/ref=sr_pg_16?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=16&keywords=FLIR&ie=UTF8&qid=1537784558","https://www.amazon.co.uk/s/ref=sr_pg_17?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=17&keywords=FLIR&ie=UTF8&qid=1537784562","https://www.amazon.co.uk/s/ref=sr_pg_18?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=18&keywords=FLIR&ie=UTF8&qid=1537784577","https://www.amazon.co.uk/s/ref=sr_pg_19?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=19&keywords=FLIR&ie=UTF8&qid=1537784570","https://www.amazon.co.uk/s/ref=sr_pg_20?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=20&keywords=FLIR&ie=UTF8&qid=1537784573"],"selectors":[{"id":"product","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.a-fixed-left-grid-inner","multiple":true,"delay":0},{"id":"product-name","type":"SelectorLink","parentSelectors":["product"],"selector":"div.a-row.a-spacing-small a.a-link-normal","multiple":false,"delay":0},{"id":"product-RRP","type":"SelectorText","parentSelectors":["product-name"],"selector":"span.a-text-strike","multiple":false,"regex":"","delay":0},{"id":"product-current-price-inc-vat","type":"SelectorText","parentSelectors":["product-name"],"selector":"td.a-span12 span.a-size-medium","multiple":false,"regex":"","delay":0},{"id":"product-prime-eligible","type":"SelectorText","parentSelectors":["product-name"],"selector":"i.a-icon.a-icon-prime","multiple":false,"regex":"","delay":0},{"id":"product-sold-by","type":"SelectorText","parentSelectors":["product-name"],"selector":"div#merchant-info.a-section.a-spacing-mini","multiple":false,"regex":"","delay":0},{"id":"more-sellers","type":"SelectorLink","parentSelectors":["product-name"],"selector":"span.olp-padding-right a","multiple":false,"delay":0},{"id":"more-sellers-page","type":"SelectorElement","parentSelectors":["more-sellers"],"selector":"div.a-section div.a-section div.a-row:nth-of-type(n+2), div.a-row.olpOffer","multiple":true,"delay":0},{"id":"seller-name","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"span.a-size-medium a, img.alt","multiple":false,"regex":"","delay":0},{"id":"item-condition","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"div.a-section span.a-size-medium","multiple":false,"regex":"","delay":0},{"id":"item-price","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"span.a-size-large","multiple":false,"regex":"","delay":0},{"id":"item-delivery-price","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"span.a-color-secondary, span.supersaver","multiple":false,"regex":"","delay":0}]}
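Because the sitemap is one long line of JSON, structural slips are easy to miss by eye: a raw line break pasted into a URL or selector string makes the JSON invalid, and a misspelt parent id leaves a selector without a valid parent. Below is a minimal Python sketch (standard library only) that checks for those problems, using the field names that appear in the sitemap above (startUrl, selectors, id, parentSelectors). check_sitemap is a hypothetical helper, and it cannot tell whether a CSS selector still matches Amazon's current HTML; that still has to be checked in the browser.

```python
import json


def check_sitemap(text):
    """Return a list of structural problems found in a Web Scraper sitemap JSON string."""
    try:
        # json.loads rejects raw control characters (e.g. a line break) inside strings.
        sitemap = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    problems = []
    for url in sitemap.get("startUrl", []):
        if any(ch.isspace() for ch in url):
            problems.append(f"whitespace in start URL: {url!r}")

    selectors = sitemap.get("selectors", [])
    seen = set()
    for sel in selectors:
        if sel["id"] in seen:
            problems.append(f"duplicate selector id: {sel['id']}")
        seen.add(sel["id"])

    # Every parent must be the special "_root" node or another selector's id.
    for sel in selectors:
        for parent in sel.get("parentSelectors", []):
            if parent != "_root" and parent not in seen:
                problems.append(f"{sel['id']}: unknown parent {parent!r}")
    return problems
```

For example, check_sitemap(open("amazon-flir-co-uk.json").read()) returns an empty list when nothing structural is wrong (the file name is hypothetical).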

I list each results page as a separate start URL because I have not found a reliable way to make the scraper page through Amazon's results; I always have problems with it.
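The Web Scraper documentation describes ranged start URLs (for example ending a URL in &page=[1-20]), which may be worth trying in place of the hand-written list. Failing that, the list can at least be generated rather than typed. Below is a minimal Python sketch that rebuilds the .co.uk start URLs from the query string used above. Two assumptions are labelled in the code: that Amazon does not need the per-page qid value (it is dropped), and that the other four sites accept the same query string on their own domain. start_urls is a hypothetical helper name.

```python
import json

# Query string copied from the .co.uk start URLs above, minus the per-page "qid"
# value. Assumption: Amazon does not need qid, and the other four sites accept
# the same query string on their own domain.
TEMPLATE = (
    "https://{domain}/s/ref=sr_pg_{n}?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR"
    "{page}&keywords=FLIR&ie=UTF8"
)


def start_urls(domain, pages):
    """Build search-results URLs for pages 1..pages on one Amazon domain."""
    urls = []
    for n in range(1, pages + 1):
        page = "" if n == 1 else f"&page={n}"  # page 1 has no page= parameter
        urls.append(TEMPLATE.format(domain=domain, n=n, page=page))
    return urls


if __name__ == "__main__":
    # Paste the printed array in place of the sitemap's "startUrl" value.
    print(json.dumps(start_urls("www.amazon.co.uk", 20)))
```

Changing the page count in one place, or swapping in "www.amazon.de", keeps all five sitemaps consistent without editing twenty URLs by hand.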

So, does anyone have any feedback on the following:

  • Can you see anything wrong with the code itself?
  • Has the scheduler feature begun to play up for anyone else?
  • Does anyone know if Amazon are taking steps to block or screw up Web Scraper?
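One way to separate a blocking or page-loading problem from broken selectors is to count how many rows each start URL produced in a run's CSV export. Below is a minimal Python sketch that assumes the export has a web-scraper-start-url column naming the start URL each row came from; that column name is an assumption to check against your own export. If whole results pages come back with zero rows while neighbouring pages look normal, that points more towards pages not loading as expected (a timeout, a bot check, or a different layout being served) than towards a single broken field selector.

```python
import csv
import io
from collections import Counter

# Assumption: each exported row records its start URL in this column.
START_COL = "web-scraper-start-url"


def rows_per_start_url(csv_text, expected_urls):
    """Count exported rows per start URL; URLs with no rows get 0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row.get(START_COL) for row in reader)
    return {url: counts.get(url, 0) for url in expected_urls}


def empty_pages(csv_text, expected_urls):
    """Start URLs that produced no rows at all in this export."""
    counts = rows_per_start_url(csv_text, expected_urls)
    return [url for url, n in counts.items() if n == 0]
```

For example, empty_pages(open("amazon-flir-co-uk.csv", encoding="utf-8").read(), urls), where the file name is hypothetical and urls is the sitemap's startUrl list.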

Any other comments would be most welcome.

Kind regards

Patrick

Hi, some of your selectors appear to be invalid, so you'll need to fix them. I've fixed a few in the sitemap below; please adjust the rest as needed. For my test scrape I used this setting:
Page load delay (ms): 5500

{"_id":"amazon-flir-co-uk","startUrl":["https://www.amazon.co.uk/s/ref=sr_pg_1?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&keywords=FLIR&ie=UTF8&qid=1537784570","https://www.amazon.co.uk/s/ref=sr_pg_2?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=2&keywords=FLIR&ie=UTF8&qid=1537784245"],"selectors":[{"id":"product","type":"SelectorElement","parentSelectors":["_root"],"selector":"div:nth-of-type(n+4) div.s-expand-height","multiple":true,"delay":0},{"id":"product-current-price-inc-vat","type":"SelectorText","parentSelectors":["product"],"selector":"span[aria-hidden]","multiple":false,"regex":"","delay":0},{"id":"product-prime-eligible","type":"SelectorHTML","parentSelectors":["product"],"selector":".a-row div.s-align-children-center","multiple":false,"regex":"Amazon Prime","delay":0},{"id":"more-sellers","type":"SelectorLink","parentSelectors":["product"],"selector":".a-color-secondary a","multiple":false,"delay":0},{"id":"more-sellers-page","type":"SelectorElement","parentSelectors":["more-sellers"],"selector":"div.a-section div.a-section div.a-row:nth-of-type(n+2), div.a-row.olpOffer","multiple":true,"delay":0},{"id":"seller-name","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"span.a-size-medium a, img.alt","multiple":false,"regex":"","delay":0},{"id":"item-condition","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"div.a-section span.a-size-medium","multiple":false,"regex":"","delay":0},{"id":"item-price","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"span.a-size-large","multiple":false,"regex":"","delay":0},{"id":"item-delivery-price","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"span.a-color-secondary, span.supersaver","multiple":false,"regex":"","delay":0},{"id":"name","type":"SelectorText","parentSelectors":["product"],"selector":"span.a-size-base-plus","multiple":false,"regex":"","delay":0}]}
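A 5500 ms page load delay also bears on the schedule described in the question. As a rough lower bound, a job takes at least (page loads) × (delay). Below is a minimal Python sketch that assumes the delay applies to every page load and uses a hypothetical 540 page loads for one site (20 results pages plus roughly 520 product pages, going by the .co.uk counts; more-sellers pages would add to this). At that rate a job runs for about 49.5 minutes, so jobs started 15 minutes apart from 02:00 would overlap. How Web Scraper Cloud handles overlapping jobs is not covered here, so treat this as something to check rather than a diagnosis.

```python
from datetime import datetime, timedelta


def estimated_minutes(page_loads, delay_ms):
    """Lower bound on run time if every page load waits delay_ms."""
    return page_loads * delay_ms / 1000 / 60


def schedule(first_start, gap_minutes, n_jobs):
    """Start times for n_jobs jobs spaced gap_minutes apart."""
    return [first_start + timedelta(minutes=gap_minutes * i) for i in range(n_jobs)]


def overlaps(starts, duration_minutes):
    """Pairs (i, i + 1) where job i is still running when job i + 1 starts."""
    duration = timedelta(minutes=duration_minutes)
    return [
        (i, i + 1)
        for i in range(len(starts) - 1)
        if starts[i] + duration > starts[i + 1]
    ]


if __name__ == "__main__":
    minutes = estimated_minutes(page_loads=540, delay_ms=5500)  # hypothetical page count
    # Placeholder date; only the 02:00 start and 15-minute gap come from the question.
    starts = schedule(datetime(2018, 10, 1, 2, 0), gap_minutes=15, n_jobs=5)
    print(f"estimated run time per site: {minutes:.1f} min")
    print("overlapping job pairs:", overlaps(starts, minutes))
```

If overlap turns out to matter, spacing the jobs by at least the estimated run time (or running the sites one after another) would be the obvious adjustment to try.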