I have been running scrapes with the Cloud version of Web Scraper very successfully for months. Since 6th September 2018 I have scraped all FLIR-branded products daily from www.amazon.co.uk, www.amazon.de, www.amazon.es, www.amazon.fr and www.amazon.it.
Across these sites that comes to around 2000 product scrapes per day. The jobs are scheduled to run about 15 minutes apart, starting at 02:00 each morning.
Almost exactly two weeks ago, the scheduled jobs started failing badly across all five Amazon sites. Some runs fail completely. Others scrape only a small fraction of the FLIR products that I KNOW are sold on these sites and that I normally scrape without problems.
Successful scrapes (products scraped) per site over the last 3 weeks, where day -1 is yesterday:
Day   .co.uk   .de   .es   .fr   .it
-1       517   467   452   340   428
-2       360     0     1     5     3
-3        27     0     2   128    15
-4       556   505   464   350   438
-5       281     2    98   349    34
-6       545    79    16     1   191
-7        18     1     1     0   249
-8        41    36     0     2     8
-9       453   121    44     9     1
-10      176     2    42     1    28
-11      528   461   408     8     3
-12      142     0   174   102    80
-13      535   510   465   162   281
-14      314     2    14     1   429
-15      521   473   464     3    24
-16      527   478   434   167   419
-17      533   487   457   144    26
-18      517   483   437   167   434
-19      550   486   447   173   297
-20      555   487   459   168   428
-21      535   506   452   167   433
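To separate genuine failures from normal day-to-day variation, each day's count can be compared with a per-site baseline. The sketch below assumes days -21 to -16 are a healthy reference period (taking the median per site) and flags any site that scraped less than half of its baseline. Both the reference window and the 50% threshold are arbitrary choices for illustration, not anything Web Scraper defines; `failing_sites` is a hypothetical helper.

```python
from statistics import median

SITES = [".co.uk", ".de", ".es", ".fr", ".it"]

# Successful scrapes per site, keyed by "days ago" (copied from the table above).
COUNTS = {
    -1: [517, 467, 452, 340, 428],
    -2: [360, 0, 1, 5, 3],
    -3: [27, 0, 2, 128, 15],
    -4: [556, 505, 464, 350, 438],
    -5: [281, 2, 98, 349, 34],
    -6: [545, 79, 16, 1, 191],
    -7: [18, 1, 1, 0, 249],
    -8: [41, 36, 0, 2, 8],
    -9: [453, 121, 44, 9, 1],
    -10: [176, 2, 42, 1, 28],
    -11: [528, 461, 408, 8, 3],
    -12: [142, 0, 174, 102, 80],
    -13: [535, 510, 465, 162, 281],
    -14: [314, 2, 14, 1, 429],
    -15: [521, 473, 464, 3, 24],
    -16: [527, 478, 434, 167, 419],
    -17: [533, 487, 457, 144, 26],
    -18: [517, 483, 437, 167, 434],
    -19: [550, 486, 447, 173, 297],
    -20: [555, 487, 459, 168, 428],
    -21: [535, 506, 452, 167, 433],
}

# Assumption: days -21 .. -16 are treated as the "healthy" reference period.
BASELINE_DAYS = range(-21, -15)
BASELINE = {
    site: median(COUNTS[d][i] for d in BASELINE_DAYS)
    for i, site in enumerate(SITES)
}

def failing_sites(day, threshold=0.5):
    """Sites whose count on `day` fell below `threshold` x their baseline median."""
    return [site for i, site in enumerate(SITES)
            if COUNTS[day][i] < threshold * BASELINE[site]]

# Print one line per day, most recent first.
for day in sorted(COUNTS, reverse=True):
    print(day, failing_sites(day) or "ok")
```

On this data it flags all five sites on day -8 and four of the five on day -2, while the manual re-runs on days -1 and -4 come back clean.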
As you can see, www.amazon.fr has always been a bit of a problem. Although it has roughly the same number of pages of FLIR products as the other sites, I was never able to tweak the scraper to collect more than around 170. Oddly, that has changed recently: the .fr figures on days -1, -4 and -5 are around 340 to 350.
When I saw that yesterday's scheduled scrape (day -1) had failed badly, I deleted the files and re-ran the jobs manually. That manual re-run produced normal-looking figures, which are the ones shown in row -1. I also re-ran the jobs manually last Friday (day -4), which is why that row looks normal too.
Apart from those manual re-runs, the scheduled results between days -14 and -2 have been very, very haphazard.
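Daily totals hide which result pages actually failed. A minimal sketch, assuming the CSV export of a run contains a `web-scraper-start-url` column and a `product-name` column (check the header of your own export), counts rows with a non-empty product name per start URL, so pages that returned nothing show up as 0. Counting only non-empty rows is a precaution in case the export keeps a placeholder row for pages where nothing matched; that behaviour is an assumption either way. The helper names and the toy data are hypothetical.

```python
import csv
import io
from collections import Counter

def rows_per_start_url(csv_text, start_urls,
                       url_column="web-scraper-start-url",
                       value_column="product-name"):
    """Count rows with a non-empty `value_column`, grouped by start URL."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[url_column] for row in reader if row.get(value_column))
    # Start URLs that produced no usable rows get an explicit 0.
    return {url: counts.get(url, 0) for url in start_urls}

def empty_pages(csv_text, start_urls):
    """Start URLs that produced no usable rows in this export."""
    return [url for url, n in rows_per_start_url(csv_text, start_urls).items() if n == 0]

# Toy export with made-up rows and placeholder URLs, only to show the shape.
SAMPLE = (
    "web-scraper-order,web-scraper-start-url,product-name\n"
    "1-1,https://example.com/p1,A\n"
    "1-2,https://example.com/p1,B\n"
    "1-3,https://example.com/p2,C\n"
    "1-4,https://example.com/p3,\n"
)
PAGES = ["https://example.com/p1", "https://example.com/p2", "https://example.com/p3"]

print(rows_per_start_url(SAMPLE, PAGES))
print(empty_pages(SAMPLE, PAGES))
```

For a real run, read the downloaded export with `open(path, newline="", encoding="utf-8")` and pass the sitemap's `startUrl` list as `start_urls`.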
Has anyone got any clue why? The sitemap is broadly the same for each site; I have pasted the www.amazon.co.uk version below for reference:
{"_id":"amazon-flir-co-uk",
"startUrl":[
  "https://www.amazon.co.uk/s/ref=sr_pg_1?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&keywords=FLIR&ie=UTF8&qid=1537784570",
  "https://www.amazon.co.uk/s/ref=sr_pg_2?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=2&keywords=FLIR&ie=UTF8&qid=1537784245",
  "https://www.amazon.co.uk/s/ref=sr_pg_3?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=3&keywords=FLIR&ie=UTF8&qid=1537784326",
  "https://www.amazon.co.uk/s/ref=sr_pg_4?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=4&keywords=FLIR&ie=UTF8&qid=1537784335",
  "https://www.amazon.co.uk/s/ref=sr_pg_5?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=5&keywords=FLIR&ie=UTF8&qid=1537784342",
  "https://www.amazon.co.uk/s/ref=sr_pg_6?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=6&keywords=FLIR&ie=UTF8&qid=1537784504",
  "https://www.amazon.co.uk/s/ref=sr_pg_7?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=7&keywords=FLIR&ie=UTF8&qid=1537784519",
  "https://www.amazon.co.uk/s/ref=sr_pg_8?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=8&keywords=FLIR&ie=UTF8&qid=1537784524",
  "https://www.amazon.co.uk/s/ref=sr_pg_9?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=9&keywords=FLIR&ie=UTF8&qid=1537784527",
  "https://www.amazon.co.uk/s/ref=sr_pg_10?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=10&keywords=FLIR&ie=UTF8&qid=1537784532",
  "https://www.amazon.co.uk/s/ref=sr_pg_11?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=11&keywords=FLIR&ie=UTF8&qid=1537784535",
  "https://www.amazon.co.uk/s/ref=sr_pg_12?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=12&keywords=FLIR&ie=UTF8&qid=1537784539",
  "https://www.amazon.co.uk/s/ref=sr_pg_13?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=13&keywords=FLIR&ie=UTF8&qid=1537784545",
  "https://www.amazon.co.uk/s/ref=sr_pg_14?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=14&keywords=FLIR&ie=UTF8&qid=1537784549",
  "https://www.amazon.co.uk/s/ref=sr_pg_15?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=15&keywords=FLIR&ie=UTF8&qid=1537784553",
  "https://www.amazon.co.uk/s/ref=sr_pg_16?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=16&keywords=FLIR&ie=UTF8&qid=1537784558",
  "https://www.amazon.co.uk/s/ref=sr_pg_17?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=17&keywords=FLIR&ie=UTF8&qid=1537784562",
  "https://www.amazon.co.uk/s/ref=sr_pg_18?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=18&keywords=FLIR&ie=UTF8&qid=1537784577",
  "https://www.amazon.co.uk/s/ref=sr_pg_19?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=19&keywords=FLIR&ie=UTF8&qid=1537784570",
  "https://www.amazon.co.uk/s/ref=sr_pg_20?fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR&page=20&keywords=FLIR&ie=UTF8&qid=1537784573"],
"selectors":[
  {"id":"product","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.a-fixed-left-grid-inner","multiple":true,"delay":0},
  {"id":"product-name","type":"SelectorLink","parentSelectors":["product"],"selector":"div.a-row.a-spacing-small a.a-link-normal","multiple":false,"delay":0},
  {"id":"product-RRP","type":"SelectorText","parentSelectors":["product-name"],"selector":"span.a-text-strike","multiple":false,"regex":"","delay":0},
  {"id":"product-current-price-inc-vat","type":"SelectorText","parentSelectors":["product-name"],"selector":"td.a-span12 span.a-size-medium","multiple":false,"regex":"","delay":0},
  {"id":"product-prime-eligible","type":"SelectorText","parentSelectors":["product-name"],"selector":"i.a-icon.a-icon-prime","multiple":false,"regex":"","delay":0},
  {"id":"product-sold-by","type":"SelectorText","parentSelectors":["product-name"],"selector":"div#merchant-info.a-section.a-spacing-mini","multiple":false,"regex":"","delay":0},
  {"id":"more-sellers","type":"SelectorLink","parentSelectors":["product-name"],"selector":"span.olp-padding-right a","multiple":false,"delay":0},
  {"id":"more-sellers-page","type":"SelectorElement","parentSelectors":["more-sellers"],"selector":"div.a-section div.a-section div.a-row:nth-of-type(n+2), div.a-row.olpOffer","multiple":true,"delay":0},
  {"id":"seller-name","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"span.a-size-medium a, img.alt","multiple":false,"regex":"","delay":0},
  {"id":"item-condition","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"div.a-section span.a-size-medium","multiple":false,"regex":"","delay":0},
  {"id":"item-price","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"span.a-size-large","multiple":false,"regex":"","delay":0},
  {"id":"item-delivery-price","type":"SelectorText","parentSelectors":["more-sellers-page"],"selector":"span.a-color-secondary, span.supersaver","multiple":false,"regex":"","delay":0}]}
I list each results page as a separate start URL because I have not found a reliable way to make the scraper follow Amazon's pagination; it always gives me trouble.
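Since the 20 start URLs differ only in the page number, they can be generated for all five sites rather than edited by hand. This sketch assumes that the `qid` parameter (it looks like a query timestamp) can be dropped, and that the same path and query string work on every Amazon domain; both are untested assumptions. The `start_urls` helper is hypothetical.

```python
import json

DOMAINS = ["www.amazon.co.uk", "www.amazon.de", "www.amazon.es",
           "www.amazon.fr", "www.amazon.it"]

# Query copied from the .co.uk sitemap, minus the per-page "qid" value
# (assumption: Amazon does not require it).
QUERY = "fst=as%3Aoff&rh=i%3Aaps%2Ck%3AFLIR%2Cp_89%3AFLIR"

def start_urls(domain, pages=20):
    """Build search-result URLs for pages 1..`pages` on one Amazon domain."""
    urls = []
    for n in range(1, pages + 1):
        page_param = "" if n == 1 else f"&page={n}"  # page 1 has no page= parameter
        urls.append(f"https://{domain}/s/ref=sr_pg_{n}?{QUERY}{page_param}"
                    "&keywords=FLIR&ie=UTF8")
    return urls

# Print a ready-to-paste "startUrl" array for each site.
for domain in DOMAINS:
    print(domain, json.dumps(start_urls(domain)))
```

Web Scraper's documentation also describes ranged start URLs (a range such as `[1-20]` inside the URL), which may remove the need for a pasted list altogether; whether Amazon accepts `page=1` for the first page would need checking.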
So, does anyone have any feedback on the following:
- Can you see anything wrong with the code itself?
- Has the scheduler feature begun to play up for anyone else?
- Does anyone know if Amazon are taking steps to block or screw up Web Scraper?
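One way to distinguish blocking from a change in page layout is to save the HTML of a results page that came back empty and check whether it is a CAPTCHA interstitial or a normal results page whose markup no longer contains the class targeted by the `div.a-fixed-left-grid-inner` selector. The marker strings below are assumptions based on commonly reported versions of Amazon's robot-check page; they may differ by locale or change over time. `classify_page` is a hypothetical helper.

```python
# Marker strings are assumptions, based on commonly reported versions of
# Amazon's robot-check page; they may be localised or change over time.
CAPTCHA_MARKERS = ("/errors/validateCaptcha", "Robot Check")

# Class used by the sitemap's "product" selector (div.a-fixed-left-grid-inner).
PRODUCT_CLASS = "a-fixed-left-grid-inner"

def classify_page(html):
    """Rough guess at why a saved results page yielded no products."""
    if any(marker in html for marker in CAPTCHA_MARKERS):
        return "captcha"         # probably challenged or blocked by Amazon
    if PRODUCT_CLASS in html:
        return "results"         # product markup present; selector should still match
    return "unknown-layout"      # neither: the page markup may have changed

# Toy snippets, only to show the three outcomes.
for sample in ('<form action="/errors/validateCaptcha">',
               '<div class="a-fixed-left-grid-inner">...</div>',
               '<div class="some-new-class">...</div>'):
    print(classify_page(sample))
```

A plain substring check is crude, but it is enough to tell "Amazon served a challenge page" apart from "the selectors no longer match".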
Any other comments would be most welcome.
Kind regards
Patrick