Multi-page Scrape Data - Start URLs and Data in Random Order

batman · August 15, 2019, 5:24pm

Hello,
Two questions:

1. When using the Start URL https://www.lazada.co.th/shop-makeup-accessories/?page=[1-10], why does the scraping usually start at the highest page number and work backwards (page 10, 9, 8, 7, 6, 5, ...)?
2. Why does the output data from all multi-page website scrapes appear in random page order?
See example output below:

[Image: web-scraper-start-url]

My actual goal is to scrape all products from pages 1-90, but I noticed the scraping stops after about 4-5 pages. Can anyone get this working for categories on www.lazada.co.th?

Thanks!

leemeng · October 3, 2019, 2:54pm

Ya, it is true: the scraper will work "backwards" if you specify a page range. I don't know why it was designed that way.
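
If the page order matters for your final data, one workaround is to sort the exported rows afterwards instead of relying on the scrape order. Here is a minimal Python sketch, assuming the export has been loaded as a list of dicts and that each row has a web-scraper-start-url column holding the page URL it came from. The helper names (expand_page_range, page_number, sort_rows_by_page) are hypothetical and not part of the scraper itself:

```python
import re
from urllib.parse import urlparse, parse_qs


def expand_page_range(template: str) -> list[str]:
    """Expand a '[start-end]' range in a URL template into one URL per page."""
    match = re.search(r"\[(\d+)-(\d+)\]", template)
    if match is None:
        return [template]  # no range present, nothing to expand
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = template[:match.start()], template[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


def page_number(url: str) -> int:
    """Read the 'page' query parameter from a start URL."""
    query = parse_qs(urlparse(url).query)
    return int(query["page"][0])


def sort_rows_by_page(rows: list[dict]) -> list[dict]:
    """Sort scraped rows by page number (assumed column: web-scraper-start-url).

    sorted() is stable, so rows from the same page keep their relative order.
    """
    return sorted(rows, key=lambda row: page_number(row["web-scraper-start-url"]))


if __name__ == "__main__":
    urls = expand_page_range(
        "https://www.lazada.co.th/shop-makeup-accessories/?page=[1-10]"
    )
    # Simulate rows arriving in reverse page order.
    rows = [{"web-scraper-start-url": u} for u in reversed(urls)]
    for row in sort_rows_by_page(rows):
        print(row["web-scraper-start-url"])
```

Sorting numerically on the parsed page value matters here: a plain string sort on the URL would put page 10 before page 2.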

As for Lazada, I sometimes scrape them and have found that they've implemented bot detection (Google reCAPTCHA v3), so you need to be stealthy or find workarounds.