I am using the web-scraper-headless library to scrape multiple products from a paginated website.
I followed the pagination tutorial, which makes the pagination selector a child of itself. My current sitemap and its start URL are below.
The problem is that the output does not contain all the products I expect. For example, page 39 lists 10 products, but I only got 3 distinct products from it. I checked the data preview in the Chrome extension, and all 10 products appear there for page 39. Page 39 is just one example; I suppose there are more pages where not all products were scraped.
You might think my delay is too short for the embedded jsdom to process everything, but I am using a 10-second delay and a 10-second page load delay, as the following settings show:
const scraperOpts = {
  delay: 10000,
  pageLoadDelay: 10000
};
I was expecting a total of 428 products, but the scrape only returns 161 unique products. What is wrong here? Can someone please give me some guidance?
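To narrow down which pages come up short, the per-page counts can be checked with something like the sketch below. It assumes the scraped rows follow the Web Scraper convention of adding a "-href" field for every link selector, so the field names pagination-href and product-link-href are assumptions based on the selector ids in my sitemap. The helper names and the example.com URLs are hypothetical and are not part of web-scraper-headless.

```javascript
// Hypothetical helper: group scraped rows by pagination page and count the
// distinct product links seen on each page.
// Assumption: each row has "pagination-href" and "product-link-href" fields.
function countUniqueProductsPerPage(rows) {
  const perPage = new Map();
  for (const row of rows) {
    // Rows reached directly from the start URL have no pagination link.
    const page = row['pagination-href'] || '(start page)';
    const product = row['product-link-href'];
    if (!product) continue; // skip rows without a product link
    if (!perPage.has(page)) perPage.set(page, new Set());
    perPage.get(page).add(product);
  }
  const counts = {};
  for (const [page, products] of perPage) counts[page] = products.size;
  return counts;
}

// Hypothetical helper: total number of distinct product links in the output.
function countUniqueProducts(rows) {
  return new Set(rows.map((r) => r['product-link-href']).filter(Boolean)).size;
}

// Made-up sample rows; a product repeats once per cartridge row it produced.
const sampleRows = [
  { 'pagination-href': 'https://example.com/list?p=2', 'product-link-href': 'https://example.com/a' },
  { 'pagination-href': 'https://example.com/list?p=2', 'product-link-href': 'https://example.com/a' },
  { 'pagination-href': 'https://example.com/list?p=2', 'product-link-href': 'https://example.com/b' },
  { 'product-link-href': 'https://example.com/c' }
];

console.log(countUniqueProductsPerPage(sampleRows));
console.log(countUniqueProducts(sampleRows));
```

Running this on the real output should show whether the missing products are concentrated on a few pages or spread across all of them.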
Url: https://www.cartridgesave.co.uk/printers.html?p=1
Sitemap:
{"_id":"printers","startUrl":["https://www.cartridgesave.co.uk/printers.html?p=1"],"selectors":[{"id":"pagination","type":"SelectorLink","parentSelectors":["_root","pagination"],"selector":".search div:nth-of-type(2) .pages-items a","multiple":true,"delay":0},{"id":"product-link","type":"SelectorLink","parentSelectors":["_root","pagination"],"selector":".product-item-inner a.product-item-link","multiple":true,"delay":0},{"id":"ManufacturerPartNo","type":"SelectorText","parentSelectors":["product-link"],"selector":"#information tr:contains('Manufacturer Part No.:') td","multiple":false,"regex":"","delay":0},{"id":"Brand","type":"SelectorText","parentSelectors":["product-link"],"selector":"#information tr:contains('Brand:') td","multiple":false,"regex":"","delay":0},{"id":"ProductType","type":"SelectorText","parentSelectors":["product-link"],"selector":"#information tr:contains('Product Type:') td","multiple":false,"regex":"","delay":0},{"id":"Connectivity","type":"SelectorText","parentSelectors":["product-link"],"selector":"#information tr:contains('Connectivity:') td","multiple":false,"regex":"","delay":0},{"id":"Height","type":"SelectorText","parentSelectors":["product-link"],"selector":"#information tr:contains('Height:') td","multiple":false,"regex":"","delay":0},{"id":"Width","type":"SelectorText","parentSelectors":["product-link"],"selector":"#information tr:contains('Width:') td","multiple":false,"regex":"","delay":0},{"id":"Depth","type":"SelectorText","parentSelectors":["product-link"],"selector":"#information tr:contains('Depth:') td","multiple":false,"regex":"","delay":0},{"id":"CartridgesLink","type":"SelectorLink","parentSelectors":["product-link"],"selector":"a.catridge_printer_link","multiple":false,"delay":0},{"id":"Catridges","type":"SelectorLink","parentSelectors":["CartridgesLink"],"selector":".product-item-inner a.product-item-link","multiple":true,"delay":0},{"id":"CatridgesModel","type":"SelectorText","parentSelectors":["Catridges"],"selector":"#information tr:contains('Manufacturer Part No.:') td","multiple":false,"regex":"","delay":0}]}
Thanks a lot!