Help scraping site

Hello,

I have scraped quite a few websites, but I am having trouble with this one: https://www.walmart.com/browse/health/probiotics/976760_1396434_8617528?page=1

I have set the scraper to go into each product and collect info, and to do this for all 25 pages. I have done similar things on other websites and it has worked fine.

On this site the scraper starts, but then keeps reloading the first page instead of loading it once and then going into the products.

Any help on making this work to scrape the full 1700 products would be greatly appreciated. I will paste the sitemap below.

Thank you!

{"_id":"walmart-lumina","startUrl":["https://www.walmart.com/browse/health/probiotics/976760_1396434_8617528?page=1"],"selectors":[{"id":"Company link","type":"SelectorLink","parentSelectors":["_root","page"],"selector":"a.product-title-link","multiple":true,"delay":0},{"id":"Product title","type":"SelectorText","parentSelectors":["Company link"],"selector":"div.hide-content-max-m h1.prod-ProductTitle div","multiple":false,"regex":"","delay":0},{"id":"review count","type":"SelectorText","parentSelectors":["Company link"],"selector":"div.product-secondary-info span.stars-reviews-count-node","multiple":false,"regex":"","delay":0},{"id":"brand","type":"SelectorText","parentSelectors":["Company link"],"selector":"div.product-secondary-info a.prod-brandName span","multiple":false,"regex":"","delay":0},{"id":"Wallmart ID","type":"SelectorText","parentSelectors":["Company link"],"selector":"div.product-secondary-info div.valign-middle.copy-mini.display-inline-block","multiple":false,"regex":"","delay":0},{"id":"Rewiew score","type":"SelectorText","parentSelectors":["Company link"],"selector":"div.product-secondary-info div.stars","multiple":false,"regex":"","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["Company link"],"selector":"span.hide-content span.price","multiple":false,"regex":"","delay":0},{"id":"unit price / offer","type":"SelectorText","parentSelectors":["Company link"],"selector":"div.prod-ProductOffer-ppu","multiple":false,"regex":"","delay":0},{"id":"page","type":"SelectorLink","parentSelectors":["_root","page"],"selector":"ul.paginator-list a","multiple":true,"delay":0}]}

Try changing your next-page selector to .paginator-btn-next and uncheck the Multiple checkbox.

Thanks for the help, Bret. I've tried that but it only scrapes the first page. Any other thoughts?

Thanks again for the help.

Use a dynamic start URL with a page range (?page=[1-25]) instead of a pagination selector:

{"_id":"walmart-lumina","startUrl":["https://www.walmart.com/browse/health/probiotics/976760_1396434_8617528?page=[1-25]"],"selectors":[{"id":"Company link","type":"SelectorLink","parentSelectors":["_root"],"selector":"a.product-title-link","multiple":true,"delay":0},{"id":"Product title","type":"SelectorText","parentSelectors":["Company link"],"selector":"div.hide-content-max-m h1.prod-ProductTitle div","multiple":false,"regex":"","delay":0},{"id":"review count","type":"SelectorText","parentSelectors":["Company link"],"selector":"div.product-secondary-info span.stars-reviews-count-node","multiple":false,"regex":"","delay":0},{"id":"brand","type":"SelectorText","parentSelectors":["Company link"],"selector":"div.product-secondary-info a.prod-brandName span","multiple":false,"regex":"","delay":0},{"id":"Wallmart ID","type":"SelectorText","parentSelectors":["Company link"],"selector":"div.product-secondary-info div.valign-middle.copy-mini.display-inline-block","multiple":false,"regex":"","delay":0},{"id":"Rewiew score","type":"SelectorElementAttribute","parentSelectors":["Company link"],"selector":"span.stars-container","multiple":false,"extractAttribute":"alt","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["Company link"],"selector":"span.hide-content span.price","multiple":false,"regex":"","delay":0},{"id":"unit price / offer","type":"SelectorText","parentSelectors":["Company link"],"selector":"div.prod-ProductOffer-ppu","multiple":false,"regex":"","delay":0}]}
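For anyone wondering what the [1-25] part of the start URL does: as far as I understand it, the range is shorthand for 25 separate start URLs, one per page number, so no "page" link selector is needed. Below is a rough Python sketch of that expansion. It is only an illustration of the idea; expand_range_url is a made-up helper, not part of the Web Scraper extension.

```python
import re


def expand_range_url(url: str) -> list[str]:
    """Expand the first [start-end] placeholder into one URL per number."""
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        # No range placeholder: the URL is used as-is.
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    # The range is inclusive on both ends, so [1-25] yields 25 URLs.
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


urls = expand_range_url(
    "https://www.walmart.com/browse/health/probiotics/"
    "976760_1396434_8617528?page=[1-25]"
)
print(len(urls))   # 25
print(urls[0])     # ends with ?page=1
print(urls[-1])    # ends with ?page=25
```

Each of those URLs is treated as its own starting point, which is why the "Company link" selector only needs "_root" as its parent in the sitemap above.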

Hello, thank you so much. This works!!

If you don't mind I have another question - hidden in the code of each product page there is a GTIN number. For example on this page - https://www.walmart.com/ip/Nature-s-Bounty-Acidophilus-Probiotic-Dietary-Supplement-Tablets-200-Ct/48006485 - in the source you can see this piece of code:

I would like to scrape the value in the content attribute (0074312307096). But as this HTML doesn't seem to be attached to anything visible on the page, I can't make it work. Do you have any ideas?

Hopefully I have described this well.

Thank you again for your help!

Add a new selector:
Type = Element Attribute
Selector = div.product-secondary-info.hide-content-max-m.hf-BotRow > div > span
Attribute name = content
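The reason a Text selector comes back empty here is that the value lives in an attribute of the tag, not in its visible text; an Element Attribute selector reads the attribute instead. Here is a small Python sketch of the same idea using only the standard library. The sample markup is an assumption (the original snippet did not survive in this thread), only the GTIN value is taken from the question, and ContentAttributeFinder is a made-up helper that simply grabs the first span carrying a content attribute rather than evaluating the full CSS selector.

```python
from html.parser import HTMLParser

# Hypothetical markup, for illustration only. The real product page may differ.
SAMPLE_HTML = """
<div class="product-secondary-info hide-content-max-m hf-BotRow">
  <div>
    <span content="0074312307096"></span>
  </div>
</div>
"""


class ContentAttributeFinder(HTMLParser):
    """Record the 'content' attribute of the first matching tag."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if self.value is None and tag == self.tag:
            attr_map = dict(attrs)
            if "content" in attr_map:
                self.value = attr_map["content"]


finder = ContentAttributeFinder("span")
finder.feed(SAMPLE_HTML)
print(finder.value)  # 0074312307096
```

Note that the span has no text between its tags, so anything that extracts element text returns an empty string; only the attribute lookup finds the number.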