Attempting to scrape BJs Wholesale

Hi all! I am attempting to scrape all grocery products from BJ's wholesale and have run into two issues.

  1. I can't seem to get the pagination right due to only some categories having subcategories. Not sure if I have the pagination set up wrong.

  2. I am attempting to pull ALL images for a product, not just the main image, and would like to know how to do that. Right now it only pulls the main image.

I've been scratching my head over this for the last 2 days and can't seem to get it right. I am new to scraping, so any tips/suggestions are greatly appreciated.

Thank you all for your help!

Url: Online Grocery Shopping, Household & Pet Supplies - BJS Wholesale Club

Sitemap:
{"_id":"bjs_wholesaleV2","startUrl":["Online Grocery Shopping, Household & Pet Supplies - BJS Wholesale Club a","multiple":true,"delay":0},{"id":"SubCategories","type":"SelectorLink","parentSelectors":["Categories"],"selector":".shop-by-wrapper a","multiple":true,"delay":0},{"id":"Products","type":"SelectorLink","parentSelectors":["Categories","SubCategories","Next Page","Pagination"],"selector":"a.content-center","multiple":true,"delay":0},{"id":"Product Desc","type":"SelectorGroup","parentSelectors":["Products"],"selector":".desktopOnly h1, span.price-display","delay":0,"extractAttribute":""},{"id":"Next Page","type":"SelectorElementClick","parentSelectors":["SubCategories"],"selector":"a.page-num","multiple":false,"delay":0,"clickElementSelector":"li.next-btn","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueHTML"},{"id":"product_images","type":"SelectorImage","parentSelectors":["Products"],"selector":".mz-figure > img","multiple":true,"delay":0},{"id":"Pagination","type":"SelectorElementClick","parentSelectors":["Categories"],"selector":"a.page-num","multiple":false,"delay":2000,"clickElementSelector":"li.next-btn","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueHTML"}]}

Any help on this would be greatly appreciated! Thanks!

Hi @unloathe

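It seems the only workable approach is to put a page-range interval in the starting category URLs (for example, ?pagenumber=[1-3]), so each page is loaded directly instead of by clicking a "next" button. For the images, an Element attribute selector works just fine.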
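To make the page-range idea concrete, here is a minimal Python sketch of what a [start-end] interval in a start URL stands for: one URL per page number. This only illustrates the expansion; it is not Web Scraper's own code. The function name expand_page_range and the example.com URL are hypothetical.

```python
import re


def expand_page_range(url_template: str) -> list[str]:
    """Expand the first [start-end] interval in a URL into one URL per page.

    Illustration only: handles the plain [1-3] form and nothing else.
    """
    match = re.search(r"\[(\d+)-(\d+)\]", url_template)
    if match is None:
        # No interval present, so the URL stands for itself.
        return [url_template]
    start, end = int(match.group(1)), int(match.group(2))
    prefix = url_template[: match.start()]
    suffix = url_template[match.end():]
    return [f"{prefix}{page}{suffix}" for page in range(start, end + 1)]


# Hypothetical category URL, used only to show the expansion.
pages = expand_page_range("https://example.com/category/123?pagenumber=[1-3]")
```

Because every page becomes its own start URL, a category with no subcategories and a category with several can be handled the same way: list each leaf category URL once, with its own range.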

By the way, I could not open your sitemap because the JSON is invalid.
When pasting your sitemap, use preformatted text so the forum does not alter it.
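One way to catch this before posting is to run the exported sitemap through a JSON parser. The sketch below uses Python's standard json module. The helper name check_sitemap is hypothetical, and the three required keys are an assumption taken from the keys visible in the sitemaps in this thread.

```python
import json


def check_sitemap(text: str) -> str:
    """Return "ok" if text parses as JSON and has the expected top-level keys."""
    try:
        sitemap = json.loads(text)
    except json.JSONDecodeError as err:
        return f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(sitemap, dict):
        return "invalid sitemap: top level is not an object"
    # Assumed key set, based on the sitemaps posted in this thread.
    missing = [key for key in ("_id", "startUrl", "selectors") if key not in sitemap]
    if missing:
        return "missing keys: " + ", ".join(missing)
    return "ok"
```

If the pasted text was turned into link titles or had its quotes changed by the forum, the parser reports the position where it stopped, which usually points straight at the damaged part.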

Anyway my example looks something like this:

{"_id":"bjs-com","startUrl":["https://www.bjs.com/category/grocery-household-and-pet/paper-and-plastic/plates-cups-and-utensils/3000000000000117362?pagenumber=[1-3]"],"selectors":[{"id":"wrapper","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.product","multiple":true,"delay":0},{"id":"link","type":"SelectorLink","parentSelectors":["wrapper"],"selector":"a.product-link","multiple":false,"delay":0},{"id":"element-card","type":"SelectorElement","parentSelectors":["link"],"selector":"body:has(h1.product-title-name)","multiple":true,"delay":0},{"id":"image-1","type":"SelectorElementAttribute","parentSelectors":["element-card"],"selector":"div.mcs-items-container > div:nth(0) a","multiple":false,"extractAttribute":"href","delay":0},{"id":"image-2","type":"SelectorElementAttribute","parentSelectors":["element-card"],"selector":"div.mcs-items-container > div:nth(1) a","multiple":false,"extractAttribute":"href","delay":0},{"id":"image-3","type":"SelectorElementAttribute","parentSelectors":["element-card"],"selector":"div.mcs-items-container > div:nth(2) a","multiple":false,"extractAttribute":"href","delay":0}]}
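The sitemap above has one attribute selector per thumbnail position (image-1 to image-3). If a product can have more images, the same entries can be generated rather than typed by hand. This is a rough sketch that simply repeats the pattern from the sitemap. The function name image_selectors is hypothetical, and whether the site's markup still matches div.mcs-items-container is an assumption.

```python
import json


def image_selectors(count: int, parent: str = "element-card") -> list[dict]:
    """Build `count` attribute selectors following the image-N pattern above."""
    return [
        {
            "id": f"image-{index + 1}",
            "type": "SelectorElementAttribute",
            "parentSelectors": [parent],
            # Same selector as in the sitemap, with only the position changing.
            "selector": f"div.mcs-items-container > div:nth({index}) a",
            "multiple": False,
            "extractAttribute": "href",
            "delay": 0,
        }
        for index in range(count)
    ]


# Print five entries as JSON, ready to paste into the "selectors" list.
print(json.dumps(image_selectors(5)))
```

Positions that do not exist on a given product page should just come back empty, so generating a few more entries than most products need is probably harmless.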

Hope it helps.