Scraping variants size/colour

Hello WS.io community,

I'm trying to scrape this site:

Url: https://www.oxwork.com/

My sitemap is:
{"_id":"oxwork","startUrl":["https://www.oxwork.com/"],"selectors":[{"id":"category1","type":"SelectorLink","parentSelectors":["_root"],"selector":".wide li.parent:nth-of-type(n+3) a.level-top","multiple":true,"delay":0},{"id":"category2","type":"SelectorLink","parentSelectors":["category1"],"selector":".categories-filter a","multiple":true,"delay":0},{"id":"category3","type":"SelectorLink","parentSelectors":["category2"],"selector":".categories-filter a","multiple":true,"delay":0},{"id":"pagescroll","type":"SelectorElementScroll","parentSelectors":["category3"],"selector":".products-grid li.item","multiple":true,"delay":"5000"},{"id":"p_modelelink","type":"SelectorLink","parentSelectors":["pagescroll"],"selector":".product-name a","multiple":false,"delay":0},{"id":"colorclik","type":"SelectorElementClick","parentSelectors":["p_modelelink"],"selector":"div[itemprop='mainContentOfPage']","multiple":true,"delay":"1000","clickElementSelector":"li:nth-of-type(n+1) span.x,","clickType":"clickOnce","discardInitialElements":"discard-when-click-element-exists","clickElementUniquenessType":"uniqueCSSSelector"},{"id":"p_couleur","type":"SelectorText","parentSelectors":["sizeclick"],"selector":"span.select-label","multiple":false,"regex":"","delay":0},{"id":"p_description","type":"SelectorText","parentSelectors":["sizeclick"],"selector":"h1","multiple":false,"regex":"","delay":0},{"id":"p_normprice","type":"SelectorText","parentSelectors":["sizeclick"],"selector":".price-discount div.special-price-info","multiple":false,"regex":"","delay":0},{"id":"p_wasprice","type":"SelectorText","parentSelectors":["sizeclick"],"selector":".price-discount div.normal-price-info","multiple":false,"regex":"","delay":0},{"id":"p_oxworkcoc","type":"SelectorText","parentSelectors":["sizeclick"],"selector":"span.sku","multiple":false,"regex":"","delay":0},{"id":"p_brand","type":"SelectorText","parentSelectors":["sizeclick"],"selector":"tr:contains('Marque') td","multiple":false,"regex":"","delay":0},{"id":"p_mpn","type":"SelectorText","parentSelectors":["sizeclick"],"selector":"tr:contains('Référence fabriquant') td","multiple":false,"regex":"","delay":0},{"id":"p_available","type":"SelectorText","parentSelectors":["sizeclick"],"selector":"span.value","multiple":false,"regex":"","delay":0},{"id":"p_size","type":"SelectorText","parentSelectors":["sizeclick"],"selector":"span#select_label_size","multiple":false,"regex":"","delay":0},{"id":"sizeclick","type":"SelectorElementClick","parentSelectors":["colorclik"],"selector":"parent","multiple":true,"delay":"1000","clickElementSelector":".swatch-link-149 span.x","clickType":"clickOnce","discardInitialElements":"discard-when-click-element-exists","clickElementUniquenessType":"uniqueCSSSelector"},{"id":"p_price","type":"SelectorText","parentSelectors":["sizeclick"],"selector":".product-type-data div.special-price-info","multiple":false,"regex":"","delay":0}]}

If the code above doesn't work, it is probably because the forum removed the underscores on both sides of "parent" (the selector should read _parent_) here:
["colorclik"],"selector":"parent","multiple":true,"delay":"1000","clickElementSelector":".swatch-link-149 span.x",
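
For anyone pasting the sitemap from this post, here is a minimal Python sketch of how the stripped placeholder could be restored before importing. It is not part of the original sitemap: the function name `restore_parent_placeholder` and the small demo sitemap are hypothetical, and it assumes that a selector whose value is exactly "parent" was meant to be "_parent_".

```python
import json


def restore_parent_placeholder(sitemap_text: str) -> str:
    """Hypothetical helper: put back the underscores the forum stripped.

    Assumption: any selector whose CSS selector is exactly "parent"
    was originally the "_parent_" placeholder.
    """
    sitemap = json.loads(sitemap_text)
    for selector in sitemap.get("selectors", []):
        if selector.get("selector") == "parent":
            selector["selector"] = "_parent_"
    # Compact separators keep the one-line style used for sitemap imports.
    return json.dumps(sitemap, ensure_ascii=False, separators=(",", ":"))


# Small stand-in for the real sitemap (demo data only).
example = (
    '{"_id":"demo","selectors":[{"id":"sizeclick",'
    '"type":"SelectorElementClick","parentSelectors":["colorclik"],'
    '"selector":"parent"}]}'
)
fixed = restore_parent_placeholder(example)
print(fixed)
```

The output can then be pasted into the sitemap import box in place of the damaged text.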

I use the category menu to browse the product families, then a scroll-down selector (SelectorElementScroll) to display all the products on the page; this works fine.
After that I use a link selector to reach the product page; this also works fine.
A product page may offer several colours and sizes. I need to go through each of them to collect the description, prices, stock and so on. This is where my problems occur.
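
To check that the chain described above is wired the way it is meant to be, the sitemap's parent links can be printed as a tree. This is only a rough Python sketch under stated assumptions: the helper name `selector_tree` is hypothetical, and the demo sitemap is a cut-down stand-in that reuses four selector ids from the sitemap above.

```python
from collections import defaultdict


def selector_tree(sitemap: dict) -> list:
    """Hypothetical helper: list selector ids, indented by nesting depth."""
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel["id"])

    lines = []

    def walk(node, depth, path):
        for child in children.get(node, []):
            lines.append("  " * depth + child)
            # A selector may list itself or an ancestor as a parent
            # (for example for pagination), so do not recurse into a
            # node that is already on the current path.
            if child not in path:
                walk(child, depth + 1, path | {child})

    walk("_root", 0, frozenset({"_root"}))
    return lines


# Cut-down stand-in for the real sitemap (demo data only).
demo = {
    "selectors": [
        {"id": "p_modelelink", "parentSelectors": ["_root"]},
        {"id": "colorclik", "parentSelectors": ["p_modelelink"]},
        {"id": "sizeclick", "parentSelectors": ["colorclik"]},
        {"id": "p_size", "parentSelectors": ["sizeclick"]},
    ]
}
tree = selector_tree(demo)
print("\n".join(tree))
```

Run against the full sitemap (loaded with `json.loads`), this should show at a glance whether the text selectors sit under the size click and whether the size click sits under the colour click.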

I think I'm close to getting it right, but I can't figure it out.

Would you help me please?

Thank you
David

Hi,
I've just fixed the loop issue shown in the first screenshot.
I think I'll have to create a separate sitemap for shoes and gloves.

David

Interesting site. This one will handle all 3 types of products you listed. Please modify as needed:

{"_id":"oxwork-lite","startUrl":["https://www.oxwork.com/gants-portwest-flexo-grip-nylon-enduit-nitrile.html","https://www.oxwork.com/chaussures-de-securite-montantes-safety-jogger-safetyboy-s1p.html","https://www.oxwork.com/short-de-travail-lite-work-guard-result.html"],"selectors":[{"id":"title","type":"SelectorText","parentSelectors":["product-view selector"],"selector":"h1","multiple":false,"regex":"","delay":0},{"id":"Click colors","type":"SelectorElementClick","parentSelectors":["product-view selector"],"selector":"div.inner","multiple":true,"delay":"1500","clickElementSelector":"span.swatch-label > img","clickType":"clickOnce","discardInitialElements":"discard-when-click-element-exists","clickElementUniquenessType":"uniqueCSSSelector"},{"id":"Color","type":"SelectorText","parentSelectors":["Click colors"],"selector":"span[id*=\"color\"]","multiple":false,"regex":"","delay":0},{"id":"Click sizes","type":"SelectorElementClick","parentSelectors":["Click colors"],"selector":"_parent_","multiple":true,"delay":"1500","clickElementSelector":"span.swatch-label","clickType":"clickOnce","discardInitialElements":"discard-when-click-element-exists","clickElementUniquenessType":"uniqueText"},{"id":"product-view selector","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.product-view","multiple":false,"delay":0},{"id":"special price","type":"SelectorText","parentSelectors":["Click sizes"],"selector":".special-price","multiple":false,"regex":"","delay":0},{"id":"avail","type":"SelectorText","parentSelectors":["Click sizes"],"selector":"span.value","multiple":false,"regex":"","delay":0},{"id":"size","type":"SelectorText","parentSelectors":["Click sizes"],"selector":"dt.swatch-attr label[id*=\"size\"]","multiple":false,"regex":"\\b.+$","delay":0}]}


Hello Leemeng,
Sorry for the delay in responding, I was on holiday.
Thank you for this one, I'll have a look.
At first glance it does the job, and it will definitely be a good help for other similar websites.
Thank you
David

Hello Leemeng,

The problem I have with the sitemap above is that when a product page shows no colour options, it doesn't collect the size details.
Example: https://www.oxwork.com/bottes-d-hiver-helly-hansen-chelsea-winterboot-ht-ww-s3-src.html

I don't see how I can tackle this issue.

Any ideas?

Thank you
David