Trouble with Scroll and Tab Elements


I'm trying to set up a crawl for an e-commerce site where some elements only seem to load properly when scrolled to. I'm having trouble getting the scroll timing right so that I can actually pull data from a table hidden under a tab. Any help with the sitemap would be greatly appreciated.

{"_id":"wol_test1","startUrl":["https://www.wolseley.co.uk/product/safety-assured-hc-x-whi-self-assured-external-finger-protector-white-1980mm-%28each%29/","https://www.wolseley.co.uk/product/briton-door-closers-door-closer-2003e-silver-%28each%29/","https://www.wolseley.co.uk/product/ts71-se-en3-4-door-closer-%28each%29/","https://www.wolseley.co.uk/product/ts83-bcdc-se-en2-5-door-closer-%28each%29/","https://www.wolseley.co.uk/product/briton-121ce-door-closer-silver-%28each%29/"],"selectors":[{"id":"name","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"p.jss96","type":"SelectorText"},{"id":"wol_code","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"p.jss97","type":"SelectorText"},{"id":"price","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"p.jss98","type":"SelectorText"},{"clickElementSelector":".jss308 button","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":1500,"discardInitialElements":"do-not-discard","id":"table","multiple":false,"parentSelectors":["_root"],"selector":"div.jss156","type":"SelectorElementClick"},{"id":"brand","multiple":false,"parentSelectors":["table"],"regex":"","selector":"tr:contains('Brand') td.jss264","type":"SelectorText"},{"id":"sup_code","multiple":false,"parentSelectors":["table"],"regex":"","selector":"tr:contains('Supplier Product Code') td.jss264","type":"SelectorText"},{"delay":1000,"elementLimit":40,"id":"scroll","multiple":false,"parentSelectors":["_root"],"selector":"div.jss307","type":"SelectorElementScroll"}]}

@Dismas Hello, to fix that you should change the selector order to 'click > scroll > extract text data'.

Example:

{"_id":"wol_test1","startUrl":["https://www.wolseley.co.uk/product/safety-assured-hc-x-whi-self-assured-external-finger-protector-white-1980mm-%28each%29/","https://www.wolseley.co.uk/product/briton-door-closers-door-closer-2003e-silver-%28each%29/","https://www.wolseley.co.uk/product/ts71-se-en3-4-door-closer-%28each%29/","https://www.wolseley.co.uk/product/ts83-bcdc-se-en2-5-door-closer-%28each%29/","https://www.wolseley.co.uk/product/briton-121ce-door-closer-silver-%28each%29/"],"selectors":[{"id":"name","multiple":false,"parentSelectors":["scroll"],"regex":"","selector":"p[data-testid=\"product-name\"]","type":"SelectorText"},{"id":"wol_code","multiple":false,"parentSelectors":["scroll"],"regex":"","selector":"p[data-testid=\"product-code\"]","type":"SelectorText"},{"id":"price","multiple":false,"parentSelectors":["scroll"],"regex":"","selector":"p[data-testid=\"price\"]","type":"SelectorText"},{"clickElementSelector":"p[data-testid=\"title\"]:contains(\"Technical specifications\")","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":2000,"discardInitialElements":"discard-when-click-element-exists","id":"table","multiple":true,"parentSelectors":["_root"],"selector":"table#pdp-product-specs","type":"SelectorElementClick"},{"id":"brand","multiple":false,"parentSelectors":["scroll"],"regex":"","selector":"tr:contains('Brand') td + td","type":"SelectorText"},{"id":"sup_code","multiple":false,"parentSelectors":["scroll"],"regex":"","selector":"tr:contains('Supplier Product Code') td + td","type":"SelectorText"},{"delay":1000,"elementLimit":0,"id":"scroll","multiple":false,"parentSelectors":["_root"],"selector":"body","type":"SelectorElementScroll"}]}
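To make the structural change easier to see, the fix boils down to the parent chain: the click selector stays under `_root` to open the tab, the scroll selector (also under `_root`) scrolls the whole page, and the text selectors hang off `scroll` so they only extract after scrolling has happened. A trimmed sketch of that ordering (selector IDs taken from the sitemap above, with most fields omitted for brevity):

```json
{
  "selectors": [
    {"id": "table",  "type": "SelectorElementClick",  "parentSelectors": ["_root"],
     "clickElementSelector": "p[data-testid=\"title\"]:contains(\"Technical specifications\")",
     "selector": "table#pdp-product-specs", "delay": 2000},
    {"id": "scroll", "type": "SelectorElementScroll", "parentSelectors": ["_root"],
     "selector": "body", "delay": 1000},
    {"id": "brand",  "type": "SelectorText",          "parentSelectors": ["scroll"],
     "selector": "tr:contains('Brand') td + td"}
  ]
}
```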


Works great! Getting all the data I was after now, thanks!

The only tiny issue is that it only seems to run while the crawler window is on top. Is there any way around this?

@Dismas Hi, yes, the scraper window has to stay open and focused; otherwise, the click/scroll selectors will stop executing. I'd suggest using a second monitor, or launching the scrape in Web Scraper Cloud if that interferes with your daily workflow.