2 Clicks before scrape

I am trying to scrape this site: Find a Contractor - MCS.
I need to click "VIEW ALL" and then "LIST VIEW" before scraping.
It looks like I need to wait about 5 seconds after the first click.

I'm not sure how to do this, so any pointers would be appreciated.

This is what my sitemap currently looks like:
{"_id":"msc","startUrl":["https://mcscertified.com/find-an-installer/"],"selectors":[{"id":"element","parentSelectors":["startclick"],"type":"SelectorElementClick","clickElementSelector":"h3","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"tr:nth-of-type(n+2) div.msw-list-view-item"},{"id":"company","parentSelectors":["element"],"type":"SelectorText","selector":"h3","multiple":false,"regex":""},{"id":"address","parentSelectors":["element"],"type":"SelectorText","selector":"div:nth-of-type(4) div:nth-of-type(2)","multiple":false,"regex":""},{"id":"phone","parentSelectors":["element"],"type":"SelectorText","selector":"div:nth-of-type(5) a","multiple":false,"regex":""},{"id":"website","parentSelectors":["element"],"type":"SelectorLink","selector":"a[target]","multiple":false},{"id":"regions","parentSelectors":["element"],"type":"SelectorText","selector":"div:nth-of-type(7) div:nth-of-type(2)","multiple":false,"regex":""},{"id":"email","parentSelectors":["element"],"type":"SelectorLink","selector":"a.msw-installer-contact-button","multiple":false},{"id":"ashp","parentSelectors":["element"],"type":"SelectorText","selector":"div.msw-installer-technology:nth-of-type(1) span:nth-of-type(1)","multiple":false,"regex":""},{"id":"startclick","parentSelectors":["_root"],"type":"SelectorElementClick","clickElementSelector":"div.msw-launchpad-tab.active","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","multiple":false,"selector":"div.msw-launchpad-tab.active"},{"id":"listview","parentSelectors":["_root"],"type":"SelectorElementClick","clickElementSelector":"#msw-list-view span:nth-of-type(1)","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":10000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"div#msw-list-view"}]}

@supahoopsa Hi, you should remove some of the listing clicks, because the targeted data is already present in the page HTML and no click is needed to reveal it.

Here's an example:

{"_id":"msc-test-1","startUrl":["https://mcscertified.com/find-an-installer/"],"selectors":[{"id":"company","multiple":false,"parentSelectors":["element"],"regex":"","selector":"h3","type":"SelectorText"},{"id":"address","multiple":false,"parentSelectors":["element"],"regex":"","selector":"div:nth-of-type(4) div:nth-of-type(2)","type":"SelectorText"},{"id":"phone","multiple":false,"parentSelectors":["element"],"regex":"","selector":"div:nth-of-type(5) a","type":"SelectorText"},{"id":"website","multiple":false,"parentSelectors":["element"],"selector":"a[target]","type":"SelectorLink"},{"id":"regions","multiple":false,"parentSelectors":["element"],"regex":"","selector":"div:nth-of-type(7) div:nth-of-type(2)","type":"SelectorText"},{"id":"email","multiple":false,"parentSelectors":["element"],"selector":"a.msw-installer-contact-button","type":"SelectorLink"},{"id":"ashp","multiple":false,"parentSelectors":["element"],"regex":"","selector":"div.msw-installer-technology:nth-of-type(1) span:nth-of-type(1)","type":"SelectorText"},{"clickElementSelector":"div#msw-toggle-filters:contains(\"View All\")","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":4000,"discardInitialElements":"do-not-discard","id":"startclick","multiple":false,"parentSelectors":["_root"],"selector":"div.msw-launchpad-tab.active","type":"SelectorElementClick"},{"delay":10000,"elementLimit":500,"id":"scroller","multiple":true,"parentSelectors":["_root"],"selector":"body","type":"SelectorElementScroll"},{"clickElementSelector":"#msw-list-view span:nth-of-type(1)","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":5000,"discardInitialElements":"do-not-discard","id":"listview","multiple":true,"parentSelectors":["_root"],"selector":"div#msw-list-view","type":"SelectorElementClick"},{"clickElementSelector":"a.current + a:not(:contains(\"5\"))","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","id":"element","multiple":true,"parentSelectors":["_root"],"selector":"tr div.msw-list-view-item","type":"SelectorElementClick"}]}

@viesturs

That's great. Thank you very much.

Two further questions...

Firstly...
Your sitemap scraped 40 records.
I'm seeing 10 records per page and 261 pages.
How did you scrape four pages, and how do I change that to scrape all of the data?

Secondly...
When you view a full listing there is a little "icon" tick or cross next to the services they provide. Is there a way of scraping that?

TIA
David

@supahoopsa Hi, understood. The pagination selector in the sitemap was limited for testing purposes.

Please note that when using the Pagination (with the click type), Element Click, or Element Scroll selectors, the extracted data only becomes available after the respective selector has finished (that is, when no new elements are being matched).

Here's the sitemap without the pagination limit applied:

{"_id":"msc-test-1","startUrl":["https://mcscertified.com/find-an-installer/"],"selectors":[{"id":"company","multiple":false,"parentSelectors":["element"],"regex":"","selector":"h3","type":"SelectorText"},{"id":"address","multiple":false,"parentSelectors":["element"],"regex":"","selector":"div:nth-of-type(4) div:nth-of-type(2)","type":"SelectorText"},{"id":"phone","multiple":false,"parentSelectors":["element"],"regex":"","selector":"div:nth-of-type(5) a","type":"SelectorText"},{"id":"website","multiple":false,"parentSelectors":["element"],"selector":"a[target]","type":"SelectorLink"},{"id":"regions","multiple":false,"parentSelectors":["element"],"regex":"","selector":"div:nth-of-type(7) div:nth-of-type(2)","type":"SelectorText"},{"id":"email","multiple":false,"parentSelectors":["element"],"selector":"a.msw-installer-contact-button","type":"SelectorLink"},{"id":"ashp","multiple":false,"parentSelectors":["element"],"regex":"","selector":"div.msw-installer-technology:nth-of-type(1) span:nth-of-type(1)","type":"SelectorText"},{"clickElementSelector":"div#msw-toggle-filters:contains(\"View All\")","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":4000,"discardInitialElements":"do-not-discard","id":"startclick","multiple":false,"parentSelectors":["_root"],"selector":"div.msw-launchpad-tab.active","type":"SelectorElementClick"},{"delay":10000,"elementLimit":500,"id":"scroller","multiple":true,"parentSelectors":["_root"],"selector":"body","type":"SelectorElementScroll"},{"clickElementSelector":"#msw-list-view span:nth-of-type(1)","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":5000,"discardInitialElements":"do-not-discard","id":"listview","multiple":true,"parentSelectors":["_root"],"selector":"div#msw-list-view","type":"SelectorElementClick"},{"clickElementSelector":"a.current + a","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","id":"element","multiple":true,"parentSelectors":["_root"],"selector":"tr div.msw-list-view-item","type":"SelectorElementClick"}]}

@supahoopsa What icon are you referring to?

The little ticks and crosses shown here:
[image]

@supahoopsa Are you looking to select the 'ticked' category? If so, you can use the following selector: span[style="background-image: url('https://mcs-website-widget.solsticecloud.com/Images/Tick.svg')"]:visible

jQuery selector ':visible' - Selects all elements that are visible.

Learn more: Selectors | jQuery API Documentation

@viesturs

I really appreciate your help with this. I'm learning a lot.

A couple of issues...
1. The sitemap without the pagination limit that you provided above only scrapes the first 5 pages. What do I need to change to make it go through all 265 pages?
2. The ticked category selector: I cannot see how to add the jQuery selector you are talking about. Any pointers?

Thank you very much for your continued support
David

@supahoopsa Hi, to solve the pagination issue, try changing the 'Click element uniqueness' setting of the 'element' selector to 'Unique Text'.

To locate the 'ticked' element in the page HTML, right-click on any 'ticked' element and press 'Inspect'.

span[style="background-image: url('https://mcs-website-widget.solsticecloud.com/Images/Tick.svg')"]

P.S. Don't apply the ':visible' selector, because by default all of the panels are closed and these elements are visually hidden. Instead, use a 'Grouped' selector with: span[style="background-image: url('https://mcs-website-widget.solsticecloud.com/Images/Tick.svg')"]:not(:hidden)

Thanks :grinning:

I have managed to scrape all 265 pages.

I'm still having problems with the ticks.

I entered the code you suggested, and when I did a Data Preview it all looked good. However, when I extracted the data, the column for that field just had [] in it.

I was considering whether it might be better to create a separate selector for each of the 11 different services, but I cannot figure out how to see which background image is displayed for each element in that section.

Here is the sitemap I am using:

{"_id":"mcscertifiedv6","startUrl":["https://mcscertified.com/find-an-installer/"],"selectors":[{"id":"company","parentSelectors":["element"],"type":"SelectorText","selector":"h3","multiple":false,"regex":""},{"id":"address","parentSelectors":["element"],"type":"SelectorText","selector":"div:nth-of-type(4) div:nth-of-type(2)","multiple":false,"regex":""},{"id":"phone","parentSelectors":["element"],"type":"SelectorText","selector":"div:nth-of-type(5) a","multiple":false,"regex":""},{"id":"website","parentSelectors":["element"],"type":"SelectorLink","selector":"a[target]","multiple":false},{"id":"regions","parentSelectors":["element"],"type":"SelectorText","selector":"div:nth-of-type(7) div:nth-of-type(2)","multiple":false,"regex":""},{"id":"email","parentSelectors":["element"],"type":"SelectorLink","selector":"a.msw-installer-contact-button","multiple":false},{"id":"startclick","parentSelectors":["_root"],"type":"SelectorElementClick","clickElementSelector":"div#msw-toggle-filters:contains(\"View All\")","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":4000,"discardInitialElements":"do-not-discard","multiple":false,"selector":"div.msw-launchpad-tab.active"},{"id":"scroller","parentSelectors":["_root"],"type":"SelectorElementScroll","selector":"body","multiple":true,"delay":10000,"elementLimit":500},{"id":"listview","parentSelectors":["_root"],"type":"SelectorElementClick","clickElementSelector":"#msw-list-view span:nth-of-type(1)","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":5000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"div#msw-list-view"},{"id":"element","parentSelectors":["_root"],"type":"SelectorElementClick","clickElementSelector":"a.current + a:not(:contains(\"265\"))","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"tr div.msw-list-view-item"},{"id":"ASHP","parentSelectors":["element"],"type":"SelectorElementAttribute","selector":"div.msw-tech-cannot-install:nth-of-type(1)","multiple":false,"extractAttribute":"class"},{"id":"Battery Storage","parentSelectors":["element"],"type":"SelectorElementAttribute","selector":"div.msw-installer-technology:nth-of-type(2) span:nth-of-type(1)","multiple":false,"extractAttribute":"style"},{"id":"Biomass","parentSelectors":["element"],"type":"SelectorElementAttribute","selector":"div.msw-installer-technology:nth-of-type(3) span:nth-of-type(1)","multiple":false,"extractAttribute":"style"}]}
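
The "Battery Storage" and "Biomass" selectors in the sitemap above extract the style attribute, so one possible approach is to leave the scrape as it is and work out tick or cross afterwards from the exported values. The following is only a minimal Python sketch of that post-processing step. It assumes the exported cell holds the raw inline style text, such as background-image: url('.../Images/Tick.svg'), which this thread does not confirm. The function names background_image and is_ticked are hypothetical helpers, not part of Web Scraper.

```python
import re

# Tick image URL, copied from the selector suggested earlier in this thread.
TICK_URL = "https://mcs-website-widget.solsticecloud.com/Images/Tick.svg"

# Matches the first url(...) in an inline style, with or without quotes.
URL_RE = re.compile(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)")


def background_image(style):
    """Return the URL inside url(...) from a style string, or None."""
    if not style:
        return None
    match = URL_RE.search(style)
    return match.group(1).strip() if match else None


def is_ticked(style):
    """True only when the style's background image is the tick image."""
    return background_image(style) == TICK_URL


if __name__ == "__main__":
    # Assumed shape of an exported "style" value (not confirmed in the thread).
    sample = "background-image: url('" + TICK_URL + "')"
    print(background_image(sample))
    print(is_ticked(sample))
```

Printing background_image for a few exported rows would also show which image each service uses, which may help when deciding how to set up the remaining service columns.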

Have revisited this...


This shows the code for the element I'm trying to scrape.
I think I need to scrape whether "msw-tech-cannot-install" is present or not.
I have tried all sorts of methods but am still struggling :cry:
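
One thing that may be worth trying: a selector such as div.msw-tech-cannot-install:nth-of-type(1) can only match when that class is present, so it cannot report the opposite case. A possible alternative is to point the Element attribute selector at div.msw-installer-technology:nth-of-type(1), extract class, and check the exported value afterwards. The Python sketch below shows only that check. It assumes the "msw-tech-cannot-install" class sits on the same div as "msw-installer-technology" and that the export contains the full class string; neither is confirmed in this thread. can_install is a hypothetical helper name.

```python
# Marker class name taken from the post above.
CANNOT_INSTALL = "msw-tech-cannot-install"


def can_install(class_attr):
    """Interpret an exported class attribute value.

    Returns None when nothing was extracted, False when the marker
    class is present, and True otherwise.
    """
    if not class_attr or not class_attr.strip():
        return None
    # Split on whitespace so only whole class names are compared.
    return CANNOT_INSTALL not in class_attr.split()


if __name__ == "__main__":
    # Assumed example values; real exports may differ.
    print(can_install("msw-installer-technology"))
    print(can_install("msw-installer-technology msw-tech-cannot-install"))
```

Returning None for an empty value keeps "nothing was scraped" separate from "can install", which makes rows where the selector matched nothing easier to spot.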