"pages" rotate properly, but empty scrape

The 8000+ "pages" rotate properly, but the scrape comes back empty when I try to grab data like the record below. I'm new to this and have been banging my head against it for hours, so any help is appreciated.

HK Tilt Group Ltd.
Incorporation #:
1405865
Licence #:
52688
Licence Type:
Developer and General Contractor
Status:
In Good Standing : Approved
Expiry Date:
2025/May/31
Closed Date:
Person responsible for the company:
Harjit Sandhu and Kiranpreet Nagra
Contact Information:
106 - 15585 24 Avenue

Url: https://newhomesregistry.bchousing.org/LicenceRegistry/LicenceSearch?LicType=Any&Loc=By%20City&Area=Any&LicStat=In%20Good%20Standing*

Sitemap:
attempt 1

{"_id":"bch2","startUrl":["https://newhomesregistry.bchousing.org/LicenceRegistry/LicenceSearch?LicType=Any&Loc=By%20City&Area=Any&LicStat=In%20Good%20Standing%2A"],"selectors":[{"id":"ltd","parentSelectors":["_root"],"type":"SelectorElementClick","clickActionType":"real","clickElementSelector":"a.list-group-item","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"a.list-group-item"},{"id":"box","parentSelectors":["_root"],"type":"SelectorElement","selector":"div.card.m-1","multiple":true},{"id":"name","parentSelectors":["box"],"type":"SelectorText","selector":"strong","multiple":false,"regex":""},{"id":"Incorporation","parentSelectors":["box"],"type":"SelectorText","selector":"div:nth-of-type(2) div","multiple":false,"regex":""},{"id":"Licence ","parentSelectors":["box"],"type":"SelectorText","selector":"div:nth-of-type(3) div","multiple":false,"regex":""},{"id":"Licence Type:","parentSelectors":["box"],"type":"SelectorText","selector":"div:nth-of-type(4) div","multiple":false,"regex":""},{"id":"Status","parentSelectors":["box"],"type":"SelectorText","selector":"div[data-toggle]","multiple":false,"regex":""},{"id":"Expiry Date:","parentSelectors":["box"],"type":"SelectorText","selector":"div:nth-of-type(6) div","multiple":false,"regex":""},{"id":"Closed Date:","parentSelectors":["box"],"type":"SelectorText","selector":"div:nth-of-type(7) div","multiple":false,"regex":""},{"id":"Person responsible for the company:","parentSelectors":["box"],"type":"SelectorText","selector":"div:nth-of-type(8) div","multiple":false,"regex":""},{"id":"Contact Information:","parentSelectors":["box"],"type":"SelectorText","selector":"div:nth-of-type(9) div","multiple":false,"regex":""}]}

attempt 2
{"_id":"bch","startUrl":["https://newhomesregistry.bchousing.org/LicenceRegistry/LicenceSearch?LicType=Developer&Loc=By%20City&Area=Any&LicStat=In%20Good%20Standing*"],"selectors":[{"id":"ltd","parentSelectors":["_root"],"type":"SelectorElementClick","clickActionType":"real","clickElementSelector":"a.list-group-item","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"a.list-group-item"},{"id":"name","parentSelectors":["_root"],"type":"SelectorText","selector":".offset-md-3 strong","multiple":false,"regex":""},{"id":"Incorporation ","parentSelectors":["_root"],"type":"SelectorText","selector":"div.form-group:nth-of-type(3) div.col-md-9","multiple":false,"regex":""},{"id":"Licence ","parentSelectors":["_root"],"type":"SelectorText","selector":"div:nth-of-type(4) div.col-md-9","multiple":false,"regex":""},{"id":"Licence Type:","parentSelectors":["_root"],"type":"SelectorText","selector":"div:nth-of-type(5) div.col-md-9","multiple":false,"regex":""},{"id":"Status","parentSelectors":["_root"],"type":"SelectorText","selector":"div[data-toggle]","multiple":false,"regex":""},{"id":"Expiry Date:","parentSelectors":["_root"],"type":"SelectorText","selector":"div:nth-of-type(7) div.col-md-9","multiple":false,"regex":""},{"id":"Closed Date:","parentSelectors":["_root"],"type":"SelectorText","selector":"div:nth-of-type(8) div.col-md-9","multiple":false,"regex":""},{"id":"Person responsible for the company:","parentSelectors":["_root"],"type":"SelectorText","selector":"div:nth-of-type(9) div.col-md-9","multiple":false,"regex":""},{"id":"Contact Information:","parentSelectors":["_root"],"type":"SelectorText","selector":"div:nth-of-type(10) div","multiple":false,"regex":""},{"id":"phone 1","parentSelectors":["_root"],"type":"SelectorText","selector":"li:nth-of-type(1)","multiple":false,"regex":""},{"id":"phone 2","parentSelectors":["_root"],"type":"SelectorText","selector":"li:nth-of-type(2)","multiple":false,"regex":""},{"id":"url","parentSelectors":["_root"],"type":"SelectorText","selector":".col-md-9 a","multiple":false,"regex":""}],"websiteStateSetup":{"enabled":true,"performWhenNotFoundSelector":"a.list-group-item","actions":[{"type":"openUrl","url":"https://newhomesregistry.bchousing.org/LicenceRegistry/LicenceSearch?LicType=Developer&Loc=By%20City&Area=Any&LicStat=In%20Good%20Standing*"}]}}

@JanAp is this something you can help with please? thanks

Hi,

The 'Selector' value in the click selector represents the element where the target data will be located, and the 'Click selector' value represents the element that should be clicked. Also, the data selectors should be set as children of the click selector, since there are multiple elements.

{"_id":"bch","startUrl":["https://newhomesregistry.bchousing.org/LicenceRegistry/LicenceSearch?LicType=Developer&Loc=By%20City&Area=Any&LicStat=In%20Good%20Standing*"],"selectors":[{"clickActionType":"real","clickElementSelector":"a.list-group-item:nth-of-type(-n+5)","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":2000,"discardInitialElements":"discard-when-click-element-exists","id":"ltd","multiple":true,"parentSelectors":["_root"],"selector":"div.card.m-1","type":"SelectorElementClick"},{"id":"name","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":".offset-md-3 strong","type":"SelectorText"},{"id":"Incorporation ","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div.form-group:nth-of-type(3) div.col-md-9","type":"SelectorText"},{"id":"Licence ","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(4) div.col-md-9","type":"SelectorText"},{"id":"Licence Type:","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(5) div.col-md-9","type":"SelectorText"},{"id":"Status","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div[data-toggle]","type":"SelectorText"},{"id":"Expiry Date:","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(7) div.col-md-9","type":"SelectorText"},{"id":"Closed Date:","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(8) div.col-md-9","type":"SelectorText"},{"id":"Person responsible for the company:","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(9) div.col-md-9","type":"SelectorText"},{"id":"Contact Information:","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(10) div","type":"SelectorText"},{"id":"phone 1","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"li:nth-of-type(1)","type":"SelectorText"},{"id":"phone 2","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"li:nth-of-type(2)","type":"SelectorText"},{"id":"url","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":".col-md-9 a","type":"SelectorText"}],"websiteStateSetup":{"enabled":true,"performWhenNotFoundSelector":"a.list-group-item","actions":[{"type":"openUrl","url":"https://newhomesregistry.bchousing.org/LicenceRegistry/LicenceSearch?LicType=Developer&Loc=By%20City&Area=Any&LicStat=In%20Good%20Standing*"}]}}
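
For anyone comparing the earlier attempts with this sitemap, the structural difference is where the data selectors hang in the tree. Here is a rough Python sketch that lists click selectors with no children, which appears to be what produced the empty scrape. The helper names and the trimmed-down `flat` and `nested` dictionaries are made up for illustration; only the `id`, `type`, and `parentSelectors` keys come from the sitemaps in this thread.

```python
from collections import defaultdict


def selector_tree(sitemap: dict) -> dict:
    """Map each parent id to the ids of the selectors nested under it."""
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel["id"])
    return dict(children)


def click_selectors_without_children(sitemap: dict) -> list:
    """Return ids of element-click selectors that have no child selectors."""
    tree = selector_tree(sitemap)
    return [
        sel["id"]
        for sel in sitemap["selectors"]
        if sel["type"] == "SelectorElementClick" and sel["id"] not in tree
    ]


# Hypothetical, trimmed stand-ins for "attempt 2" and the corrected sitemap.
flat = {"selectors": [
    {"id": "ltd", "type": "SelectorElementClick", "parentSelectors": ["_root"]},
    {"id": "name", "type": "SelectorText", "parentSelectors": ["_root"]},
]}
nested = {"selectors": [
    {"id": "ltd", "type": "SelectorElementClick", "parentSelectors": ["_root"]},
    {"id": "name", "type": "SelectorText", "parentSelectors": ["ltd"]},
]}

print(click_selectors_without_children(flat))    # ['ltd']
print(click_selectors_without_children(nested))  # []
```

To check a real sitemap, load the exported JSON with `json.loads` and pass the resulting dictionary to the same function.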

Awesome, I was able to change the number in "clickElementSelector":"a.list-group-item:nth-of-type(-n+5)": changing the +5 to +50 grabs 50 lines. Is the app capable of taking the time to grab everything in the thousands? I can't get it anywhere near the 8000+ lines that the page populates.

Oh, yes, that was just added for testing purposes. You can just leave a.list-group-item to click through all elements.
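
As a side note on why `:nth-of-type(-n+5)` limits the clicks to the first five items: the pseudo-class matches positions of the form `a*n + b` for `n = 0, 1, 2, ...`. The small Python sketch below mimics that rule. The function name is made up for illustration, and it assumes every list item is an `a` element under the same parent, so that position in the list equals the nth-of-type index.

```python
def matches_an_plus_b(index: int, a: int, b: int) -> bool:
    """True if the 1-based index equals a*n + b for some integer n >= 0."""
    if a == 0:
        return index == b
    n, remainder = divmod(index - b, a)
    return remainder == 0 and n >= 0


# -n+5 means a = -1, b = 5, which matches positions 5, 4, 3, 2, 1.
first_five = [i for i in range(1, 11) if matches_an_plus_b(i, -1, 5)]
print(first_five)  # [1, 2, 3, 4, 5]
```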

Please note that results will only be returned once the scraper has clicked on all relevant elements.


I'm able to scrape about 1000 lines, but it just fails and closes everything for the 3000 - 8000 line split jobs. I tried Brave, Chrome, and Firefox. I've got 24 GB of RAM, but the browser seems to use only 1 GB. Is there a way to select a range, like lines 1000 to 2000?

I also tried the hosted trial but got a worse result: 0.

Sitemap ID: 1274946
Job ID: 28742816

Hi,

I tested the sitemap, and the scraping process slows down significantly after executing several thousand clicks. I would recommend scraping the listings in several batches; see the reference below on how to match the elements in ranges:

{"_id":"bch","startUrl":["https://newhomesregistry.bchousing.org/LicenceRegistry/LicenceSearch?LicType=Developer&Loc=By%20City&Area=Any&LicStat=In%20Good%20Standing*"],"selectors":[{"clickActionType":"real","clickElementSelector":"a.list-group-item:nth-of-type(n+1000):nth-of-type(-n+2000)","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":0,"discardInitialElements":"discard-when-click-element-exists","id":"ltd","multiple":true,"parentSelectors":["_root"],"selector":"div.card.m-1","type":"SelectorElementClick"},{"id":"name","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":".offset-md-3 strong","type":"SelectorText"},{"id":"Incorporation ","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div.form-group:nth-of-type(3) div.col-md-9","type":"SelectorText"},{"id":"Licence ","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(4) div.col-md-9","type":"SelectorText"},{"id":"Licence Type:","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(5) div.col-md-9","type":"SelectorText"},{"id":"Status","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div[data-toggle]","type":"SelectorText"},{"id":"Expiry Date:","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(7) div.col-md-9","type":"SelectorText"},{"id":"Closed Date:","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(8) div.col-md-9","type":"SelectorText"},{"id":"Person responsible for the company:","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(9) div.col-md-9","type":"SelectorText"},{"id":"Contact Information:","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"div:nth-of-type(10) div","type":"SelectorText"},{"id":"phone 1","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"li:nth-of-type(1)","type":"SelectorText"},{"id":"phone 2","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":"li:nth-of-type(2)","type":"SelectorText"},{"id":"url","multiple":false,"parentSelectors":["ltd"],"regex":"","selector":".col-md-9 a","type":"SelectorText"}],"websiteStateSetup":{"enabled":true,"performWhenNotFoundSelector":"a.list-group-item","actions":[{"type":"openUrl","url":"https://newhomesregistry.bchousing.org/LicenceRegistry/LicenceSearch?LicType=Developer&Loc=By%20City&Area=Any&LicStat=In%20Good%20Standing*"}]}}
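
To cover the whole list, the range in "clickElementSelector" has to be changed for each batch. Below is a minimal Python sketch that prints one click selector per batch, using the same `:nth-of-type(n+START):nth-of-type(-n+END)` pattern as the sitemap above. The function name, the total of 8000, and the batch size of 1000 are assumptions for illustration; both ends of each range are inclusive, so every batch starts one past the previous batch's end to avoid scraping the boundary item twice.

```python
def batch_click_selectors(total: int, batch_size: int,
                          base: str = "a.list-group-item") -> list:
    """Build non-overlapping, inclusive nth-of-type ranges covering 1..total."""
    selectors = []
    for start in range(1, total + 1, batch_size):
        end = min(start + batch_size - 1, total)
        selectors.append(f"{base}:nth-of-type(n+{start}):nth-of-type(-n+{end})")
    return selectors


# Assumed figures: roughly 8000 listings, scraped 1000 at a time.
for selector in batch_click_selectors(total=8000, batch_size=1000):
    print(selector)
```

Each printed line can be pasted into "clickElementSelector" for a separate scraping job, and the exported results combined afterwards.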