Issues with provider locator... (will not pull first page results)

Data not being pulled from first page...

Hello, I am having issues with my crawl.

{"_id":"ifmv6","startUrl":["https://www.ifm.org/find-a-practitioner/?pg=1&country=US&city=Seattle&province&state_us=WA&state_ca&postal_code&rad=150&pos&advanced_search&ifm_certified&practitioner-first-name&practitioner-last-name&insurance&medicare&online&phone&primary-degree&languages"],"selectors":[{"id":"card","type":"SelectorElement","parentSelectors":["link"],"selector":"div.profileCard__content","multiple":true,"delay":0},{"id":"name","type":"SelectorText","parentSelectors":["card"],"selector":".profileCard__title a","multiple":false,"regex":"","delay":0},{"id":"address","type":"SelectorText","parentSelectors":["card"],"selector":"span:nth-of-type(n+3)","multiple":false,"regex":"","delay":0},{"id":"phone","type":"SelectorText","parentSelectors":["card"],"selector":"div.contactInfo__item:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"link","type":"SelectorLink","parentSelectors":["_root"],"selector":"a.page-numbers","multiple":true,"delay":0}]}

Almost there. You only need to make your paginator a child of itself (recursive). When choosing its parent selectors, you can hold down the Ctrl key to select more than one.
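
In the exported sitemap JSON, "a child of itself" just means the paginator's own id appears in its parentSelectors list. Here is a minimal Python sketch of that edit, assuming a trimmed-down copy of the sitemap above (startUrl and the other selectors are left out for brevity, and make_recursive is a hypothetical helper name, not part of Web Scraper):

```python
import json

def make_recursive(sitemap: dict, selector_id: str) -> dict:
    """Add a selector's own id to its parentSelectors (no-op if already there).

    Note: this modifies the sitemap dict in place and also returns it.
    """
    for sel in sitemap["selectors"]:
        if sel["id"] == selector_id and selector_id not in sel["parentSelectors"]:
            sel["parentSelectors"].append(selector_id)
    return sitemap

# Abbreviated copy of the sitemap from the question (assumption: only two selectors kept).
sitemap = json.loads("""
{"_id": "ifmv6", "selectors": [
  {"id": "card", "type": "SelectorElement", "parentSelectors": ["link"],
   "selector": "div.profileCard__content", "multiple": true, "delay": 0},
  {"id": "link", "type": "SelectorLink", "parentSelectors": ["_root"],
   "selector": "a.page-numbers", "multiple": true, "delay": 0}
]}
""")

make_recursive(sitemap, "link")
print(json.dumps(sitemap["selectors"][1]["parentSelectors"]))  # ["_root", "link"]
```

You can make the same change by hand in the selector editor; the script is only meant to show what the result looks like in the JSON.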


This will allow it to find the first page. It will also allow your paginator to handle page ranges like: 1 2 3 .... 17 18 >

If your paginator is not recursive, it will only go to pages 1, 2, 3, 17 and 18.
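
To see why, here is a small simulation of the two cases. It is only a sketch: visible_pages is a made-up model of a paginator that shows the first three pages, the last two, and the neighbours of the current page, which may not match exactly how this site renders its page numbers.

```python
from collections import deque

def visible_pages(current: int, total: int) -> set:
    """Hypothetical paginator: first three, last two, and neighbours of the current page."""
    candidates = {1, 2, 3, total - 1, total, current - 1, current, current + 1}
    return {p for p in candidates if 1 <= p <= total}

def crawl(total: int, recursive: bool) -> list:
    """Return the sorted page numbers reached when starting from page 1."""
    visited = {1}
    queue = deque([(1, 0)])  # (page number, link depth from the start page)
    while queue:
        page, depth = queue.popleft()
        # A non-recursive paginator only follows links found on the start page.
        if depth >= 1 and not recursive:
            continue
        for nxt in visible_pages(page, total):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, depth + 1))
    return sorted(visited)

print(crawl(18, recursive=False))  # [1, 2, 3, 17, 18]
print(crawl(18, recursive=True) == list(range(1, 19)))  # True
```

With the recursive paginator, each newly opened page exposes the next page number, so the crawl eventually walks through the whole range.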

I did a test scrape with the recursive paginator, and it got all the results from page 1.

This is great. Thank you!

One more issue here. Data scraping looks good in the refresh data ribbon, but exports in a messy format. Any ideas on how to fix this?

{"_id":"imf_den_detail_test1","startUrl":["https://www.ifm.org/find-a-practitioner/?country=US&city=denver&province=&state_us=CO&state_ca=&postal_code=&rad=150&pos=&advanced_search=&ifm_certified=&practitioner-first-name=&practitioner-last-name=&insurance=&medicare=&online=&phone=&primary-degree=&languages="],"selectors":[{"id":"link_detail","type":"SelectorLink","parentSelectors":["page_link"],"selector":".profileCard__avatar a","multiple":true,"delay":0},{"id":"page","type":"SelectorLink","parentSelectors":["page"],"selector":"a.page-numbers","multiple":true,"delay":0},{"id":"name","type":"SelectorText","parentSelectors":["link_detail"],"selector":"h1","multiple":false,"regex":"","delay":0},{"id":"address 1","type":"SelectorText","parentSelectors":["link_detail"],"selector":".contactInfo__address a","multiple":false,"regex":"","delay":0},{"id":"phone","type":"SelectorText","parentSelectors":["link_detail"],"selector":"div.contactInfo__item:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"email","type":"SelectorText","parentSelectors":["link_detail"],"selector":"div:nth-of-type(3) a","multiple":false,"regex":"","delay":0},{"id":"website","type":"SelectorImage","parentSelectors":["link_detail"],"selector":"img.profileHeader__image","multiple":false,"delay":0},{"id":"page_link","type":"SelectorLink","parentSelectors":["_root","page_link"],"selector":"a.page-numbers","multiple":true,"delay":0}]}

Also wondering if I could pull website URL from scrape...

First, your "link_detail" selector also needs to be selected as a child of the root. Currently it is only a child of your pagination selector, so it runs on the 2nd page and above but never on the initial page. As a result, the scraper does not extract any information from the first page.
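
A quick way to check this kind of problem is to list which selectors have "_root" among their parents, since those are the ones that run on the start page. A rough sketch, assuming only the two relevant selectors from your sitemap and a hypothetical helper named start_page_selectors:

```python
def start_page_selectors(selectors: list) -> list:
    """Return the ids of selectors that list "_root" as a parent."""
    return sorted(s["id"] for s in selectors if "_root" in s["parentSelectors"])

# Parent lists as posted in the question (other selectors and fields omitted).
before = [
    {"id": "link_detail", "parentSelectors": ["page_link"]},
    {"id": "page_link", "parentSelectors": ["_root", "page_link"]},
]

# Same selectors with "_root" added as a parent of link_detail.
after = [
    {"id": "link_detail", "parentSelectors": ["_root", "page_link"]},
    {"id": "page_link", "parentSelectors": ["_root", "page_link"]},
]

print(start_page_selectors(before))  # ['page_link']
print(start_page_selectors(after))   # ['link_detail', 'page_link']
```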

Also, what exactly do you mean by "messy format"?

Yes, you can get the website using the "Element Attribute Selector":

{"_id":"imf_den_detail_test1","startUrl":["https://www.ifm.org/find-a-practitioner/?country=US&city=denver&province=&state_us=CO&state_ca=&postal_code=&rad=150&pos=&advanced_search=&ifm_certified=&practitioner-first-name=&practitioner-last-name=&insurance=&medicare=&online=&phone=&primary-degree=&languages="],"selectors":[{"id":"link_detail","type":"SelectorLink","parentSelectors":["_root","page_link"],"selector":".profileCard__avatar a","multiple":true,"delay":0},{"id":"page","type":"SelectorLink","parentSelectors":["page"],"selector":"a.page-numbers","multiple":true,"delay":0},{"id":"name","type":"SelectorText","parentSelectors":["link_detail"],"selector":"h1","multiple":false,"regex":"","delay":0},{"id":"address 1","type":"SelectorText","parentSelectors":["link_detail"],"selector":".contactInfo__address a","multiple":false,"regex":"","delay":0},{"id":"phone","type":"SelectorText","parentSelectors":["link_detail"],"selector":"div.contactInfo__item:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"email","type":"SelectorText","parentSelectors":["link_detail"],"selector":"div:nth-of-type(3) a","multiple":false,"regex":"","delay":0},{"id":"website","type":"SelectorImage","parentSelectors":["link_detail"],"selector":"img.profileHeader__image","multiple":false,"delay":0},{"id":"page_link","type":"SelectorLink","parentSelectors":["_root","page_link"],"selector":"a.page-numbers","multiple":true,"delay":0},{"id":"www","type":"SelectorElementAttribute","parentSelectors":["link_detail"],"selector":".contactInfo__item a:contains(\"Website\")","multiple":false,"extractAttribute":"href","delay":0}]}

This is how the data exports using your sitemap. Take a look at the address and email columns. Something is corrupting the columns.