Hi
I'm having trouble scraping all pages of this site:
https://www.ggf.org.uk/members/
I can navigate into each service link from this page, and from there click into each company record and scrape the data that I need. The problem is the pagination on the service pages, where the navigation is handled by a series of elements that don't have a class or id associated with them.
On the first page of results, if I select the next page ('>') link, it selects a:nth-of-type(4) as the selector. This is fine for the first page, but when the list loads page 2, a:nth-of-type(4) points to page 4, because the page adds another element to navigate backwards and additional elements for page numbers. To select the '>' element there I would need a:nth-of-type(6). As a result the scraper only loads certain pages and doesn't pick up all records.
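To make the shift concrete, here is a rough Python sketch of what I think is happening. The link labels in the two lists are my guess at what each results page renders (I'm assuming the current page number is not an a element); they are not copied from the site's HTML. The helper names are made up for illustration.

```python
# Hypothetical pagination link labels, in document order.
# Assumption: the current page number is not rendered as an <a> element.
PAGE_1_LINKS = ["2", "3", "4", ">"]
PAGE_2_LINKS = ["<", "1", "3", "4", "5", ">"]


def nth_of_type(links, n):
    """Mimic a:nth-of-type(n): return the label at 1-based position n, or None."""
    return links[n - 1] if 1 <= n <= len(links) else None


def position_of(links, label=">"):
    """Return the 1-based position of the link with the given text, or None."""
    return links.index(label) + 1 if label in links else None


if __name__ == "__main__":
    # A fixed position lands on different links on different pages.
    print(nth_of_type(PAGE_1_LINKS, 4))
    print(nth_of_type(PAGE_2_LINKS, 4))
    # Looking the link up by its text follows it wherever it moves.
    print(position_of(PAGE_1_LINKS))
    print(position_of(PAGE_2_LINKS))
```

Under these assumptions the fixed position 4 hits '>' on page 1 but a page-number link on page 2, while looking the link up by its text finds it on both pages. So what I seem to need is a selector that depends on the link's text rather than its position.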
Is there a way around this, so that the pagination selector always picks up the '>' link?
Sitemap:
{"_id":"cpa_ggf","startUrl":["http://www.ggf.org.uk/members/"],"selectors":[{"id":"ServiceLink","type":"SelectorLink","parentSelectors":["_root"],"selector":"a.categories","multiple":true,"delay":0},{"id":"CompanyLink","type":"SelectorLink","parentSelectors":["_root","pagination"],"selector":"h2.geodir-entry-title a","multiple":true,"delay":0},{"id":"Name","type":"SelectorText","parentSelectors":["CompanyLink"],"selector":"h2.entry-title","multiple":false,"regex":"","delay":0},{"id":"Address","type":"SelectorText","parentSelectors":["CompanyLink"],"selector":"div.featured-overlay","multiple":false,"regex":"","delay":0},{"id":"Web","type":"SelectorText","parentSelectors":["CompanyLink"],"selector":"div.featured-overlay a","multiple":false,"regex":"","delay":0},{"id":"Info","type":"SelectorText","parentSelectors":["CompanyLink"],"selector":"li p","multiple":false,"regex":"","delay":0},{"id":"pagination","type":"SelectorLink","parentSelectors":["ServiceLink"],"selector":"a:nth-of-type(4)","multiple":false,"delay":0}]}
Thanks