Title: PagesJaunes pagination not working in Web Scraper (SelectorPagination stays on page 1)
Hello,
I’m using the Web Scraper Chrome extension to scrape PagesJaunes results, but pagination does not work: it never moves past page 1.
I suspect pagination is JS/AJAX-driven (history/pushState), but I’m not sure how to configure Web Scraper correctly.
Here is my sitemap JSON (current config). The pagination selector is probably wrong, but I don’t know what the correct approach is:
{"_id":"Test_1","startUrl":["https://www.pagesjaunes.fr/annuaire/chercherlespros?quoiqui=medecin%20generaliste&ou=Seine-Maritime%20%2876%29&idOu=D076&page=1&contexte=iFVVkzFYh7tmqwjuRJnDypl3chPiVW9bvCRwiuY5lwodxQ0PhOSkwLrR%2BYjTotoH8KaY1Uw5WU5bxMn%2BZduDXw%3D%3D&quoiQuiInterprete=medecin%20generaliste"],"selectors":[
{"id":"Pagination","paginationType":"clickMore","parentSelectors":["_root","Pagination"],"selector":".link_pagination span.value","type":"SelectorPagination"},
{"elementLimit":0,"id":"record_wrapper","multiple":true,"parentSelectors":["_root"],"scroll":false,"selector":"li.bi","type":"SelectorElement"},
{"id":"data","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"h3","type":"SelectorText","version":2},
{"id":"data2","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":".bi-address a","type":"SelectorText","version":2},
{"id":"rating","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"span.bi-rating","type":"SelectorText","version":2},
{"id":"data3","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"span.bi-hours","type":"SelectorText","version":2},
{"id":"data4","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"span.nb-positifs","type":"SelectorText","version":2},
{"id":"data5","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":".btn_ico_left span.value","type":"SelectorText","version":2},
{"id":"data6","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"span.label-adresse","type":"SelectorText","version":2},
{"id":"data7","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":".btn_primary span.value","type":"SelectorText","version":2},
{"id":"data8","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"span.bi-activity-unit","type":"SelectorText","version":2},
{"id":"data9","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"div.value","type":"SelectorText","version":2},
{"id":"data10","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"span.label","type":"SelectorText","version":2},
{"id":"data11","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"strong","type":"SelectorText","version":2},
{"id":"data12","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"div.number-contact","type":"SelectorText","version":2},
{"id":"data13","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"div.number-contact:nth-of-type(1)","type":"SelectorText","version":2},
{"id":"data14","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"b","type":"SelectorText","version":2},
{"id":"data15","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"li:nth-of-type(1)","type":"SelectorText","version":2},
{"id":"data16","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"li:nth-of-type(2)","type":"SelectorText","version":2},
{"id":"data17","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"b:nth-of-type(2)","type":"SelectorText","version":2},
{"id":"data18","multiple":false,"multipleType":"singleColumn","parentSelectors":["record_wrapper"],"regex":"","selector":"span.note_moyenne","type":"SelectorText","version":2}
]}
Questions:
- Is my pagination selector wrong because I’m targeting span.value (the current page number) instead of a clickable <a>?
- Should parentSelectors for the pagination be only ["_root"] (not ["_root","Pagination"])?
- What is the correct way to paginate PagesJaunes in Web Scraper: auto, clickMore, or “list of URLs” (page=1,2,3…)?
- If it’s JS/AJAX-driven, is Web Scraper able to follow it, or do I need a different approach?
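To make the “list of URLs” option concrete, here is a minimal Python sketch (run outside Web Scraper) of how I would generate one start URL per page by rewriting the page query parameter. The helper name page_urls and the page count of 3 are placeholders of mine, and the URL is shortened for readability. I am assuming the site accepts a changed page value with the other parameters left untouched, which I have not verified; the contexte token in my real URL may be session-specific.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, quote


def page_urls(start_url, last_page):
    """Return one URL per page (1..last_page) by overwriting the `page` parameter.

    Assumes `page` is already present in the query string of start_url.
    """
    parts = urlsplit(start_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    urls = []
    for n in range(1, last_page + 1):
        # Swap in the new page number; every other parameter is kept as-is.
        new_params = [(k, str(n) if k == "page" else v) for k, v in params]
        # quote_via=quote encodes spaces as %20 rather than "+".
        query = urlencode(new_params, quote_via=quote)
        urls.append(urlunsplit(parts._replace(query=query)))
    return urls


# Shortened version of my start URL (illustrative only, contexte omitted).
start = ("https://www.pagesjaunes.fr/annuaire/chercherlespros"
         "?quoiqui=medecin%20generaliste&idOu=D076&page=1")
for url in page_urls(start, 3):
    print(url)
```

If this approach is valid, I would paste the resulting URLs into startUrl and drop the Pagination selector entirely, but I don’t know whether that is the recommended way.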
Thanks for any help / working example.