Help with this pagination

Gabriel · April 8, 2019, 2:46pm

I'm having problems with the pagination again. Can anybody help?

Url: https://ranking-empresas.eleconomista.es/ranking_empresas_nacional.html

Sitemap:
{"_id":"eleconomista","startUrl":["https://ranking-empresas.eleconomista.es/ranking_empresas_nacional.html"],"selectors":[{"id":"Empresa","type":"SelectorLink","parentSelectors":["_root"],"selector":"td.tal a","multiple":true,"delay":0},{"id":"empresa","type":"SelectorText","parentSelectors":["Empresa"],"selector":"tr.even:contains('Denominación') td.tal:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"url","type":"SelectorLink","parentSelectors":["Empresa"],"selector":"tr.even:contains('Página Web') a.url","multiple":false,"delay":0},{"id":"pagination","type":"SelectorLink","parentSelectors":["_root"],"selector":"li:nth-of-type(6) a","multiple":true,"delay":0}]}

webber · April 9, 2019, 7:16am

As there is no link under the pagination button, you have to use the Element Click selector to iterate through the pages. Here is an updated sitemap:

{"_id":"eleconomista","startUrl":["https://ranking-empresas.eleconomista.es/ranking_empresas_nacional.html"],"selectors":[{"id":"Empresa","type":"SelectorLink","parentSelectors":["element-click"],"selector":"td.tal a","multiple":false,"delay":0},{"id":"empresa","type":"SelectorText","parentSelectors":["Empresa"],"selector":"tr.even:contains('Denominación') td.tal:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"url","type":"SelectorLink","parentSelectors":["Empresa"],"selector":"tr.even:contains('Página Web') a.url","multiple":false,"delay":0},{"id":"element-click","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"tr.tr_hover_even","multiple":true,"delay":"2000","clickElementSelector":"li.arrow a:contains('»')","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueCSSSelector"}]}

Gabriel · April 9, 2019, 2:37pm

Thanks, but it does not work. Are you sure it's right this way? (See the screenshot: empresite)

webber · April 10, 2019, 8:55am

{"_id":"eleconomista","startUrl":["https://ranking-empresas.eleconomista.es/ranking_empresas_nacional.html"],"selectors":[{"id":"Empresa","type":"SelectorLink","parentSelectors":["element-click"],"selector":"td.tal a","multiple":false,"delay":0},{"id":"empresa","type":"SelectorText","parentSelectors":["Empresa"],"selector":"tr.even:contains('Denominación') td.tal:nth-of-type(2)","multiple":false,"regex":"","delay":0},{"id":"url","type":"SelectorLink","parentSelectors":["Empresa"],"selector":"tr.even:contains('Página Web') a.url","multiple":false,"delay":0},{"id":"element-click","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"tr.tr_hover_even","multiple":true,"delay":"2000","clickElementSelector":"li.arrow a:contains('»')","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueCSSSelector"}]}

I had the click type set to 'click once' for testing purposes and forgot to change it back. It should work now.
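For anyone comparing the two sitemaps by eye: the only field that differs is `clickType` on the Element Click selector (`clickOnce` in the first, `clickMore` in the second). Here is a minimal Python sketch that pulls that setting out of a sitemap. The helper name `click_settings` is made up for illustration, and the JSON strings are trimmed copies of the sitemaps above, not the full exports.

```python
import json


def click_settings(sitemap_json: str) -> dict:
    """Return {selector id: clickType} for every Element Click selector."""
    sitemap = json.loads(sitemap_json)
    return {
        selector["id"]: selector.get("clickType")
        for selector in sitemap["selectors"]
        if selector["type"] == "SelectorElementClick"
    }


# Trimmed copies of the two sitemaps from this thread (assumption: only the
# fields needed for the comparison are kept).
first_sitemap = json.dumps({
    "_id": "eleconomista",
    "selectors": [
        {"id": "Empresa", "type": "SelectorLink"},
        {"id": "element-click", "type": "SelectorElementClick",
         "clickType": "clickOnce"},
    ],
})
second_sitemap = json.dumps({
    "_id": "eleconomista",
    "selectors": [
        {"id": "Empresa", "type": "SelectorLink"},
        {"id": "element-click", "type": "SelectorElementClick",
         "clickType": "clickMore"},
    ],
})

print(click_settings(first_sitemap))   # {'element-click': 'clickOnce'}
print(click_settings(second_sitemap))  # {'element-click': 'clickMore'}
```

With `clickOnce` the next-page arrow is clicked a single time, which is why the first version stopped early; `clickMore` keeps clicking it.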

Gabriel · April 10, 2019, 3:34pm

Looks fantastic, but I'm now struggling with Captchas.

Do you know where I could get a solution for this?

KristapsWS · April 11, 2019, 7:36am

You can avoid CAPTCHAs while scraping by using a proxy and rotating your IP address periodically. Cloud Web Scraper has this feature, and you can try it for free. More info: https://www.webscraper.io/cloud-scraper
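If you end up scripting the requests yourself, the rotation idea can be sketched roughly like this in Python. This is only an illustration of "switch proxy every N requests", not how Cloud Web Scraper does it internally. The class name `ProxyRotator` and the proxy addresses are placeholders I made up.

```python
from itertools import cycle


class ProxyRotator:
    """Hand out proxies round-robin, switching after a fixed number of requests."""

    def __init__(self, proxies, requests_per_proxy=10):
        if not proxies:
            raise ValueError("need at least one proxy")
        if requests_per_proxy < 1:
            raise ValueError("requests_per_proxy must be at least 1")
        self._pool = cycle(proxies)
        self._limit = requests_per_proxy
        self._used = 0
        self._current = next(self._pool)

    def get(self):
        """Return the proxy to use for the next request."""
        if self._used >= self._limit:
            # Quota for the current proxy is spent: move to the next one.
            self._current = next(self._pool)
            self._used = 0
        self._used += 1
        return self._current


# Placeholder addresses (assumption): replace with proxies you actually have.
rotator = ProxyRotator(
    ["http://proxy-a.example:8080", "http://proxy-b.example:8080"],
    requests_per_proxy=2,
)
picked = [rotator.get() for _ in range(5)]
print(picked)
```

Each value returned by `get()` would then be passed to whatever HTTP client you use, for example through `urllib.request.ProxyHandler` in the standard library.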

taper · April 17, 2019, 12:22pm

Have you tried using residential proxies? I tried these ones, and they rarely get any CAPTCHAs at all. They're a bit pricey, but if you need to do a lot of scraping, I think it's worth it.

KristapsWS · May 7, 2019, 7:21am