I'm attempting a "deep" scrape for my post-graduation final paper. Web Scraper works very well when I give it just a few pages (say, 3 pages) with 10 links each: it visits all 10 links on each of the 3 pages and gives me the 30 observations/rows I want.
But when I try to scrape more pages, I always run into one of these two problems, or both at the same time:
- The scraper only visits all 10 links on the last page (it starts from the last page, by the way); on the "next" pages it collects data from only one item per page.
- It stops collecting before completing all the pages. The last time this happened, I noticed a small message in the left corner, "waiting for auth.reclameaqui.com.br", just before the scrape stopped.
The two sitemaps are below; the only difference between them is the pagination range, [8-10] versus [8-97].
Settings used for both:
Request interval (ms): 6000
Page load delay (ms): 4000
{"_id":"scraping_julho_2022","startUrl":["https://www.reclameaqui.com.br/empresa/bradesco-seguros/lista-reclamacoes/?pagina=[8-10]"],"selectors":[{"delay":0,"id":"links das reclamacoes","multiple":true,"parentSelectors":["_root"],"selector":".sc-1pe7b5t-0 a","type":"SelectorLink"},{"delay":0,"id":"Titulo","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"h1","type":"SelectorText"},{"delay":0,"id":"Empresa","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-5 a[data-testid]","type":"SelectorText"},{"delay":0,"id":"Local","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-7 span","type":"SelectorText"},{"delay":0,"id":"Data","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-8 span","type":"SelectorText"},{"delay":0,"id":"ID RA","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"span.lzlu7c-12","type":"SelectorText"},{"delay":0,"id":"Texto","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"p.lzlu7c-17","type":"SelectorText"}]}
{"_id":"scraping_julho_2022","startUrl":["https://www.reclameaqui.com.br/empresa/bradesco-seguros/lista-reclamacoes/?pagina=[8-97]"],"selectors":[{"delay":0,"id":"links das reclamacoes","multiple":true,"parentSelectors":["_root"],"selector":".sc-1pe7b5t-0 a","type":"SelectorLink"},{"delay":0,"id":"Titulo","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"h1","type":"SelectorText"},{"delay":0,"id":"Empresa","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-5 a[data-testid]","type":"SelectorText"},{"delay":0,"id":"Local","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-7 span","type":"SelectorText"},{"delay":0,"id":"Data","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-8 span","type":"SelectorText"},{"delay":0,"id":"ID RA","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"span.lzlu7c-12","type":"SelectorText"},{"delay":0,"id":"Texto","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"p.lzlu7c-17","type":"SelectorText"}]}
Do you have any tips for me? I'm desperate because I need this data for my post-graduation final paper.