My sitemap doesn't work when I want more data

I'm doing some "deep" scraping for my postgraduate final paper. Web Scraper works very well when I give it just a few pages (say 3 pages) with 10 links each: it visits all 10 links on each of the 3 pages and gives me the 30 observations/rows I want.

But when I try to scrape more pages, I always run into one of these two problems, or both at the same time:

  1. The scraper only visits all 10 links on the last page (it starts from the last page, by the way); on the "next" pages it collects data from only one item per page.
  2. It stops collecting before completing all the pages. The last time this happened, I noticed a small message in the left corner, "waiting for auth.reclameaqui.com.br", just before the scrape stopped.

The two sitemaps are below; the only difference between them is the pagination range, [8-10] versus [8-97].

Settings used for both:
Request interval (ms): 6000
Page load delay (ms): 4000

{"_id":"scraping_julho_2022","startUrl":["https://www.reclameaqui.com.br/empresa/bradesco-seguros/lista-reclamacoes/?pagina=[8-10]"],"selectors":[{"delay":0,"id":"links das reclamacoes","multiple":true,"parentSelectors":["_root"],"selector":".sc-1pe7b5t-0 a","type":"SelectorLink"},{"delay":0,"id":"Titulo","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"h1","type":"SelectorText"},{"delay":0,"id":"Empresa","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-5 a[data-testid]","type":"SelectorText"},{"delay":0,"id":"Local","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-7 span","type":"SelectorText"},{"delay":0,"id":"Data","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-8 span","type":"SelectorText"},{"delay":0,"id":"ID RA","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"span.lzlu7c-12","type":"SelectorText"},{"delay":0,"id":"Texto","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"p.lzlu7c-17","type":"SelectorText"}]}

{"_id":"scraping_julho_2022","startUrl":["https://www.reclameaqui.com.br/empresa/bradesco-seguros/lista-reclamacoes/?pagina=[8-97]"],"selectors":[{"delay":0,"id":"links das reclamacoes","multiple":true,"parentSelectors":["_root"],"selector":".sc-1pe7b5t-0 a","type":"SelectorLink"},{"delay":0,"id":"Titulo","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"h1","type":"SelectorText"},{"delay":0,"id":"Empresa","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-5 a[data-testid]","type":"SelectorText"},{"delay":0,"id":"Local","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-7 span","type":"SelectorText"},{"delay":0,"id":"Data","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":".lzlu7c-8 span","type":"SelectorText"},{"delay":0,"id":"ID RA","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"span.lzlu7c-12","type":"SelectorText"},{"delay":0,"id":"Texto","multiple":false,"parentSelectors":["links das reclamacoes"],"regex":"","selector":"p.lzlu7c-17","type":"SelectorText"}]}
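To make the difference between the two start URLs concrete, here is a minimal Python sketch of what a `[8-97]` range stands for, plus one way the same range could be cut into smaller ranges. Splitting the job is only an idea prompted by the fact that the 3-page run works; nobody in this thread has confirmed that it fixes the problem. The helper names `expand_start_url` and `batch_ranges` are made up for illustration, and the sketch only handles a plain `[start-end]` range.

```python
import re

def expand_start_url(url):
    """Expand one plain [start-end] range in a start URL into individual URLs."""
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        return [url]  # no range in the URL, nothing to expand
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]

def batch_ranges(start, end, size):
    """Split an inclusive page range into consecutive inclusive sub-ranges."""
    return [(lo, min(lo + size - 1, end)) for lo in range(start, end + 1, size)]

BASE = "https://www.reclameaqui.com.br/empresa/bradesco-seguros/lista-reclamacoes/?pagina="

# The large sitemap's single start URL covers 90 listing pages.
urls = expand_start_url(BASE + "[8-97]")
print(len(urls), urls[0], urls[-1])

# Hypothetical alternative: nine start URLs of 10 listing pages each,
# one per smaller scraping job (batch size 10 is an arbitrary choice).
for lo, hi in batch_ranges(8, 97, 10):
    print(f"{BASE}[{lo}-{hi}]")
```

Each printed URL could be pasted as the `startUrl` of a separate copy of the sitemap, so that every job stays closer to the size that is known to work.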

Do you have any tips for me? I'm desperate because I need this for my postgraduate final paper.

@Cayyan Hi, the sitemaps seem to be fully functional. The issue is most likely caused by your network performance, OS version, location, or other factors.

Thank you, Viesturs.

Do you have any tips about the network, the best browser for scraping, or anything else?

Also, what could be an ideal Request interval and Page load delay?

@Cayyan Have you tried launching the scraping job via Web Scraper Cloud? The trial version comes with 1'000 free page credits. You should be able to test your sitemap there.
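As a rough check on whether the larger sitemap would fit in the trial, the page loads can be counted with a few lines of Python. This is only a sketch under stated assumptions: that one page credit is spent per page load, that every listing page has exactly 10 complaint links (as described in the question), and that the 6000 ms request interval acts as a lower bound on the time between page loads. The function name `job_size` is hypothetical.

```python
def job_size(first_page, last_page, links_per_page, request_interval_ms):
    """Estimate page loads and a lower bound on run time for a two-level sitemap.

    Assumes each listing page is loaded once and each complaint link on it
    is loaded once; retries and page load delay are ignored.
    """
    list_pages = last_page - first_page + 1
    detail_pages = list_pages * links_per_page
    page_loads = list_pages + detail_pages
    min_minutes = page_loads * request_interval_ms / 1000 / 60
    return page_loads, min_minutes

# Sitemap with ?pagina=[8-97], 10 links per page, request interval 6000 ms.
loads, minutes = job_size(8, 97, 10, 6000)
print(loads, minutes)  # 990 99.0
```

Under these assumptions the [8-97] job needs 990 page loads, which is just under the 1'000 trial credits, and at one request every 6 seconds it cannot finish in much less than about 99 minutes.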