Latest update breaks pagination

The latest update of your extension, v1.29.60 (the one that opens the scrape in a new window), broke all of my sitemaps that rely on “next” pagination. (I'm using Google Chrome Version 113.0.5672.92.)

WTF man!

This is a HUGE issue!!!!

I use this extension every day to scrape results from websites that have data only available behind a username and password. AKA I can’t use your paid service.

I REALLY need your extension to work correctly.

I built a new test sitemap whose only job is to click the “next” pagination button, and it fails to do so.

ALL my other sitemaps that MUST click the “Next” pagination are broken.

What did you change??????

How can this be fixed????

Richard

EDIT 01: I tried the exact same sitemap in Firefox v113.0 with Webscraper Extension v0.6.5 and the scrape performed exactly as expected and completed successfully.

Obviously, the latest version of the Chrome Extension v1.29.60 has something wrong with it, as it relates to pagination.

EDIT 02: I've tried to roll back the extension on Chrome to the last known working version, 0.6.5_0, but Chrome overwrites it on restart. Surely I'm not the only one experiencing these issues.

I've come to DEPEND on this extension multiple times a day. Now that it's broken I can't get work done. Please assist in fixing this broken function.

I have the exact same problem... pagination doesn't work anymore.... I was looking for a way to fix that !!!! I'm 2 days behind in my work because of it !!! :expressionless: :expressionless: :expressionless: :expressionless: :expressionless:

@calidera @raceman Hello, could you please provide the sitemap you are referring to?

Hi I'd like to scrape : CHATEAUX-EN-FRANCE | Découvrez les plus beaux chateaux.

This is the sitemap : {"_id":"chateauxfrance","startUrl":["CHATEAUX-EN-FRANCE | Découvrez les plus beaux chateaux. a","multiple":true},{"id":"Name","parentSelectors":["links"],"type":"SelectorText","selector":"h1.text-white","multiple":false,"regex":""},{"id":"Tel","parentSelectors":["links"],"type":"SelectorText","selector":"div.bg-light:nth-of-type(3) > a:nth-of-type(1)","multiple":false,"regex":""},{"id":"Mail","parentSelectors":["links"],"type":"SelectorText","selector":"div:nth-of-type(3) > a:nth-of-type(2)","multiple":false,"regex":""},{"id":"Région","parentSelectors":["links"],"type":"SelectorText","selector":"a span.text-primary","multiple":false,"regex":""}]}

Pagination doesn't work. The bot only scrapes the first page and not the others. :upside_down_face:

@calidera Hello, when pasting the sitemap, please apply the preformatted text option. Otherwise, the sitemap's JSON is not valid.
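
A quick way to confirm that a pasted sitemap survived intact is to run it through a JSON parser before posting. Below is a minimal Python sketch of that idea. The `check_sitemap` helper and its list of expected keys are illustrative assumptions based on the sitemaps in this thread, not part of Web Scraper, and the sample strings are made-up stand-ins.

```python
import json

# Keys that every sitemap posted in this thread carries (an assumption,
# not an official schema).
EXPECTED_KEYS = ("_id", "startUrl", "selectors")


def check_sitemap(text):
    """Return a list of problems found in a pasted sitemap string."""
    try:
        sitemap = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg} at position {err.pos}"]
    if not isinstance(sitemap, dict):
        return ["top level is not a JSON object"]
    return [f"missing key: {key}" for key in EXPECTED_KEYS if key not in sitemap]


# Hypothetical example of a sitemap whose URL was rewritten on paste,
# swallowing part of the JSON along with it.
mangled = '{"_id":"demo","startUrl":["Example page title a","multiple":true}]}'
intact = '{"_id":"demo","startUrl":["https://example.com"],"selectors":[]}'

print(check_sitemap(mangled))  # one "not valid JSON" message
print(check_sitemap(intact))   # empty list: nothing to report
```

An empty list only means the text parses and has the expected top-level keys; it says nothing about whether the selectors themselves are right.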

Hi, thanks, here it is :

{"_id":"chateauxfrance","startUrl":["https://www.chateaux-en-france.com/index.php?page=listing&recupdata=get&lang=&k=&search_nom=&search_type=muse&search_sous_type=&search_statut=&search_id_user_fiche=&search_note=0&specialget=&iteration=1&utm=#next"],"selectors":[{"id":"Pagine","paginationType":"clickMore","parentSelectors":["_root","Pagine"],"selector":"a.next","type":"SelectorPagination"},{"id":"links","multiple":true,"parentSelectors":["_root"],"selector":"div.gyg-item:nth-of-type(n+2) a","type":"SelectorLink"},{"id":"Name","multiple":false,"parentSelectors":["links"],"regex":"","selector":"h1.text-white","type":"SelectorText"},{"id":"Tel","multiple":false,"parentSelectors":["links"],"regex":"","selector":"div.bg-light:nth-of-type(3) > a:nth-of-type(1)","type":"SelectorText"},{"id":"Mail","multiple":false,"parentSelectors":["links"],"regex":"","selector":"div:nth-of-type(3) > a:nth-of-type(2)","type":"SelectorText"},{"id":"Région","multiple":false,"parentSelectors":["links"],"regex":"","selector":"a span.text-primary","type":"SelectorText"}]}

@3HAT0K Hello, this has already been forwarded to our development team. As a temporary workaround, please be sure to click on the left 'info' tab after the scraper pop-up window appears.

@calidera It appears that the issue arises from a faulty pagination setup. In addition, the website contains many duplicate links.

Please note that the scraper will not visit the same link twice.

Here's a sitemap example:

{"_id":"chateauxfrance","startUrl":["https://www.chateaux-en-france.com/index.php?page=listing&recupdata=get&lang=&k=&search_nom=&search_type=muse&search_sous_type=&search_statut=&search_id_user_fiche=&search_note=0&specialget=&iteration=1&utm=#next"],"selectors":[{"id":"Pagine","paginationType":"linkFromHref","parentSelectors":["_root","Pagine"],"selector":"a.next","type":"SelectorPagination"},{"id":"links","multiple":true,"parentSelectors":["_root","Pagine"],"selector":"div.gyg-item:nth-of-type(n+2) a","type":"SelectorLink"}]}
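
Compared with the sitemap posted earlier, this one lists `Pagine` among the parents of `links` (and switches `paginationType` to `linkFromHref`). A small Python sketch of a check for the first difference follows. It is a heuristic written for this thread, not something the extension itself runs, and the function name, the `demo` helper, and the placeholder URL are all hypothetical.

```python
def selectors_missing_pagination_parent(sitemap):
    """Return ids of start-page selectors that no pagination selector feeds."""
    selectors = sitemap["selectors"]
    pagination_ids = {
        s["id"] for s in selectors if s["type"] == "SelectorPagination"
    }
    if not pagination_ids:
        return []  # nothing to check without a pagination selector
    flagged = []
    for s in selectors:
        if s["id"] in pagination_ids:
            continue
        parents = set(s["parentSelectors"])
        # Assumption: a selector attached to _root but to no pagination
        # selector is only applied to the start page.
        if "_root" in parents and not (parents & pagination_ids):
            flagged.append(s["id"])
    return flagged


def demo(link_parents):
    # Trimmed-down stand-in for the sitemaps in this thread.
    return {
        "_id": "demo",
        "startUrl": ["https://example.com/listing"],
        "selectors": [
            {"id": "Pagine", "type": "SelectorPagination",
             "parentSelectors": ["_root", "Pagine"], "selector": "a.next"},
            {"id": "links", "type": "SelectorLink",
             "parentSelectors": link_parents, "selector": "a"},
        ],
    }


print(selectors_missing_pagination_parent(demo(["_root"])))            # ['links']
print(selectors_missing_pagination_parent(demo(["_root", "Pagine"])))  # []
```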

OK, thanks. So what should I do to fix that? Thanks

@ViestursWS

I will include the sitemap below as you requested.

This is a simple test that should click the "next" link for about 12 pages and extract an address.

I will obfuscate the URL because it’s a paid site and you will not be able to log into it.

{"_id":"test-pagination-next","startUrl":["https://example.com?c=AAEAAAD*****AQAAAAAAAAARAQAAAEQAAAAGAgAAAAQyMTAwBgMAAAABMgYEAAAAAjUwBgUAAAACMTYGBgAAAAIxNg0CBgcAAAADMTIwDQQGCAAAAAgxMzAwODIzOQ0EBgkAAAABMQYKAAAAATANBQYLAAAAATENHAYMAAAAATENCwYNAAAABBF(V2ANAgs)"],"selectors":[{"id":"pagination-next","paginationType":"auto","parentSelectors":["_root","pagination-next"],"selector":"a#m_DisplayCore_dpy3","type":"SelectorPagination"},{"id":"address","multiple":false,"parentSelectors":["_root","pagination-next"],"regex":"","selector":"span.d14m10","type":"SelectorText"}]}

Please note that this is just a TEST sitemap, the actual sitemap I use is MUCH larger and extracts a lot of information.

HOWEVER, even this basic sitemap is non-functional in Webscraper v1.29.60.

It works perfectly in Webscraper v0.6.5

Thank you for your continued assistance.

I still have a few problems too... Support is responsive but nothing has worked since this update !! :upside_down_face: :expressionless:

I am having the same issue. I see that you are working on getting it fixed. Thanks for listening to your users and for making this tool free for everyone. :smiley:

@fede Hello, could you please provide us with your sitemap?

{"_id":"Renta_Inmuebles","startUrl":["https://www.inmuebles24.com/inmuebles-en-renta-en-ciudad-de-mexico.html"],"selectors":[{"id":"Anuncios","multiple":true,"parentSelectors":["_root","Pages"],"selector":".clDfxH div.sc-i1odl-2","type":"SelectorElement"},{"id":"Precio","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"div.sc-12dh9kl-4","type":"SelectorText"},{"id":"Mantenimiento","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"div.sc-12dh9kl-2","type":"SelectorText"},{"id":"Titulo","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"a","type":"SelectorText"},{"id":"Delegación_Municipio","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"div.sc-ge2uzh-2","type":"SelectorText"},{"id":"tamaño total","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span:nth-of-type(1) span","type":"SelectorText"},{"id":"tamaño construido","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span:nth-of-type(2) span","type":"SelectorText"},{"id":"Recamaras","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span:nth-of-type(3) span","type":"SelectorText"},{"id":"baños","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span:nth-of-type(4) span","type":"SelectorText"},{"id":"cajones","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span:nth-of-type(5) span","type":"SelectorText"},{"id":"Descripción","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"div.sc-i1odl-12","type":"SelectorText"},{"id":"tipo de anuncio","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span.sc-ryls1p-0","type":"SelectorText"},{"id":"Pages","paginationType":"clickMore","parentSelectors":["_root","Pages"],"selector":"a.sc-n5babu-2","type":"SelectorPagination"}]}

@fede Hello, after inspecting your sitemap, it appears that the 'Pagination' selector matches both the next and previous buttons. You can verify this by navigating to the second page and pressing the 'Element preview' button.

To fix this, please specify the selector so that it targets the 'Next' button only.

Here's a sitemap example:

{"_id":"Renta_Inmuebles","startUrl":["https://www.inmuebles24.com/inmuebles-en-renta-en-ciudad-de-mexico.html"],"selectors":[{"id":"Pages","paginationType":"linkFromHref","parentSelectors":["_root","Pages"],"selector":"a[data-qa=\"PAGING_NEXT\"]","type":"SelectorPagination"},{"id":"Anuncios","multiple":true,"parentSelectors":["Pages"],"selector":"div[data-to-posting]","type":"SelectorElement"},{"id":"Precio","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"div.sc-12dh9kl-4","type":"SelectorText"},{"id":"Mantenimiento","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"div.sc-12dh9kl-2","type":"SelectorText"},{"id":"Titulo","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"a","type":"SelectorText"},{"id":"Delegación_Municipio","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"div.sc-ge2uzh-2","type":"SelectorText"},{"id":"tamaño total","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span:nth-of-type(1) span","type":"SelectorText"},{"id":"tamaño construido","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span:nth-of-type(2) span","type":"SelectorText"},{"id":"Recamaras","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span:nth-of-type(3) span","type":"SelectorText"},{"id":"baños","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span:nth-of-type(4) span","type":"SelectorText"},{"id":"cajones","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span:nth-of-type(5) span","type":"SelectorText"},{"id":"Descripción","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"div.sc-i1odl-12","type":"SelectorText"},{"id":"tipo de anuncio","multiple":false,"parentSelectors":["Anuncios"],"regex":"","selector":"span.sc-ryls1p-0","type":"SelectorText"}]}

Thanks for the reply! This doesn't seem to work either. Let me elaborate on my issue.

The program targets the next button and does manage to click on it and take me to the second page. Yet when it tries to go to the third page, for some reason it returns to the main page. My thought was that it was probably clicking the previous-page button instead of next, so I tried isolating the issue by starting on the 10th page, but it starts on the correct page and then goes straight to the home page when it's supposed to click next and go to the 11th page.

I've also noticed that the data shown on the screen while scraping is not the data that is later exported. I don't know if the cache is stuck and can't reload new data. I tried isolating this issue by running the scraper in incognito mode, and it didn't help. I also tried your Firefox extension and the same issue happened on the first try. Both of these issues happen no matter how many pieces of data I try to scrape (and just to clarify, I do click the refresh data button after every scrape).

I also tried starting from page 900 and going back page by page, but these issues persist. The program jumps back to the main page and then stops. I also tried just clicking on the link instead of using pagination, but it didn't work. I'm not sure what else I could try.

@fede Hi!

Are you using the same sitemap that was provided earlier? Could you please provide a recording that shows this issue? Alternatively, you can test paging through the results with a predefined page range in the start URL instead.

Here's another example:

{"_id":"Renta_Inmuebles-10-pages","startUrl":["https://www.inmuebles24.com/inmuebles-en-renta-en-ciudad-de-mexico-pagina-[1-10].html"],"selectors":[{"id":"Anuncios","parentSelectors":["_root"],"type":"SelectorElement","selector":"div[data-to-posting]","multiple":true},{"id":"Precio","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"div.sc-12dh9kl-4","multiple":false,"regex":""},{"id":"Mantenimiento","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"div.sc-12dh9kl-2","multiple":false,"regex":""},{"id":"Titulo","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"a","multiple":false,"regex":""},{"id":"Delegación_Municipio","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"div.sc-ge2uzh-2","multiple":false,"regex":""},{"id":"tamaño total","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span:nth-of-type(1) span","multiple":false,"regex":""},{"id":"tamaño construido","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span:nth-of-type(2) span","multiple":false,"regex":""},{"id":"Recamaras","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span:nth-of-type(3) span","multiple":false,"regex":""},{"id":"baños","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span:nth-of-type(4) span","multiple":false,"regex":""},{"id":"cajones","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span:nth-of-type(5) span","multiple":false,"regex":""},{"id":"Descripción","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"div.sc-i1odl-12","multiple":false,"regex":""},{"id":"tipo de anuncio","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span.sc-ryls1p-0","multiple":false,"regex":""}]}
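
The `[1-10]` range in the start URL above can be read as shorthand for ten separate start URLs, one per page. Here is a minimal Python sketch of that expansion, assuming a plain `[start-end]` integer range; the `expand_range` helper is hypothetical and only illustrates the idea, it is not how the extension is implemented.

```python
import re

# Matches a simple numeric range such as [1-10] inside a URL.
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range(url):
    """Expand the first [start-end] range in a URL into one URL per number."""
    match = RANGE.search(url)
    if match is None:
        return [url]  # no range present: the URL stands for itself
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


urls = expand_range(
    "https://www.inmuebles24.com/"
    "inmuebles-en-renta-en-ciudad-de-mexico-pagina-[1-10].html"
)
print(len(urls))  # 10
print(urls[0])    # ends with pagina-1.html
print(urls[-1])   # ends with pagina-10.html
```

Because every page is requested directly, this approach does not depend on any 'Next' button being clicked, which makes it a useful way to separate pagination problems from data-extraction problems.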

Hello, yesterday I ended up making lots and lots of modifications to the sitemap that you sent me, until I found a way for it to work. Below is the final sitemap that managed to scrape the data. Thank you very much for your amazing support. I am truly grateful that you make this tool free for everyone, and for the great responses you have given me over these past days. The world needs more people like you!

{"_id":"FixedVersionRentas","startUrl":["https://www.inmuebles24.com/inmuebles-en-renta-en-ciudad-de-mexico-ordenado-por-antiguedad-descendente-pagina-10.html"],"selectors":[{"id":"Anuncios","parentSelectors":["Pages"],"type":"SelectorElement","selector":"div[data-to-posting]","multiple":true},{"id":"Pages","parentSelectors":["_root","Pages"],"paginationType":"clickMore","selector":"a.gudFvk","type":"SelectorPagination"},{"id":"Precio","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"div.sc-12dh9kl-4","multiple":false,"regex":""},{"id":"Mantenimiento","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"div.sc-12dh9kl-2","multiple":false,"regex":""},{"id":"Titulo","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"a","multiple":false,"regex":""},{"id":"Delegación_Municipio","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"div.sc-ge2uzh-2","multiple":false,"regex":""},{"id":"tamaño total","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span:nth-of-type(1) span","multiple":false,"regex":""},{"id":"tamaño construido","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span:nth-of-type(2) span","multiple":false,"regex":""},{"id":"Recamaras","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span:nth-of-type(3) span","multiple":false,"regex":""},{"id":"baños","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span:nth-of-type(4) span","multiple":false,"regex":""},{"id":"cajones","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span:nth-of-type(5) span","multiple":false,"regex":""},{"id":"Descripción","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"div.sc-i1odl-12","multiple":false,"regex":""},{"id":"tipo de anuncio","parentSelectors":["Anuncios"],"type":"SelectorText","selector":"span.sc-ryls1p-0","multiple":false,"regex":""}]}