Pagination Ease

gozwald · May 2, 2018, 11:06am

Hi There!

This follows up on an earlier forum discussion about how the scraper deals with recursive pagination, specifically the case where it follows a "next" link until there are no further pages to scrape.

What if, as an alternative to the current configuration, where the scraper first discovers the depth of the pagination links and only then works its way backward to page one, it could:

a) Simply have the number of pagination links pre-entered, so the scraper knows beforehand how deep to go (and also circumvent the huge amount of time it may take to discover a very large number of pages)

b) (the more elegant option) Skip predetermining the depth of pagination altogether, and simply scrape until there are no further "next" links.
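To make the two options concrete, here is a rough sketch in Python rather than anything taken from the scraper itself. The `PAGES` dictionary, `fetch` and `scrape_following_next` are made-up names standing in for a real site and a real fetcher. Leaving `max_pages` unset corresponds to option b), and passing a number corresponds to option a).

```python
# Hypothetical in-memory "site": each page lists its items and its "next" link.
PAGES = {
    "/page/1": {"items": ["a", "b"], "next": "/page/2"},
    "/page/2": {"items": ["c"], "next": "/page/3"},
    "/page/3": {"items": ["d", "e"], "next": None},
}


def fetch(url):
    """Stand-in for a real HTTP request plus parsing (assumption)."""
    return PAGES[url]


def scrape_following_next(start_url, fetch_page=fetch, max_pages=None):
    """Yield items page by page, following "next" until it runs out.

    max_pages=None -> option b): stop when there is no "next" link.
    max_pages=N    -> option a): stop after a pre-entered number of pages.
    """
    url = start_url
    seen = set()      # guards against a "next" link that loops back
    pages_done = 0
    while url is not None and url not in seen:
        if max_pages is not None and pages_done >= max_pages:
            break
        seen.add(url)
        page = fetch_page(url)
        yield from page["items"]   # scrape this page before moving on
        pages_done += 1
        url = page["next"]


if __name__ == "__main__":
    print(list(scrape_following_next("/page/1")))
    print(list(scrape_following_next("/page/1", max_pages=2)))
```

In this sketch nothing has to be discovered up front: each page is scraped as soon as it is opened, so the depth of the pagination never needs to be known in advance.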

Thanks so much for this project. I love it!

KristapsWS · May 2, 2018, 2:58pm

It depends on your selector sequence.

If you put the pagination selector before the item selector, the items on a page will be scraped before the scraper goes to the next page.

If you put the item selector before the pagination selector, Web Scraper will go through all of the pagination before scraping any item pages.
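The difference between the two orderings can be illustrated with a small Python sketch. This is not Web Scraper's actual code: `PAGES`, `items_first` and `pagination_first` are invented for the example, and the order in which items are visited once pagination has been discovered is an assumption.

```python
# Hypothetical two-page site used only for illustration.
PAGES = {
    "/page/1": {"items": ["a", "b"], "next": "/page/2"},
    "/page/2": {"items": ["c"], "next": None},
}


def items_first(start_url):
    """Scrape each page's items before following its "next" link."""
    log = []
    url = start_url
    while url is not None:
        log.append(f"open {url}")
        for item in PAGES[url]["items"]:
            log.append(f"scrape {item}")
        url = PAGES[url]["next"]
    return log


def pagination_first(start_url):
    """Walk the whole "next" chain first, then scrape the items."""
    log = []
    urls = []
    url = start_url
    while url is not None:
        log.append(f"open {url}")
        urls.append(url)
        url = PAGES[url]["next"]
    for url in urls:
        for item in PAGES[url]["items"]:
            log.append(f"scrape {item}")
    return log


if __name__ == "__main__":
    print(items_first("/page/1"))
    print(pagination_first("/page/1"))
```

With a long pagination chain, the second variant produces no item data until every page has been discovered, which is the delay described in the original question.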