Pagination issue with #-based pages

I’m trying to scrape a site that uses numbered page tabs (1, 2, 3, etc) instead of normal next/previous pagination.

Page URLs are formatted like:

  • /oil-pressure-senders-and-sensors#1
  • /oil-pressure-senders-and-sensors#2
  • /oil-pressure-senders-and-sensors#3
  • etc (some categories have 15+ pages)

How the site works:

  • Page numbers are shown as tabs (1, 2, 3, etc)
  • Clicking a number updates the table on the same page
  • Opening #2, #3, etc directly in a new tab does show the correct page

The problem:

  • Using type "pagination" causes Web Scraper to scrape only the last page
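
For context, the part after "#" is a URL fragment. Browsers do not send it to the server, so all of the numbered URLs request the same document, and the tab switching presumably happens in the page's own script. A minimal sketch using Python's standard library (the example.com host is a placeholder, not the real site):

```python
from urllib.parse import urldefrag

# Hypothetical base URL; only the path ending and the #N suffixes come from the post.
base = "https://example.com/oil-pressure-senders-and-sensors"
page_urls = [f"{base}#{n}" for n in (1, 2, 3)]

# urldefrag splits a URL into (url_without_fragment, fragment).
requested = {urldefrag(u).url for u in page_urls}
fragments = [urldefrag(u).fragment for u in page_urls]

print(requested)   # a single URL: the fragment is not part of the request
print(fragments)   # the page numbers, which only the browser ever sees
```

Since every "page" resolves to one document, a scraper that follows these URLs as separate pages fetches the same HTML each time.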
  • Page 1 and intermediate pages are skipped

What I’m trying to achieve:

  • Scrape all pages, not just the first or last
  • Do this via pagination, not by manually adding start URLs

Is this a known limitation with tab-style / hash-based pagination?

Hi,

This should work using the Pagination selector with type set to 'Click multiple times on next/more button ([Next page] [Load More])'.

Could you share the website so I can inspect the mechanics in action?

Hi there, I have tried that option but it still does not work. Website below:

In this case, there is actually no pagination required as all the data is initially loaded in the HTML:

{"_id":"tridon","startUrl":["https://www.tridon.co.nz/products/Tridon/35/483/switches-and-sensors/2011/oil-pressure-senders-and-sensors"],"selectors":[{"elementLimit":0,"id":"product_wrapper","multiple":true,"parentSelectors":["_root"],"scroll":false,"selector":"tr[class*=\"row\"]","type":"SelectorElement"},{"id":"Part No","multiple":false,"multipleType":"singleColumn","parentSelectors":["product_wrapper"],"regex":"","selector":"td:nth-of-type(1) a","type":"SelectorText","version":2},{"id":"name","multiple":false,"multipleType":"singleColumn","parentSelectors":["product_wrapper"],"regex":"","selector":"td:nth-of-type(3)","type":"SelectorText","version":2},{"id":"link","linkType":"linkFromHref","multiple":false,"parentSelectors":["product_wrapper"],"selector":"td:nth-of-type(1) a","type":"SelectorLink","version":2}]}

Hi there, thanks. However, I have tried without pagination by selecting the product links, but it only seems to collect product links from the visible page. Can you assist?

Did you check the sitemap in my previous message?

The challenge here is that the select tool only targets the visible elements, i.e. the data on page 1. To construct a selector that matches all elements, including the rows for the other pages, the HTML has to be inspected.
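
As a rough illustration of why an attribute-based selector such as tr[class*="row"] picks up every page, here is a small sketch with Python's built-in HTML parser. The sample markup, class names and hidden-row styling are assumptions made up for the example, not copied from the site:

```python
from html.parser import HTMLParser

# Hypothetical markup: rows for pages 2 and 3 exist in the HTML but are hidden.
SAMPLE_HTML = """
<table>
  <tr class="header"><th>Part No</th><th>Image</th><th>Name</th></tr>
  <tr class="row page1"><td><a href="/p/A1">A1</a></td><td></td><td>Sender A1</td></tr>
  <tr class="row page2" style="display:none"><td><a href="/p/B2">B2</a></td><td></td><td>Sender B2</td></tr>
  <tr class="row page3" style="display:none"><td><a href="/p/C3">C3</a></td><td></td><td>Sender C3</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collects the first link href of every <tr> whose class contains 'row'."""

    def __init__(self):
        super().__init__()
        self.in_row = False
        self.seen_link = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            # Same idea as the CSS selector tr[class*="row"]: a substring match
            # on the class attribute, with no regard for visibility.
            self.in_row = "row" in (attrs.get("class") or "")
            self.seen_link = False
        elif tag == "a" and self.in_row and not self.seen_link:
            self.links.append(attrs.get("href"))
            self.seen_link = True

    def handle_endtag(self, tag):
        if tag == "tr":
            self.in_row = False


collector = RowCollector()
collector.feed(SAMPLE_HTML)
links = collector.links
print(links)
```

The hidden rows are matched just like the visible one, which is why clicking in the select tool (which only sees page 1) gives a narrower selector than one written from the HTML.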

Great, all working. Thanks for your help.
