Pagination exception

Pagination doesn't work on the site I'm currently scraping. For example, Web Scraper generates a pagination link that looks like this:

from the following HTML code:

<a class="page-link -sitemap-select-item-selected" data-tracking-control-name="pagination-2" href="#page=2" title="Page 2" data-li-page="2"><span class="hide-a11y">Page </span>2</a>
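One likely reason the generated link goes nowhere new: the anchor's `href` is only a URL fragment (`#page=2`). Browsers do not send the fragment to the server, so the site's own JavaScript presumably reads it and loads the results. The sketch below uses Python's standard `html.parser` to pull the attributes out of the anchor quoted above. The base URL `https://example.com/results` is a hypothetical placeholder, because the real URLs were not preserved in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# The anchor from the question, copied verbatim.
SNIPPET = ('<a class="page-link -sitemap-select-item-selected" '
           'data-tracking-control-name="pagination-2" href="#page=2" '
           'title="Page 2" data-li-page="2"><span class="hide-a11y">Page </span>2</a>')


class PageLinkParser(HTMLParser):
    """Collect the attributes of every <a> tag whose class list contains "page-link"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "page-link" in classes:
            self.links.append(attrs)


parser = PageLinkParser()
parser.feed(SNIPPET)
link = parser.links[0]

# Hypothetical base URL standing in for the page the link lives on.
base = "https://example.com/results"
resolved = urljoin(base, link["href"])

print(link["href"], link["data-li-page"])  # #page=2 2
print(resolved)                            # https://example.com/results#page=2
```

Only the fragment changes when the link is resolved; the path and query the server sees stay the same as on page 1.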

But the real URL that is loaded when one actually clicks the Page 2 link on the site is the following:

The Page 3 URL looks like this:

Obviously, there is some sort of translation going on: the site shows the first 25 entries in the result set, then the next 25 entries, and so on.
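The page-to-entries translation described above is simple arithmetic. A minimal sketch, assuming 25 results per page and zero-based, end-exclusive result indices; the function name `page_window` is hypothetical, and this does not reconstruct the site's actual URL format, which is not shown here.

```python
PAGE_SIZE = 25  # results per page, as described in the question


def page_window(page: int, page_size: int = PAGE_SIZE) -> tuple[int, int]:
    """Return the (start, end) range of results on a 1-based page (zero-based, end-exclusive)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    return start, start + page_size


for page in (1, 2, 3):
    print(page, page_window(page))
# 1 (0, 25)
# 2 (25, 50)
# 3 (50, 75)
```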

I don't know whether this URL scheme is meant to defeat web scraping, but either way it doesn't allow Web Scraper to properly compute the correct next-page URL.

How do I get around this?


I have a sitemap for LinkedIn projects. Let me dig it up and see how I handled it. I can't remember whether I used this or dataminer to scrape.


Here is the sitemap that paginates through LinkedIn Recruiter projects, scraping name, title, location and current pipeline status. You need to change the starting URL to match your project. I used ( as the link selector and made it a child of both root and itself.

I then created an Element selector that identifies each row (it was also made a child of the pagination link selector).
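The structure described above (a link selector that is a child of root and of itself, plus a row element that hangs off both root and the pagination selector) can be sketched as a sitemap built in Python and dumped to JSON. The pagination CSS selector, start URL, sitemap id and the selector ids `pagination`, `row` and `name` are hypothetical placeholders, since the originals were not preserved here; `div.row` and `a.title` are borrowed from the other sitemap in this thread, and the field names follow that sitemap's export format.

```python
import json

# Hypothetical placeholders: the real selectors were not preserved in this thread.
PAGINATION_CSS = "a.next-page"   # hypothetical CSS for the pagination link
ROW_CSS = "div.row"              # borrowed from the later sitemap in this thread

sitemap = {
    "_id": "linkedin-projects-sketch",            # hypothetical id
    "startUrl": ["https://example.com/project"],  # replace with your project URL
    "selectors": [
        {
            "id": "pagination",
            "type": "SelectorLink",
            "selector": PAGINATION_CSS,
            # Child of root AND of itself, so every page it opens is searched again.
            "parentSelectors": ["_root", "pagination"],
            "multiple": True,
            "delay": 0,
        },
        {
            "id": "row",
            "type": "SelectorElement",
            "selector": ROW_CSS,
            # Rows are taken from the start page and from every paginated page.
            "parentSelectors": ["_root", "pagination"],
            "multiple": True,
            "delay": 0,
        },
        {
            "id": "name",
            "type": "SelectorText",
            "selector": "a.title",
            "parentSelectors": ["row"],
            "multiple": False,
            "regex": "",
            "delay": 0,
        },
    ],
}

print(json.dumps(sitemap, indent=2))
```

The printed JSON could then be pasted into Web Scraper's sitemap import after replacing the placeholders.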



Thank you 'bretfeig'.

However, this is not working.

'' produces the URL of, which is not the actual URL you are sent to when you click the NEXT button. You are instead sent to '...1074489146#status/0/50'.


I think I figured out how to make this work. Web Scraper has an "Element Click Selector" that actually clicks the Next button. But I have yet to figure out how to use it properly. I have reviewed its documentation but, alas, I'm still confused. Anyone?

I have a "Next" button on the list of results; it shows the next 25 results and so on until there are no more results.

I have added an "Element Click Selector" under root named "page".
Selector is set to "div.row"; this contains each candidate's information that I am scraping.
I set Click Selector to "".
Click Type is set to "Click More".
Click Element Uniqueness is set to "Unique Text".
Multiple is checked.
Discard is unchecked.
Delay is set to 3000ms.
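To make the intended behavior of these settings concrete, here is a toy model of the loop the asker wants: extract the visible rows, click Next, wait, and repeat until Next is gone. It is not Web Scraper's implementation; `FakeResultsPage`, `scrape_all` and the 60-row total are hypothetical. The point it illustrates is that rows have to be extracted after every click, not only on the first and last views.

```python
import time


class FakeResultsPage:
    """Hypothetical stand-in for a results list with a "Next" button (25 rows per view)."""

    def __init__(self, total: int, page_size: int = 25):
        self.items = [f"Candidate {i}" for i in range(1, total + 1)]
        self.page_size = page_size
        self.start = 0

    def visible_rows(self) -> list[str]:
        return self.items[self.start:self.start + self.page_size]

    def has_next(self) -> bool:
        return self.start + self.page_size < len(self.items)

    def click_next(self) -> None:
        # "Next" replaces the visible rows with the following 25.
        self.start += self.page_size


def scrape_all(page: FakeResultsPage, delay_s: float = 0.0) -> list[str]:
    """Collect rows from every view, clicking Next until it disappears."""
    seen: set[str] = set()
    rows: list[str] = []
    while True:
        for row in page.visible_rows():  # extract from EVERY view
            if row not in seen:          # skip rows already collected
                seen.add(row)
                rows.append(row)
        if not page.has_next():
            break
        page.click_next()
        time.sleep(delay_s)              # stands in for the 3000 ms click delay


    return rows


results = scrape_all(FakeResultsPage(total=60))
print(len(results), results[0], results[-1])  # 60 Candidate 1 Candidate 60
```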

I have an Element Selector named "Candidate", which is a child of both "root" and "page".

When I scrape, it does go through every page, but it only scrapes the data from the first and last pages! What am I doing wrong?

{"_id":"linkedinv2","startUrl":[""],"selectors":[{"id":"Candidate","type":"SelectorElement","selector":"div.row","parentSelectors":["_root","page"],"multiple":true,"delay":"500"},{"id":"First Name","type":"SelectorText","selector":"a.title","parentSelectors":["Candidate"],"multiple":false,"regex":".*(?=\\s)","delay":0},{"id":"Last Name","type":"SelectorText","selector":"a.title","parentSelectors":["Candidate"],"multiple":false,"regex":"(?<=\\s).*","delay":0},{"id":"Employer","type":"SelectorText","selector":"p.headline","parentSelectors":["Candidate"],"multiple":false,"regex":"(?<=\\sat\\s+).*","delay":0},{"id":"Location","type":"SelectorText","selector":"dd:nth-of-type(1)","parentSelectors":["Candidate"],"multiple":false,"regex":"","delay":0},{"id":"Title","type":"SelectorText","selector":"p.headline","parentSelectors":["Candidate"],"multiple":false,"regex":".*(?=\\sat\\s)","delay":0},{"id":"page","type":"SelectorElementClick","selector":"div.row","parentSelectors":["_root"],"multiple":true,"delay":"3000","clickElementSelector":"","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}
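The regex fields in this sitemap are easy to get wrong: a bare `.` before a lookahead matches only a single character, so it looks as if forum formatting may have swallowed `*` characters. A minimal sketch of what the patterns extract, assuming the intended forms were `.*(?=\s)`, `(?<=\s).*`, `.*(?=\sat\s)` and `(?<=\sat\s+).*`. The sample name and headline are made up. Python's `re` only allows fixed-width lookbehind, so the employer pattern's `\s+` is swapped for `\s` here; Web Scraper runs in the browser, whose JavaScript regex engine accepts the variable-width form.

```python
import re

# Assumed intended patterns (the "*" characters appear to be missing in the post).
FIRST_NAME = r".*(?=\s)"
LAST_NAME = r"(?<=\s).*"
TITLE = r".*(?=\sat\s)"
# Python needs fixed-width lookbehind, so "\s+" from the sitemap becomes "\s" here.
EMPLOYER = r"(?<=\sat\s).*"


def extract(pattern: str, text: str) -> str:
    """Return the first match of pattern in text, or "" when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""


name = "Jane Doe"                           # made-up sample
headline = "Software Engineer at Acme Corp"  # made-up sample
print(extract(FIRST_NAME, name))    # Jane
print(extract(LAST_NAME, name))     # Doe
print(extract(TITLE, headline))     # Software Engineer
print(extract(EMPLOYER, headline))  # Acme Corp
```

Because `.*` is greedy, a three-part name such as "Mary Ann Smith" yields "Mary Ann" as the first name; `\S+` would take only the first word if that is preferred.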

You don't need the "Candidate" selector. Make all of the "Candidate" child selectors children of the "page" selector instead, and delete the "Candidate" selector.
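The suggested change can also be applied mechanically to an exported sitemap. A minimal sketch: the helper `reparent_and_drop` is hypothetical, and the input is a trimmed copy of the asker's sitemap (only "Candidate", "First Name" and "page" kept, with the regex emptied). It moves every child of "Candidate" under "page", then removes "Candidate".

```python
import json


def reparent_and_drop(sitemap: dict, drop_id: str, new_parent: str) -> dict:
    """Move every child of drop_id under new_parent, then delete drop_id itself."""
    result = json.loads(json.dumps(sitemap))  # deep copy so the input is left untouched
    kept = []
    for sel in result["selectors"]:
        if sel["id"] == drop_id:
            continue  # drop the redundant wrapper selector
        parents = [new_parent if p == drop_id else p for p in sel["parentSelectors"]]
        sel["parentSelectors"] = list(dict.fromkeys(parents))  # de-duplicate, keep order
        kept.append(sel)
    result["selectors"] = kept
    return result


# Trimmed copy of the sitemap from this thread.
asker = {
    "_id": "linkedinv2",
    "startUrl": [""],
    "selectors": [
        {"id": "Candidate", "type": "SelectorElement", "selector": "div.row",
         "parentSelectors": ["_root", "page"], "multiple": True, "delay": "500"},
        {"id": "First Name", "type": "SelectorText", "selector": "a.title",
         "parentSelectors": ["Candidate"], "multiple": False, "regex": "", "delay": 0},
        {"id": "page", "type": "SelectorElementClick", "selector": "div.row",
         "parentSelectors": ["_root"], "multiple": True, "delay": "3000",
         "clickElementSelector": "", "clickType": "clickMore",
         "discardInitialElements": False, "clickElementUniquenessType": "uniqueText"},
    ],
}

fixed = reparent_and_drop(asker, drop_id="Candidate", new_parent="page")
print(json.dumps(fixed, indent=2))
```

After the change, each `div.row` returned by "page" is the element the text selectors read from, so there is no second `div.row` lookup nested inside it.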

Thanks so much. In hindsight, that should have been obvious to me.


Hmm. That's odd; it worked fine for me.