Pagination issue (Complicated)

Hi everyone, :vulcan_salute:

This issue needs a web scraper genius. Just keep reading and you will see what I mean :smile:

I built a scraper, but pagination on this site is a little tricky. Setting the pagination selector to the "Next" button doesn't work (I don't really know why). So I tried the other method, adding a range such as "[1-5]" to the start URL to scrape the first five pages, since that works on websites where the page number appears in the link.

However, this method doesn't work either, and here is why. The link for the first page is as follows: https://eg.indeed.com/jobs?q=&l=Cairo&radius=100&jt=fulltime&sort=date&limit=50&fromage=15

The second page: https://eg.indeed.com/jobs?q=&l=Cairo&radius=100&jt=fulltime&sort=date&limit=50&fromage=15&start=50 (note the additional "start=50" at the end of the link, which is why the website shows a new set of 50 results)

The third page: https://eg.indeed.com/jobs?q=&l=Cairo&radius=100&jt=fulltime&sort=date&limit=50&fromage=15&start=100 (note that it is now "start=100", as the next set of 50 results is showing)

So instead of the page number increasing by 1 (e.g. page 1, 2, 3, etc.), the "start" value increases by 50, and it only appears in the link from the second page onwards!

Is there a solution for this issue?!! I appreciate your assistance.

Url: First page: https://eg.indeed.com/jobs?q=&l=Cairo&radius=100&jt=fulltime&sort=date&limit=50&fromage=1

Sitemap:
{"_id":"indeed_cairo","startUrl":["https://eg.indeed.com/وظائف?as_and&as_phr&as_any&as_not&as_ttl&as_cmp&jt=fulltime&st&radius=100&l=Cairo&fromage=15&limit=50&sort=date&psf=advsrch"],"selectors":[{"id":"Pagination","type":"SelectorLink","selector":"a:nth-of-type(20)","parentSelectors":["_root","Pagination"],"multiple":false,"delay":0},{"id":"Link Selector","type":"SelectorLink","selector":"a.turnstileLink.visited","parentSelectors":["_root","Pagination"],"multiple":true,"delay":0},{"id":"Job Title","type":"SelectorText","selector":"table#job-content > tbody > tr > td:nth-of-type(1) > div","parentSelectors":["Link Selector"],"multiple":false,"regex":"","delay":0},{"id":"Details and requirments","type":"SelectorText","selector":"span.summary div div div div:nth-of-type(1) div, span.summary div:nth-of-type(n+2), div#p_9ffa029926f132f4, table#job-content > tbody > tr > td:nth-of-type(1)","parentSelectors":["Link Selector"],"multiple":false,"regex":"","delay":0}]}

Hi again!

There's absolutely no trick, because the "start" value is generated from the number of results shown per page. With 50 results per page, page 2 = &start=50 and page 3 = &start=100, i.e. the number of previous pages multiplied by the number of results per page.

And the mistake from your previous sitemap that I mentioned, the recursion, is present in this sitemap as well.
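
To make the arithmetic concrete, here is a minimal Python sketch that rebuilds the page links quoted in the question. The helper name `page_url` and the `PARAMS` dictionary are made up for illustration, and the sketch assumes the offset is simply the number of previous pages times the results per page, which is what the three links above suggest. It only builds the URLs and is not part of any Web Scraper sitemap.

```python
from urllib.parse import urlencode

# Base URL and query parameters copied from the links in the question.
BASE = "https://eg.indeed.com/jobs"
PARAMS = {
    "q": "",
    "l": "Cairo",
    "radius": 100,
    "jt": "fulltime",
    "sort": "date",
    "limit": 50,
    "fromage": 15,
}


def page_url(page, per_page=50):
    """Build the link for a 1-based page number (hypothetical helper)."""
    params = dict(PARAMS)
    offset = (page - 1) * per_page  # results skipped before this page
    if offset:
        # The first page carries no "start" parameter at all.
        params["start"] = offset
    return f"{BASE}?{urlencode(params)}"


if __name__ == "__main__":
    for page in range(1, 4):
        print(page, page_url(page))
```

So the links do follow a fixed pattern: they just step by the page size instead of by 1.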

You'll have to do it yourself this time: hit Edit on the Pagination selector and select '_root' in the parent selectors list.

As for the pagination, it works as expected once all the page links are picked: it goes through all the pages from start to end and finishes properly.

Good luck!

Hi,

I feel so stupid. I am sorry.

I overanalysed it and thought it was super hard, when in fact it was super easy. Thank you for teaching me.

I did it and it is working fine now. :slight_smile:

thx...