Hi everyone,
This issue needs a web scraper genius. Keep reading and you will see what I mean.
I built a scraper, but pagination on this site is tricky. When I set the pagination selector to the "Next" button, it doesn't work (I don't really know why). So I tried putting the pagination into the link instead, adding "[1-5]" to the link to scrape the first five pages. That works on websites whose links change to reflect the page number.
However, this method doesn't work either, and here is why. The link for the first page is as follows: https://eg.indeed.com/jobs?q=&l=Cairo&radius=100&jt=fulltime&sort=date&limit=50&fromage=15
The second page: https://eg.indeed.com/jobs?q=&l=Cairo&radius=100&jt=fulltime&sort=date&limit=50&fromage=15&start=50 (note the additional "start=50" at the end of the link, which is why the website shows a new set of 50 results)
The third page: https://eg.indeed.com/jobs?q=&l=Cairo&radius=100&jt=fulltime&sort=date&limit=50&fromage=15&start=100 (note the "start=100" now, as the next set of 50 results is showing)
So instead of the page number going up by 1 (e.g. page 1, 2, 3, etc.), the value in the link goes up by 50, and it only appears from the second page onward!
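To make the pattern concrete, here is a minimal Python sketch of the offset rule described above: page 1 has no "start" parameter, and page n uses start = (n - 1) * 50. The function name `page_urls` and the `page_size` argument are my own for illustration; they are not part of Web Scraper or of the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# First-page link from the post; it carries no "start" parameter.
BASE_URL = (
    "https://eg.indeed.com/jobs?q=&l=Cairo&radius=100"
    "&jt=fulltime&sort=date&limit=50&fromage=15"
)


def page_urls(base_url, pages, page_size=50):
    """Build the link for each page, assuming start = (page - 1) * page_size."""
    parts = urlsplit(base_url)
    # keep_blank_values=True preserves the empty "q=" parameter.
    params = parse_qsl(parts.query, keep_blank_values=True)
    urls = []
    for page in range(1, pages + 1):
        offset = (page - 1) * page_size
        query = list(params)
        if offset:
            # Only pages after the first get a "start" offset.
            query.append(("start", str(offset)))
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


if __name__ == "__main__":
    for url in page_urls(BASE_URL, 5):
        print(url)
```

If the scraping tool accepts a list of start URLs, the generated links could be pasted in directly. I believe the Web Scraper start URL range syntax also accepts an increment (something like "start=[0-200:50]"), but I have not confirmed that here, so treat it as something to check in the docs.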
Is there a solution for this issue? I appreciate your assistance.
Url: First page: https://eg.indeed.com/jobs?q=&l=Cairo&radius=100&jt=fulltime&sort=date&limit=50&fromage=1
Second page: https://eg.indeed.com/jobs?q=&l=Cairo&radius=100&jt=fulltime&sort=date&limit=50&fromage=15&start=50
Third page: https://eg.indeed.com/jobs?q=&l=Cairo&radius=100&jt=fulltime&sort=date&limit=50&fromage=15&start=100
Sitemap:
{"_id":"indeed_cairo","startUrl":["https://eg.indeed.com/وظائف?as_and&as_phr&as_any&as_not&as_ttl&as_cmp&jt=fulltime&st&radius=100&l=Cairo&fromage=15&limit=50&sort=date&psf=advsrch"],"selectors":[{"id":"Pagination","type":"SelectorLink","selector":"a:nth-of-type(20)","parentSelectors":["_root","Pagination"],"multiple":false,"delay":0},{"id":"Link Selector","type":"SelectorLink","selector":"a.turnstileLink.visited","parentSelectors":["_root","Pagination"],"multiple":true,"delay":0},{"id":"Job Title","type":"SelectorText","selector":"table#job-content > tbody > tr > td:nth-of-type(1) > div","parentSelectors":["Link Selector"],"multiple":false,"regex":"","delay":0},{"id":"Details and requirments","type":"SelectorText","selector":"span.summary div div div div:nth-of-type(1) div, span.summary div:nth-of-type(n+2), div#p_9ffa029926f132f4, table#job-content > tbody > tr > td:nth-of-type(1)","parentSelectors":["Link Selector"],"multiple":false,"regex":"","delay":0}]}