How to scrape a website with a dynamic number of pagination pages

Hi everybody, I need your help with a problem scraping a website where the number of pagination links is dynamic.
For example, on the 1st page the pagination shows « 1 2 3 4 5 ... 8 », on the 5th page it shows « 1 ... 3 4 5 6 7 8 », and on the last page it shows « 1 ... 4 5 6 7 8 ». If the content fits on a single pagination page (fewer than 20 news links), the pagination only shows « 1 ».
FYI, the "..." characters don't contain any links.
With my current sitemap config, the scraper only collects the data I need up to page 4, and only if a pagination page contains more than 20 news links; otherwise it returns other links that I didn't expect.

Url: https://news.detik.com/indeks

Sitemap:
{"_id":"detik","startUrl":["https://news.detik.com/indeks"],"selectors":[{"id":"links","type":"SelectorLink","parentSelectors":["_root"],"selector":"div.desc_idx a","multiple":true,"delay":0},{"id":"elements","type":"SelectorElement","parentSelectors":["links"],"selector":"div.detail_tag","multiple":false,"delay":0},{"id":"tag","type":"SelectorText","parentSelectors":["elements"],"selector":"a","multiple":true,"regex":"","delay":0},{"id":"pagination","type":"SelectorLink","parentSelectors":["_root","links"],"selector":"div.paging a:nth-child(n+3):nth-child(-n+6) ","multiple":true,"delay":0}]}

This is quite a common scenario, and it is covered in the Web Scraper pagination tutorial video on YouTube, at around 1:48.
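As a rough illustration of why following every numbered link from every page copes with a sliding pager, here is a small Python simulation. It is only a sketch: `visible_pages`, `render_pager` and `crawl_all_pages` are hypothetical helpers, the window width of 5 is inferred from the examples in the question, and none of this reflects the site's real markup or Web Scraper's internals.

```python
from __future__ import annotations

from collections import deque


def visible_pages(current: int, total: int, width: int = 5) -> list[int]:
    """Page numbers shown in the pager: first, last, and a sliding window."""
    # Assumption: the window is `width` pages wide and clamped to 1..total.
    start = max(1, min(current - width // 2, total - width + 1))
    end = min(total, start + width - 1)
    return sorted({1, total, *range(start, end + 1)})


def render_pager(current: int, total: int) -> list[str]:
    """Render the pager as tokens, inserting "..." wherever numbers are skipped."""
    tokens: list[str] = []
    previous = 0
    for page in visible_pages(current, total):
        if page - previous > 1:
            tokens.append("...")  # plain text, not a link
        tokens.append(str(page))
        previous = page
    return tokens


def crawl_all_pages(total: int, start: int = 1) -> list[int]:
    """Follow every numbered link from every page until nothing new appears."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for token in render_pager(current, total):
            if token == "...":
                continue  # the ellipsis carries no link, so it is never followed
            page = int(token)
            if page not in seen:
                seen.add(page)
                queue.append(page)
    return sorted(seen)


if __name__ == "__main__":
    print(" ".join(render_pager(5, 8)))
    print(crawl_all_pages(8))
```

In this model the crawl reaches all 8 pages even though no single page lists them all, because each newly opened page reveals the next part of the window. Selecting links by position, by contrast, points at different page numbers depending on which page is currently open.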

I tried following the tutorial, but the data doesn't extract from a single page. I tried Element Click, which worked for the "next" action but would skip many pages. I'm at a loss after trying all the pagination methods from the tutorial.

@david2020 Hi, are you able to share your sitemap with us, please?

@ViestursWS - I uploaded the pagination issue. When the scraper hits page 7, the next "click" is on the "...", which navigates to page 8 and then on through 14. After navigating through 14, it skips the next "...", I think because it's no longer unique, and clicks the last number, 790, which opens the links for 785-790. It runs through those, and then the scraper thinks it's at the end of the list.
Sitemap (the site is password protected):
{"_id":"agents-30004-pagnationoption","startUrl":["https://matrix.fmlsd.mlsmatrix.com/Matrix/Results.aspx?c=AAEAAAD*****AQAAAAAAAAARAQAAAEQAAAAGAgAAAAQyNzc4BgMAAAABMgYEAAAAAjEwBgUAAAACMTkGBgAAAAIxOQ0CBgcAAAACMTQNCQYIAAAAAjI1BgkAAAABMAoGCgAAAAEwDSAGCwAAAAExDQsGDAAAAAcXwp7CisOXDQIL"],"selectors":[{"id":"pagination","paginationType":"clickOnce","parentSelectors":["_root","pagination"],"selector":".active a, #m_upPaging a:nth-of-type(n+2)","type":"SelectorPagination"},{"delay":0,"id":"fullname","multiple":false,"parentSelectors":["pagination"],"regex":"","selector":".d19m8 tr:contains('Full Name:') span.formula","type":"SelectorText"}]}
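To confirm which pages were skipped, one option is to compare the page numbers the scraper actually visited against the full range. The Python sketch below assumes the visited page numbers can be collected somehow (for example from a page-number column added to the scraped output, which is an assumption and not part of the sitemap above); `missing_ranges` is a hypothetical helper.

```python
def missing_ranges(visited, total):
    """Return (first, last) ranges of pages in 1..total that were never visited."""
    seen = set(visited)
    gaps = []
    gap_start = None
    for page in range(1, total + 1):
        if page not in seen:
            if gap_start is None:
                gap_start = page  # a new gap begins here
        elif gap_start is not None:
            gaps.append((gap_start, page - 1))  # the gap ended on the previous page
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, total))  # the gap runs to the last page
    return gaps


if __name__ == "__main__":
    # Pages reached in the run described above: 1-14, then 785-790.
    visited = list(range(1, 15)) + list(range(785, 791))
    print(missing_ranges(visited, 790))
```

For the run described above, this reports a single gap covering pages 15 through 784.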