Hi everybody, I need your help scraping a website whose pagination length (the number of pagination pages) is dynamic.
For example, on the 1st page the pagination shows « 1 2 3 4 5 ... 8 », on the 5th page it shows « 1 ... 3 4 5 6 7 8 », and on the last page it shows « 1 ... 4 5 6 7 8 ». If a page has fewer than 20 links (news items), the pagination shows only « 1 ».
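To make the behaviour above concrete, here is a minimal Python sketch of a pagination bar that reproduces all four examples. The sliding-window rule, the window size of 5, and the names `pagination_labels` and `crawl_numeric_links` are hypothetical. They are inferred only from these examples, not from the site's code. Under this model, following every numeric label, regardless of where it sits in the bar, eventually reaches every page.

```python
# Hypothetical model of the pagination bar described above.
# ASSUMPTION: the window rule is inferred only from the examples in this post;
# the real site may use a different rule.

ELLIPSIS = "..."  # plain text in the bar, not a link


def pagination_labels(current: int, total: int, window: int = 5) -> list[str]:
    """Return the labels shown in the pagination bar on page `current` of `total`."""
    # Slide a window of `window` page numbers around the current page,
    # clamped so it never runs past page 1 or the last page.
    start = max(1, min(current - 2, total - window + 1))
    end = min(total, start + window - 1)
    labels: list[str] = []
    if start > 1:
        labels.append("1")
        if start > 2:
            labels.append(ELLIPSIS)
    labels.extend(str(p) for p in range(start, end + 1))
    if end < total:
        if end < total - 1:
            labels.append(ELLIPSIS)
        labels.append(str(total))
    return labels


def crawl_numeric_links(total: int) -> set[int]:
    """Starting from page 1, visit pages by following every numeric label."""
    seen = {1}
    queue = [1]
    while queue:
        page = queue.pop()
        for label in pagination_labels(page, total):
            if label.isdigit() and int(label) not in seen:
                seen.add(int(label))
                queue.append(int(label))
    return seen


if __name__ == "__main__":
    print(" ".join(pagination_labels(1, 8)))  # 1 2 3 4 5 ... 8
    print(" ".join(pagination_labels(5, 8)))  # 1 ... 3 4 5 6 7 8
    print(" ".join(pagination_labels(8, 8)))  # 1 ... 4 5 6 7 8
    print(" ".join(pagination_labels(1, 1)))  # 1
    print(sorted(crawl_numeric_links(8)))     # [1, 2, 3, 4, 5, 6, 7, 8]
```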
FYI, the "..." is plain text and does not contain a link.
With my current sitemap config, it scrapes the data I need correctly only up to page 4, and only when there are more than 20 news items within one pagination page; otherwise it returns other links that I didn't expect.
Url: https://news.detik.com/indeks
Sitemap:
{"_id":"detik","startUrl":["https://news.detik.com/indeks"],"selectors":[{"id":"links","type":"SelectorLink","parentSelectors":["_root"],"selector":"div.desc_idx a","multiple":true,"delay":0},{"id":"elements","type":"SelectorElement","parentSelectors":["links"],"selector":"div.detail_tag","multiple":false,"delay":0},{"id":"tag","type":"SelectorText","parentSelectors":["elements"],"selector":"a","multiple":true,"regex":"","delay":0},{"id":"pagination","type":"SelectorLink","parentSelectors":["_root","links"],"selector":"div.paging a:nth-child(n+3):nth-child(-n+6) ","multiple":true,"delay":0}]}
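The `pagination` selector above picks links by position: `a:nth-child(n+3):nth-child(-n+6)` only matches link elements that are the 3rd to 6th child of their parent. The plain-Python sketch below applies that rule to the example bars from this post, so you can see which page numbers it can reach from each page. It assumes every label, including "...", is its own child element of `div.paging` in the order shown, with no extra children such as prev/next arrows. The `numeric_pick` function is a hypothetical alternative that selects links by their text rather than their position. It illustrates the logic only and is not a Web Scraper setting.

```python
# Simulate what `div.paging a:nth-child(n+3):nth-child(-n+6)` picks from the
# example pagination bars in this post.
# ASSUMPTIONS: each label (including "...") is its own child of div.paging,
# in the order shown, with no prev/next arrows. ":nth-child" counts all
# element siblings, but the leading "a" means only links can match, so the
# non-link "..." is never picked.

ELLIPSIS = "..."

EXAMPLE_BARS = {
    "page 1": ["1", "2", "3", "4", "5", ELLIPSIS, "8"],
    "page 5": ["1", ELLIPSIS, "3", "4", "5", "6", "7", "8"],
    "last page": ["1", ELLIPSIS, "4", "5", "6", "7", "8"],
    "single page": ["1"],
}


def positional_pick(labels, first=3, last=6):
    """Mimic `a:nth-child(n+3):nth-child(-n+6)`: links among children 3..6 (1-based)."""
    return [
        label
        for position, label in enumerate(labels, start=1)
        if first <= position <= last and label != ELLIPSIS
    ]


def numeric_pick(labels):
    """Hypothetical alternative: every link whose text is a page number."""
    return [label for label in labels if label.isdigit()]


if __name__ == "__main__":
    for name, bar in EXAMPLE_BARS.items():
        print(f"{name:12} positional={positional_pick(bar)} numeric={numeric_pick(bar)}")
```

Under these assumptions, the positional rule returns ['3', '4', '5'] on page 1 and ['4', '5', '6', '7'] on the last page. It never returns "2" or the last page "8" in any of these examples, and it returns nothing when the bar shows only « 1 ». Selecting by text instead picks up every page number in the bar.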