Pagination (Beta) with a delay

Hello,

I have a problem with a sitemap that scrapes multiple pages using the Pagination (Beta) selector type. I suspect the selector doesn't give each page enough time to load before it clicks through to the next one: the scrape returns the content of the first and last pages, but none of the pages in between. As far as I can tell, those pages haven't finished loading when the pagination clicks next. The sitemap is below:

{"_id":"puben_trafikverkets_infrastrukturregelverk","startUrl":["https://puben.trafikverket.se/dpub/sok"],"selectors":[{"id":"pagination","parentSelectors":["_root","pagination"],"paginationType":"clickOnce","selector":"button.border-danger + button","type":"SelectorPagination"},{"id":"document","parentSelectors":["pagination"],"type":"SelectorLink","selector":".text-medium a","multiple":true},{"id":"nedladdningslänk","parentSelectors":["document"],"type":"SelectorLink","selector":"div:nth-of-type(n+10) div:nth-of-type(n+2) a","multiple":true},{"id":"ämnesområde","parentSelectors":["document"],"type":"SelectorText","selector":"div.form-group:nth-of-type(4) div","multiple":false,"regex":""},{"id":"titel","parentSelectors":["document"],"type":"SelectorText","selector":".col > div > div:nth-of-type(1) div.col-sm-6","multiple":false,"regex":""},{"id":"nummer","parentSelectors":["document"],"type":"SelectorText","selector":"h1","multiple":false,"regex":""}]}

I'd be grateful for any tips on how to improve the sitemap.

Best regards

@unique Hi, please apply the preformatted text option to the JSON code of your sitemap; otherwise it appears to be invalid.

Thanks for the suggestion. I've edited my original post.

@unique Hello, the default 'Delay' value for the 'Pagination' selector is 2000 ms and, unfortunately, it cannot currently be changed.

To work around this, you can replace the 'Pagination' selector with an 'Element click' selector, with its delay set to at least 2500 ms.

Here's an example:

{"_id":"puben_trafikverkets_infrastrukturregelverk","startUrl":["https://puben.trafikverket.se/dpub/sok"],"selectors":[{"clickElementSelector":"button.border-danger + button:not([aria-label=\"Gå fram en sida\"])","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","delay":2700,"discardInitialElements":"do-not-discard","id":"pagination","multiple":true,"parentSelectors":["_root"],"selector":"div.card-body","type":"SelectorElementClick"},{"id":"document","multiple":true,"parentSelectors":["pagination"],"selector":".text-medium a","type":"SelectorLink"},{"extractAttribute":"href","id":"nedladdningslänk","parentSelectors":["document"],"selector":"div:nth-of-type(n+10) div:nth-of-type(n+2) a","type":"SelectorGroup"},{"id":"ämnesområde","multiple":false,"parentSelectors":["document"],"regex":"","selector":"div.form-group:nth-of-type(4) div","type":"SelectorText"},{"id":"titel","multiple":false,"parentSelectors":["document"],"regex":"","selector":".col > div > div:nth-of-type(1) div.col-sm-6","type":"SelectorText"},{"id":"nummer","multiple":false,"parentSelectors":["document"],"regex":"","selector":"h1","type":"SelectorText"}]}
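For readability, here is just the selector that replaces the original 'Pagination' one, pretty-printed from the example above with the values unchanged. It is only a fragment, not a complete sitemap:

```json
{
  "id": "pagination",
  "type": "SelectorElementClick",
  "parentSelectors": ["_root"],
  "selector": "div.card-body",
  "multiple": true,
  "clickElementSelector": "button.border-danger + button:not([aria-label=\"Gå fram en sida\"])",
  "clickElementUniquenessType": "uniqueCSSSelector",
  "clickType": "clickOnce",
  "delay": 2700,
  "discardInitialElements": "do-not-discard"
}
```

The "delay" of 2700 is the value to raise if the in-between pages still come back empty. Note also that this selector lists only "_root" as its parent, whereas the original 'Pagination' selector also listed itself, and that "nedladdningslänk" in the example is a 'SelectorGroup' extracting "href" rather than a 'SelectorLink'.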


Thanks for the suggestion. I tried with your example code and increased the delay by a bit and now it's working perfectly!