Pagination assistance

Hello,

I'm having issues with pagination for a certain website that I'm trying to scrape information from. The data comes out fine but I can't seem to get the pagination working. I have tried a bunch of different approaches but haven't had any luck. I have been successful on other websites including the https://webscraper.io/ test sites but don't understand how to do it for the website below.

The website I'm trying with is as follows:

URL: Puben (https://puben.trafikverket.se/dpub/sok)

The last tried sitemap I have used:

Sitemap:
{"_id":"trafikverket-puben-2","startUrl":["https://puben.trafikverket.se/dpub/sok"],"selectors":[{"id":"pagination","paginationType":"auto","parentSelectors":["_root","pagination"],"selector":"button.ng-star-inserted:nth-of-type(n+3)","type":"SelectorPagination"},{"delay":0,"id":"document","multiple":true,"parentSelectors":["pagination"],"selector":".text-medium a","type":"SelectorLink"},{"delay":0,"id":"nedladdningslänk","multiple":true,"parentSelectors":["document"],"selector":"div:nth-of-type(n+10) div:nth-of-type(n+2) a","type":"SelectorLink"},{"delay":0,"id":"ämnesområde","multiple":false,"parentSelectors":["document"],"regex":"","selector":"div:nth-of-type(4) div","type":"SelectorText"},{"delay":0,"id":"titel","multiple":false,"parentSelectors":["document"],"regex":"","selector":".col > div > div:nth-of-type(1) div","type":"SelectorText"},{"delay":0,"id":"nummer","multiple":false,"parentSelectors":["document"],"regex":"","selector":"h1","type":"SelectorText"}]}

@unique Hi, have you tried the following selector: button.border-danger + button?

button.border-danger: the active (currently selected) page button
+ button: the button immediately after it
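
To make the two parts of the selector concrete, here is a minimal Python sketch of what "the button right after the active one" means. The pager markup, the ButtonCollector class and the next_after_active function are all hypothetical and only for illustration; the real page's HTML may differ, and the sketch assumes the page buttons sit next to each other as siblings with nothing in between.

```python
from html.parser import HTMLParser

# Hypothetical, simplified pager markup; the real page's HTML may differ.
PAGER_HTML = """
<div class="pager">
  <button class="btn">1</button>
  <button class="btn border-danger">2</button>
  <button class="btn">3</button>
  <button class="btn">4</button>
</div>
"""


class ButtonCollector(HTMLParser):
    """Collects the class list and text of each <button>, in document order."""

    def __init__(self):
        super().__init__()
        self.buttons = []
        self._in_button = False

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            classes = (dict(attrs).get("class") or "").split()
            self.buttons.append({"classes": classes, "text": ""})
            self._in_button = True

    def handle_endtag(self, tag):
        if tag == "button":
            self._in_button = False

    def handle_data(self, data):
        if self._in_button:
            self.buttons[-1]["text"] += data.strip()


def next_after_active(html, active_class="border-danger"):
    """Mimics `button.<active_class> + button` for a flat row of sibling buttons."""
    parser = ButtonCollector()
    parser.feed(html)
    # Skip the last button: nothing can follow it, so the selector would not match.
    for i, button in enumerate(parser.buttons[:-1]):
        if active_class in button["classes"]:
            return parser.buttons[i + 1]["text"]
    return None


print(next_after_active(PAGER_HTML))
```

With page 2 marked as active, the function returns the text of button 3. When the last button is the active one there is nothing after it, so nothing matches and the pagination stops.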

Example:

{"_id":"trafikverket-puben-2","startUrl":["https://puben.trafikverket.se/dpub/sok"],"selectors":[{"id":"pagination","paginationType":"clickOnce","parentSelectors":["_root","pagination"],"selector":"button.border-danger + button","type":"SelectorPagination"},{"delay":0,"id":"document","multiple":true,"parentSelectors":["pagination"],"selector":".text-medium a","type":"SelectorLink"},{"delay":0,"extractAttribute":"href","id":"nedladdningslänk","parentSelectors":["document"],"selector":"div:nth-of-type(n+10) div:nth-of-type(n+2) a","type":"SelectorGroup"},{"delay":0,"id":"ämnesområde","multiple":false,"parentSelectors":["document"],"regex":"","selector":"div:nth-of-type(4) div","type":"SelectorText"},{"delay":0,"id":"titel","multiple":false,"parentSelectors":["document"],"regex":"","selector":".col > div > div:nth-of-type(1) div","type":"SelectorText"},{"delay":0,"id":"nummer","multiple":false,"parentSelectors":["document"],"regex":"","selector":"h1","type":"SelectorText"}]}

Also, this website seems to load very slowly, so you should increase the page load delay and request interval to at least 4000 ms, and up to 10000 ms.
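
As a rough mental model of why this works together with a delay, the sketch below walks a pager one page at a time: wait, read the current page, then "click" the button after the active one. This is only a conceptual illustration, not how Web Scraper is implemented; walk_pages, total_pages and delay_ms are hypothetical names, and the pager is simulated with plain integers.

```python
import time


def walk_pages(total_pages, delay_ms=4000, sleep=time.sleep):
    """Visit pages 1..total_pages by always moving to the button after the active one."""
    visited = []
    active = 1  # the page whose button currently carries the active class
    while True:
        # Give the slow page time to finish rendering before reading it.
        sleep(delay_ms / 1000)
        visited.append(active)
        # Equivalent of `active button + button`: no match on the last page.
        next_page = active + 1 if active < total_pages else None
        if next_page is None:
            break
        # One click moves the active marker forward by exactly one page.
        active = next_page
    return visited


# delay_ms=0 keeps the demonstration instant.
print(walk_pages(3, delay_ms=0))
```

Because the selector only ever matches the single button after the active one, each step advances by one page and the walk ends on its own at the last page.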

Hope it helps!


@ViestursWS Hello, thanks for the reply.

The pagination you suggested works the way I had intended but could not manage myself. Thank you very much for the solution.

May I ask how you created the selector? I can't seem to recreate it myself, and I don't really understand the + syntax either. So far I have only created selectors by clicking the buttons with the automatic element selection, nothing more.

I've also noticed how slowly the page loads, and I usually use a delay of 10000 ms to make sure everything has loaded before scraping. Thanks for the heads-up.

@unique Hi. Not a problem. Creating this selector requires some knowledge of HTML and CSS.

Right-click the active button and choose Inspect.

There you should be able to see that the active button is button.border-danger. The '+' sign means "the element immediately after", so + button selects the next button.

See the screenshot:

Learn more:

https://webscraper.io/documentation/css-selector