Scrape data from paginated detail page?

I am having trouble scraping this website.
I seem to be able to follow the pagination, but the detail link request only selects the first entry on each page.
I am also unable to collect data from the detail page.

The goal is to collect all the data from the detailed location page for each location on each list page.

What am I doing wrong?


{"_id":"west_kelowna-playgrounds","startUrl":["[1-3]"],"selectors":[{"id":"playground-links","type":"SelectorElementClick","parentSelectors":["playground-selector"],"selector":"h4","multiple":true,"delay":2000,"clickElementSelector":"h4","clickType":"clickMore","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"name","type":"SelectorText","parentSelectors":["playground-links"],"selector":"h2","multiple":false,"regex":"","delay":0},{"id":"address","type":"SelectorText","parentSelectors":["playground-links"],"selector":".sidebar-feed p:nth-of-type(1)","multiple":false,"regex":"","delay":0},{"id":"hours","type":"SelectorText","parentSelectors":["playground-links"],"selector":".sidebar-feed li","multiple":false,"regex":"","delay":0},{"id":"description","type":"SelectorText","parentSelectors":["playground-links"],"selector":"p:nth-of-type(3)","multiple":false,"regex":"","delay":0},{"id":"photo","type":"SelectorImage","parentSelectors":["playground-links"],"selector":"img.photoItem","multiple":false,"delay":0},{"id":"playground-selector","type":"SelectorElement","parentSelectors":["_root"],"selector":"a.sidebar-item","multiple":true,"delay":0},{"id":"","type":"SelectorText","parentSelectors":["playground-links"],"selector":"","multiple":false,"regex":"","delay":0}]}

Hi, @techhouse. After checking out this website, it seems that there are no valid links leading to the individual detail pages; they appear to be embedded in JavaScript. If you look at the data preview, there are no unique links, only links that lead to the main page. If you click on a link, you can see that the URL changes, so the Element click selector cannot be used here either.

Thanks, @viesturs.
I guess that is why I only get one result from each page.
Is there another method to make this scrape work?

It is possible to scrape this site in two stages. In stage 1, you collect the "data-value" attribute from each row and build URLs from those values. In stage 2, a separate sitemap uses all of the stage 1 URLs as its start URLs.

For the stage 1 scrape, you can use this selector (along with your paginator):

Type: Element attribute
Selector: ul > li div.l-item-container a:first-of-type
Multiple: Yes (checked)
Attribute name: data-value

This will yield a list of "data-value" values, which look like:


You will then need to prefix them with the site URL so that they become:

These would be the direct links to detail pages, and you can use them as multiple URLs in a new sitemap.
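
If you would rather not do the prefixing by hand, here is a minimal Python sketch of that step. It assumes the detail URL is simply the prefix followed by the "data-value"; the prefix `https://example.com/detail/` and the sample values are made-up placeholders (not the real site's), and `build_start_urls` is just a hypothetical helper name. It prints the result as a `startUrl` list that could be pasted into the stage 2 sitemap.

```python
import json

# Placeholder only: replace with the real site URL prefix.
URL_PREFIX = "https://example.com/detail/"


def build_start_urls(data_values, prefix=URL_PREFIX):
    """Prefix each scraped data-value, skipping blanks and duplicates."""
    urls = []
    seen = set()
    for value in data_values:
        value = value.strip()
        if not value:
            continue  # ignore empty rows from the stage 1 export
        url = prefix + value
        if url in seen:
            continue  # keep only the first occurrence of each URL
        seen.add(url)
        urls.append(url)
    return urls


# Made-up stage 1 output, for illustration only.
sample_values = ["101", "102", "", " 101 "]
start_urls = build_start_urls(sample_values)

# Emit the list in the shape used by a sitemap's "startUrl" field.
print(json.dumps({"startUrl": start_urls}, indent=2))
```

The order of the stage 1 rows is preserved, so the stage 2 results should line up with the original list pages.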

I have posted about adding suffix/prefix to URLs here:


Thank you for this clever solution.
Using your method, I created the links in a spreadsheet, then cleaned up the format in Notepad.