Pagination issue (won't go any deeper than category pages)

Hi

I followed the Pagination tutorial video and set up a pagination link, which works, but now my item selector is being ignored: the scraper just goes to each "next" category page and won't go any deeper into the site to scrape what I had already set up.

The selector graph looks right, so I have no idea why it isn't working as expected.

Can anyone help, please?

I have exported the scraper below:

{"_id":"houzz_uk","startUrl":["https://www.houzz.co.uk/professionals/interior-designers/s/Interior-Designers/c/london/d/100"],"selectors":[{"id":"item","type":"SelectorLink","selector":"a.pro-title","parentSelectors":["_root","next_page"],"multiple":true,"delay":0},{"id":"website URL","type":"SelectorLink","selector":"a.proWebsiteLink","parentSelectors":["item"],"multiple":false,"delay":0},{"id":"Category Title","type":"SelectorText","selector":"div.info-list-text span:nth-of-type(2) span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"Location","type":"SelectorText","selector":"div.info-list-label:nth-of-type(2) div.info-list-text","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"next_page","type":"SelectorLink","selector":"a.navigation-button","parentSelectors":["_root","next_page"],"multiple":true,"delay":0},{"id":"Name","type":"SelectorText","selector":"a.profile-full-name","parentSelectors":["item"],"multiple":false,"regex":"","delay":0}]}

Anyone got any ideas? I really need to get this working and can't figure out what is wrong.

thanks

Change your "next_page" selector to ul.pagination li a. The scraper will only scrape unvisited pages.

Hi

Thanks for replying.

How can I make it visit pages it has already visited?

Hi,

Also, I applied that adjustment, but it is still just viewing the pagination pages and skipping the items in the list, even ones that it has never opened before.

Hi again. I changed the next_page selector to not be multiple, and it seems to skip a few pagination pages, then start scraping as expected, but then it stops and closes down of its own accord after around 30 scrapes.

What is happening here? I don't understand why it is behaving so randomly.

I ran it again and it did the same thing, and it was scraping already visited pages from the previous scrape.

The export is below; any help is really appreciated.

{"_id":"houzz_uk","startUrl":["https://www.houzz.co.uk/professionals/interior-designers/s/Interior-Designers/c/london/d/100"],"selectors":[{"id":"item","type":"SelectorLink","selector":"a.pro-title","parentSelectors":["_root","next_page"],"multiple":true,"delay":0},{"id":"website URL","type":"SelectorLink","selector":"a.proWebsiteLink","parentSelectors":["item"],"multiple":false,"delay":0},{"id":"Category Title","type":"SelectorText","selector":"div.info-list-text span:nth-of-type(2) span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"Location","type":"SelectorText","selector":"div.info-list-label:nth-of-type(2) div.info-list-text","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"next_page","type":"SelectorLink","selector":"ul.pagination li a","parentSelectors":["_root","next_page"],"multiple":false,"delay":0},{"id":"Name","type":"SelectorText","selector":"a.profile-full-name","parentSelectors":["item"],"multiple":false,"regex":"","delay":0}]}

The "next_page" selector has to be multiple. The problem is that some item links use JavaScript to redirect you to the item page. You can see this by clicking data preview on the "item" selector. Web Scraper is currently not able to navigate through this kind of link. The only solution here would be to scrape the item URLs from the script tag, but you would then have to post-process them and import them into a separate sitemap that scrapes only item pages. You can scrape them with a text selector like this: script[type="application/ld+json"]
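The post-processing step described above could look roughly like the following. This is only a sketch: it assumes the text scraped from each script[type="application/ld+json"] tag is valid JSON and that the item links sit under keys named "url" somewhere in that structure, which has not been checked against the actual Houzz markup. The function names, the "houzz_uk_items" sitemap id and the example.com URLs are all made up for illustration.

```python
import json

def collect_urls(node):
    """Recursively gather every string stored under a "url" key (assumed key name)."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_urls(value))
    elif isinstance(node, list):
        for child in node:
            found.extend(collect_urls(child))
    return found

def build_item_sitemap(ld_json_texts, selectors=None, sitemap_id="houzz_uk_items"):
    """Build a second sitemap whose start URLs are the extracted item pages."""
    urls = []
    for text in ld_json_texts:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue  # skip rows that did not contain parseable JSON
        for url in collect_urls(data):
            if url not in urls:  # keep first occurrence, preserve order
                urls.append(url)
    return {"_id": sitemap_id, "startUrl": urls, "selectors": selectors or []}

# Hypothetical scraped rows; real data would come from the first sitemap's export.
rows = [
    '{"@type": "ItemList", "itemListElement": [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]}',
    "not json at all",
    '{"url": "https://example.com/a"}',
]
item_sitemap = build_item_sitemap(rows)
print(json.dumps(item_sitemap))
```

The detail selectors (Name, Location and so on) would then be passed in as `selectors` with their parent changed from "item" to "_root", since each start URL is already an item page.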

Hi
I don't understand. There seems to be no issue with the next button; I can skip through all the pages fine.

The issue is that the scraper isn't performing the "item" task on the pagination pages.

I don't see where the JavaScript would be an issue. All the links used are HTML.

I am using the sitemap below now, and it paginates fine but just won't scrape anything, even though the selector graph path implies it should.

{"_id":"houzz_uk","startUrl":["https://www.houzz.co.uk/professionals/interior-designers/s/Interior-Designers/c/london/d/100"],"selectors":[{"id":"item","type":"SelectorLink","selector":"a.pro-title","parentSelectors":["_root","next_page"],"multiple":true,"delay":0},{"id":"website URL","type":"SelectorLink","selector":"a.proWebsiteLink","parentSelectors":["item"],"multiple":false,"delay":0},{"id":"Category Title","type":"SelectorText","selector":"div.info-list-text span:nth-of-type(2) span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"Location","type":"SelectorText","selector":"div.info-list-label:nth-of-type(2) div.info-list-text","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"next_page","type":"SelectorLink","selector":".pagination-wrapper .navigation-button.next","parentSelectors":["_root","next_page"],"multiple":true,"delay":0},{"id":"Name","type":"SelectorText","selector":"a.profile-full-name","parentSelectors":["item"],"multiple":false,"regex":"","delay":0}]}

The issue is surely linked to the next page action being processed before the "item" selector, meaning nothing is scraped?

I have spent hours trying to get this to work and am about to give up.

Pagination is working fine. The scraper will navigate through each page first and then start scraping the item pages. If you want the scraper to start with item pages, you have to change the sequence of selectors: pagination first and the item selector beneath it.
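If you would rather reorder the exported JSON than drag selectors around in the UI, something like this sketch would do it. It assumes the order of entries in the exported "selectors" array is the sequence the reply above refers to, and the helper name move_selector_first is made up here. The sitemap is a trimmed copy of the one in this thread, with the detail selectors left out.

```python
import json

def move_selector_first(sitemap, selector_id):
    """Return a copy of the sitemap with the named selector listed first."""
    selectors = sitemap["selectors"]
    chosen = [s for s in selectors if s["id"] == selector_id]
    others = [s for s in selectors if s["id"] != selector_id]
    return {**sitemap, "selectors": chosen + others}

# Trimmed version of the exported sitemap (detail selectors omitted for brevity).
sitemap = {
    "_id": "houzz_uk",
    "startUrl": ["https://www.houzz.co.uk/professionals/interior-designers/s/Interior-Designers/c/london/d/100"],
    "selectors": [
        {"id": "item", "type": "SelectorLink", "selector": "a.pro-title",
         "parentSelectors": ["_root", "next_page"], "multiple": True, "delay": 0},
        {"id": "next_page", "type": "SelectorLink",
         "selector": ".pagination-wrapper .navigation-button.next",
         "parentSelectors": ["_root", "next_page"], "multiple": True, "delay": 0},
    ],
}

reordered = move_selector_first(sitemap, "next_page")
# Compact JSON, ready to paste into the sitemap import box.
print(json.dumps(reordered, separators=(",", ":")))
```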

You won't be able to scrape all item pages because of the JavaScript. Only ~50% of the a elements have URLs in their href attribute; the others rely on JavaScript. You can test that by deleting the pagination selector and then running the sitemap: it will return data only from the few item pages that are linked from the first listing page.
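To get a rough idea of how many item links are affected, you could export the href values shown in the "item" data preview and split them as below. This is a sketch under assumptions: it treats an href as followable only if it is a relative path or an http/https URL, and treats missing, "#" or javascript: values as JavaScript-driven. The exact form of the non-navigable links on Houzz has not been verified, and the example.com values are placeholders.

```python
from urllib.parse import urlparse

def is_followable(href):
    """True if the href looks like a plain link a crawler can open directly."""
    if not href:
        return False  # missing or empty href attribute
    parsed = urlparse(href.strip())
    if parsed.scheme not in ("", "http", "https"):
        return False  # e.g. javascript: pseudo-links
    # A bare "#" has neither a host nor a path, so there is nowhere to go.
    return bool(parsed.netloc or parsed.path)

def split_links(hrefs):
    """Partition href values into (followable, javascript_driven)."""
    followable = [h for h in hrefs if is_followable(h)]
    blocked = [h for h in hrefs if not is_followable(h)]
    return followable, blocked

# Placeholder href values standing in for a data preview export.
hrefs = ["https://example.com/pro/one", "javascript:;", None, "#", "/pro/two"]
followable, blocked = split_links(hrefs)
print(len(followable), "followable,", len(blocked), "JavaScript-driven")
```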