Newbie - Pagination Scrape for Product Pages in Search

Describe the problem.

Totally new to this type of data extraction tool, so please be kind if this is a stupid question! I'm searching the website below for products that contain "WKD" on the product detail page, and I get 780 results. I want to scrape some basic text from those 780 product pages. I can scrape the first page of results but nothing beyond it. I've watched the tutorial until my eyes are bleeding and I can't understand what I'm doing wrong with the pagination. I've tried it several ways, but each time I get just the first page of results. Does anyone know what's wrong? I'd really appreciate some guidance here! Thanks.

Url: https://www.xs-stock.co.uk/a/search?type=product%2Carticle%2Cpage%2Ccollection&q=WKD

Sitemap:
{"_id":"allxsstock","startUrl":["https://www.xs-stock.co.uk/a/search?type=product%2Carticle%2Cpage%2Ccollection&q=WKD"],"selectors":[{"id":"pagination","paginationType":"auto","parentSelectors":["_root","pagination"],"selector":".pagination > span","type":"SelectorPagination"},{"delay":0,"id":"productclick","multiple":true,"parentSelectors":["pagination"],"selector":"a.grid-product__link","type":"SelectorLink"},{"delay":0,"id":"sku","multiple":false,"parentSelectors":["productclick"],"regex":"","selector":"p.product-single__sku","type":"SelectorText"},{"delay":0,"id":"title","multiple":false,"parentSelectors":["productclick"],"regex":"","selector":"h1","type":"SelectorText"},{"delay":0,"id":"rrp","multiple":false,"parentSelectors":["productclick"],"regex":"","selector":"span.product__price","type":"SelectorText"},{"delay":0,"id":"description","multiple":false,"parentSelectors":["productclick"],"regex":"","selector":"div.product-single__description-full","type":"SelectorText"},{"delay":0,"id":"imageurls","multiple":false,"parentSelectors":["productclick"],"selector":".starting-slide img","type":"SelectorImage"}]}

leeanoona
Hi, the problem is in how you are using the new "Pagination" selector.

Just select the "next" button.
You don't have to select all the page links.

Next time select only the one shown in the picture (the arrow).

Then select all the links with the data you need.

This sitemap works:

{"_id":"stock","startUrl":["https://www.xs-stock.co.uk/a/search?type=product%2Carticle%2Cpage%2Ccollection&q=WKD"],"selectors":[{"delay":0,"id":"sku","multiple":false,"parentSelectors":["Link product"],"regex":"","selector":"p.product-single__sku","type":"SelectorText"},{"delay":0,"id":"title","multiple":false,"parentSelectors":["Link product"],"regex":"","selector":"h1","type":"SelectorText"},{"delay":0,"id":"rrp","multiple":false,"parentSelectors":["Link product"],"regex":"","selector":"span.product__price","type":"SelectorText"},{"delay":0,"id":"description","multiple":false,"parentSelectors":["Link product"],"regex":"","selector":"div.product-single__description-full","type":"SelectorText"},{"delay":0,"id":"imageurls","multiple":false,"parentSelectors":["Link product"],"selector":".starting-slide img","type":"SelectorImage"},{"id":"Pagination","paginationType":"auto","parentSelectors":["_root","Pagination"],"selector":".next a","type":"SelectorPagination"},{"delay":0,"id":"Link product","multiple":true,"parentSelectors":["Pagination"],"selector":"a.grid-product__link","type":"SelectorLink"}]}
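
To see how the selectors in a sitemap like the one above hang together, the hierarchy can be printed from the JSON. This is only a rough Python sketch, not part of Web Scraper itself: `children_by_parent` and `print_tree` are made-up helper names, and the sitemap is a trimmed copy of the working one (only the "title" text selector is kept). The point it illustrates is that the pagination selector targets just the next link (`.next a`), lists itself as a parent so it repeats on every page, and the product link selector sits underneath it.

```python
import json

# Trimmed copy of the working sitemap from this thread
# (text selectors other than "title" are omitted for brevity).
SITEMAP = json.loads("""
{"_id":"stock",
 "startUrl":["https://www.xs-stock.co.uk/a/search?type=product%2Carticle%2Cpage%2Ccollection&q=WKD"],
 "selectors":[
  {"id":"Pagination","paginationType":"auto","parentSelectors":["_root","Pagination"],"selector":".next a","type":"SelectorPagination"},
  {"id":"Link product","multiple":true,"parentSelectors":["Pagination"],"selector":"a.grid-product__link","type":"SelectorLink"},
  {"id":"title","multiple":false,"parentSelectors":["Link product"],"regex":"","selector":"h1","type":"SelectorText"}
 ]}
""")


def children_by_parent(sitemap):
    """Map each parent id to the ids of the selectors that run under it."""
    tree = {}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            tree.setdefault(parent, []).append(sel["id"])
    return tree


def print_tree(tree, node="_root", depth=0, seen=()):
    """Print the hierarchy, marking self-references instead of recursing forever."""
    for child in tree.get(node, []):
        loop = child == node or child in seen
        print("  " * depth + child + (" (repeats on each page)" if loop else ""))
        if not loop:
            print_tree(tree, child, depth + 1, seen + (node,))


if __name__ == "__main__":
    print_tree(children_by_parent(SITEMAP))
```

Run as a script, this should list Pagination at the top level, with Link product nested under it and title nested under Link product.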

Hi. Thank you so very much. This now works! Thanks again. I will make sure I update the notes for this so I don't make that mistake again!

Hi Angelo,

I am having a similar issue (I'm also new to this program). I am trying to load a Zillow page with specific filters, scroll to the bottom of the page so that all the data loads, then cycle through the other pages. I tried the method described above for leeanoona's issue, but it doesn't seem to work. Does anyone have any suggestions?

sitemap:

{"_id":"zillow_ca_imported3","startUrl":["https://www.zillow.com/ca/?searchQueryState={"usersSearchTerm"%3A"CA"%2C"mapBounds"%3A{"west"%3A-126.09619878125%2C"east"%3A-112.51709721875%2C"south"%3A31.387333500722523%2C"north"%3A43.001371100838305}%2C"regionSelection"%3A[{"regionId"%3A9%2C"regionType"%3A2}]%2C"isMapVisible"%3Atrue%2C"filterState"%3A{"price"%3A{"min"%3A100000%2C"max"%3A25000000}%2C"mp"%3A{"min"%3A333%2C"max"%3A83182}%2C"built"%3A{"min"%3A2020}%2C"sort"%3A{"value"%3A"globalrelevanceex"}%2C"ah"%3A{"value"%3Atrue}%2C"con"%3A{"value"%3Afalse}%2C"mf"%3A{"value"%3Afalse}%2C"manu"%3A{"value"%3Afalse}%2C"land"%3A{"value"%3Afalse}%2C"apa"%3A{"value"%3Afalse}%2C"apco"%3A{"value"%3Afalse}}%2C"isListVisible"%3Atrue%2C"mapZoom"%3A6}"],"selectors":[{"delay":2500,"id":"Separate scroller","multiple":true,"parentSelectors":["_root","Pagination"],"scrollElementSelector":"div.search-page-list-container","selector":"div#grid-search-results > ul > li:nth-of-type(2n+3)","type":"SelectorElementScroll"},{"delay":0,"id":"Item wrappers","multiple":true,"parentSelectors":["_root","Pagination"],"selector":"div#grid-search-results > ul > li","type":"SelectorElement"},{"delay":0,"id":"Price","multiple":false,"parentSelectors":["Item wrappers"],"regex":"","selector":"div.list-card-price","type":"SelectorText"},{"delay":0,"id":"Details","multiple":false,"parentSelectors":["Item wrappers"],"regex":"","selector":"ul.list-card-details","type":"SelectorText"},{"delay":0,"extractAttribute":"href","id":"Link","multiple":false,"parentSelectors":["Item wrappers"],"selector":"a.list-card-link","type":"SelectorElementAttribute"},{"delay":0,"id":"Beds","multiple":false,"parentSelectors":["Item wrappers"],"regex":"","selector":"li:nth-of-type(1)","type":"SelectorText"},{"delay":0,"id":"Baths","multiple":false,"parentSelectors":["Item wrappers"],"regex":"","selector":"li:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"Full Address","multiple":false,"parentSelectors":["Item 
wrappers"],"regex":"","selector":"address","type":"SelectorText"},{"delay":0,"id":"Status","multiple":false,"parentSelectors":["Item wrappers"],"regex":"","selector":"li.list-card-statusText","type":"SelectorText"},{"id":"Pagination","paginationType":"auto","parentSelectors":["_root","Pagination"],"selector":".PaginationJumpItem-c11n-8-37-0__sc-18wdg2l-0 a.cyhUbV","type":"SelectorPagination"}]}

Thanks!

@treed Hello, the JSON of your sitemap doesn't appear to be valid. When pasting a sitemap here, please apply the 'Preformatted text' option or otherwise make sure it can be copied intact.
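
A quick way to check a sitemap before posting is to run it through a JSON parser. The Python sketch below is only an illustration: `is_valid_json` is a made-up helper, and `broken` is a shortened stand-in for the pasted sitemap, assuming the problem is the raw double quotes and braces that appear inside the start URL. It also shows that percent-encoding the query value with `urllib.parse.quote` keeps the URL safe to embed in JSON.

```python
import json
from urllib.parse import quote


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Shortened stand-in for the pasted sitemap: the raw double quotes inside
# the URL end the JSON string early, so parsing fails.
broken = (
    '{"_id":"zillow","startUrl":["https://www.zillow.com/ca/'
    '?searchQueryState={"usersSearchTerm"%3A"CA"}"]}'
)

# Percent-encoding the query value removes the raw quotes and braces.
state = '{"usersSearchTerm":"CA"}'
url = "https://www.zillow.com/ca/?searchQueryState=" + quote(state, safe="")
fixed = json.dumps({"_id": "zillow", "startUrl": [url]})

if __name__ == "__main__":
    print("broken parses:", is_valid_json(broken))
    print("fixed parses: ", is_valid_json(fixed))
    print(url)
```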

FYI, a template sitemap for zillow.com is available in the community sitemaps section of Web Scraper Cloud.

Sitemap example:

{"_id":"zillow-com","startUrl":["https://www.zillow.com/ca/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-179.999%2C%22east%22%3A-20.91696874999999%2C%22south%22%3A27.589587053547294%2C%22north%22%3A66.71536820880881%7D%2C%22mapZoom%22%3A4%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A9%2C%22regionType%22%3A2%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22price%22%3A%7B%22min%22%3A100000%2C%22max%22%3A25000000%7D%2C%22built%22%3A%7B%22min%22%3A2020%7D%2C%22con%22%3A%7B%22value%22%3Afalse%7D%2C%22apa%22%3A%7B%22value%22%3Afalse%7D%2C%22mf%22%3A%7B%22value%22%3Afalse%7D%2C%22mp%22%3A%7B%22min%22%3A333%2C%22max%22%3A83182%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%2C%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22land%22%3A%7B%22value%22%3Afalse%7D%2C%22manu%22%3A%7B%22value%22%3Afalse%7D%2C%22apco%22%3A%7B%22value%22%3Afalse%7D%7D%2C%22isListVisible%22%3Atrue%7D"],"selectors":[{"id":"listing-pagination","paginationType":"clickMore","parentSelectors":["_root","listing-pagination"],"selector":"a[title=\"Next page\"]","type":"SelectorPagination"},{"delay":3000,"id":"listing-element-scroll","multiple":true,"parentSelectors":["listing-pagination"],"scrollElementSelector":"div.search-page-list-container","selector":"article","type":"SelectorElementScroll"},{"delay":0,"id":"listing-link","multiple":true,"parentSelectors":["listing-element-scroll"],"selector":".list-card-info 
a.list-card-link[href*='/homedetails/']","type":"SelectorLink"},{"delay":0,"id":"home-listing-page","multiple":true,"parentSelectors":["listing-link"],"selector":"html","type":"SelectorElement"},{"delay":0,"extractAttribute":"href","id":"listing-url","multiple":false,"parentSelectors":["home-listing-page"],"selector":"link[rel=\"canonical\"]","type":"SelectorElementAttribute"},{"delay":0,"id":"address","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"#ds-chip-property-address","type":"SelectorText"},{"delay":0,"id":"price","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":".ds-summary-row > span:contains('$')","type":"SelectorText"},{"delay":0,"id":"zillow-rent-estimate","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"button:contains('Rent Zestimate') + span","type":"SelectorText"},{"delay":0,"id":"bedrooms","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":".ds-bed-bath-living-area-container > span:contains('bd') > span:nth-of-type(1)","type":"SelectorText"},{"delay":0,"id":"bathrooms","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":".ds-bed-bath-living-area-container > span:contains('ba') > span:nth-of-type(1)","type":"SelectorText"},{"delay":0,"id":"area-sqft","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":".ds-bed-bath-living-area-container > span:contains('sqft') > span:nth-of-type(1)","type":"SelectorText"},{"delay":0,"id":"date-available","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"span:contains('Date available:') + span","type":"SelectorText"},{"delay":0,"id":"type","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"span:contains('Type:') + span","type":"SelectorText"},{"delay":0,"id":"year-built","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"div:has(> h4:contains('facts and 
features')) li:has(span:contains('Year built:'))","type":"SelectorText"},{"delay":0,"id":"cooling","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"span:contains('Cooling:') + span","type":"SelectorText"},{"delay":0,"id":"heating","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"span:contains('Heating:') + span","type":"SelectorText"},{"delay":0,"id":"pets","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"span:contains('Pets:') + span","type":"SelectorText"},{"delay":0,"id":"parking","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"span:contains('Parking:') + span","type":"SelectorText"},{"delay":0,"id":"laundry","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"span:contains('Laundry:') + span","type":"SelectorText"},{"delay":0,"id":"deposit-and-fees","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"span:contains('Deposit & fees:') + span","type":"SelectorText"},{"delay":0,"id":"overview-text","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":".ds-overview-section > div, .sc-pQQXS, .gDpqEw","type":"SelectorText"},{"delay":0,"id":"zipcode","multiple":false,"parentSelectors":["home-listing-page"],"regex":"\\d{5}","selector":"#ds-chip-property-address span:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"neighborhood","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":".ds-neighborhood h4, .ds-neighborhood > p","type":"SelectorText"},{"delay":0,"id":"neighborhood-walk-score","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"[aria-describedby='walk-score-text'] span","type":"SelectorText"},{"delay":0,"id":"neighborhood-transit-score","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"[aria-describedby='transit-score-text'] 
span","type":"SelectorText"},{"delay":0,"id":"interior-details","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"div:has( > h5:contains('Interior details')), div:has( > p:contains('Interior details'))","type":"SelectorHTML"},{"delay":0,"id":"property-details","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"div:has( > h5:contains('Property details')), div:has( > p:contains('Property details'))","type":"SelectorHTML"},{"delay":0,"id":"construction-details","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"div:has( > h5:contains('Construction details')), div:has( > p:contains('Construction details'))","type":"SelectorHTML"},{"delay":0,"id":"utilities-details","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"div:has( > h5:contains('Utilities / Green Energy Details')), div:has( > p:contains('Utilities / Green Energy Details'))","type":"SelectorHTML"},{"delay":0,"id":"hoa-and-financial-details","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"div:has( > h5:contains('HOA and financial details')), div:has( > p:contains('HOA and financial details'))","type":"SelectorHTML"},{"delay":0,"id":"other-details","multiple":false,"parentSelectors":["home-listing-page"],"regex":"","selector":"div:has( > h5:contains('Other')), div:has( > p.hUuINN:contains('Other'))","type":"SelectorHTML"},{"delay":0,"id":"image-1","multiple":false,"parentSelectors":["home-listing-page"],"selector":"li.media-stream-tile img[src*='zillowstatic.com']:nth(0)","type":"SelectorImage"},{"delay":0,"id":"image-2","multiple":false,"parentSelectors":["home-listing-page"],"selector":"li.media-stream-tile img[src*='zillowstatic.com']:nth(1)","type":"SelectorImage"},{"delay":0,"id":"image-3","multiple":false,"parentSelectors":["home-listing-page"],"selector":"li.media-stream-tile 
img[src*='zillowstatic.com']:nth(2)","type":"SelectorImage"},{"delay":0,"id":"image-4","multiple":false,"parentSelectors":["home-listing-page"],"selector":"li.media-stream-tile img[src*='zillowstatic.com']:nth(3)","type":"SelectorImage"},{"delay":0,"id":"image-5","multiple":false,"parentSelectors":["home-listing-page"],"selector":"li.media-stream-tile img[src*='zillowstatic.com']:nth(4)","type":"SelectorImage"},{"delay":1500,"id":"images-scroll","multiple":false,"parentSelectors":["home-listing-page"],"scrollElementSelector":"div.ds-media-col","selector":"ul.media-stream","type":"SelectorElementScroll"},{"delay":0,"extractAttribute":"src","id":"images-grouped","parentSelectors":["images-scroll"],"selector":"li.media-stream-tile img[src*='zillowstatic.com']","type":"SelectorGroup"}]}

{"_id":"zillow_ca_imported3","startUrl":["https://www.zillow.com/ca/?searchQueryState=%7B%22usersSearchTerm%22%3A%22CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-126.09619878125%2C%22east%22%3A-112.51709721875%2C%22south%22%3A31.387333500722523%2C%22north%22%3A43.001371100838305%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A9%2C%22regionType%22%3A2%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22price%22%3A%7B%22min%22%3A100000%2C%22max%22%3A25000000%7D%2C%22mp%22%3A%7B%22min%22%3A333%2C%22max%22%3A83182%7D%2C%22built%22%3A%7B%22min%22%3A2020%7D%2C%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%2C%22con%22%3A%7B%22value%22%3Afalse%7D%2C%22mf%22%3A%7B%22value%22%3Afalse%7D%2C%22manu%22%3A%7B%22value%22%3Afalse%7D%2C%22land%22%3A%7B%22value%22%3Afalse%7D%2C%22apa%22%3A%7B%22value%22%3Afalse%7D%2C%22apco%22%3A%7B%22value%22%3Afalse%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A6%7D"],"selectors":[{"delay":2500,"id":"Separate scroller","multiple":true,"parentSelectors":["_root","Pagination"],"scrollElementSelector":"div.search-page-list-container","selector":"div#grid-search-results > ul > li:nth-of-type(2n+3)","type":"SelectorElementScroll"},{"delay":0,"id":"Item wrappers","multiple":true,"parentSelectors":["_root","Pagination"],"selector":"div#grid-search-results > ul > li","type":"SelectorElement"},{"delay":0,"id":"Price","multiple":false,"parentSelectors":["Item wrappers"],"regex":"","selector":"div.list-card-price","type":"SelectorText"},{"delay":0,"id":"Details","multiple":false,"parentSelectors":["Item wrappers"],"regex":"","selector":"ul.list-card-details","type":"SelectorText"},{"delay":0,"extractAttribute":"href","id":"Link","multiple":false,"parentSelectors":["Item wrappers"],"selector":"a.list-card-link","type":"SelectorElementAttribute"},{"delay":0,"id":"Beds","multiple":false,"parentSelectors":["Item 
wrappers"],"regex":"","selector":"li:nth-of-type(1)","type":"SelectorText"},{"delay":0,"id":"Baths","multiple":false,"parentSelectors":["Item wrappers"],"regex":"","selector":"li:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"Full Address","multiple":false,"parentSelectors":["Item wrappers"],"regex":"","selector":"address","type":"SelectorText"},{"delay":0,"id":"Status","multiple":false,"parentSelectors":["Item wrappers"],"regex":"","selector":"li.list-card-statusText","type":"SelectorText"},{"id":"Pagination","paginationType":"auto","parentSelectors":["_root","Pagination"],"selector":".PaginationJumpItem-c11n-8-37-0__sc-18wdg2l-0 a.cyhUbV","type":"SelectorPagination"}]}

Thanks @ViestursWS. I pasted my sitemap as directed above. I just ran the sitemap you sent, and while it does go through all of the necessary pages, it did not seem to grab all of the data. I'm going to play around with it now and see what I can figure out. Thanks for your time!