Pagination with ellipsis

Hey all, I'm playing around with pagination but can't get it to work when the page list contains ellipses. Example: "< 1 ... 6 7 8 9 10 ... 25 >"

It'll grab a few details from the first few pages then some from the last page (25).

Tried to get it to work on Walmart and Amazon but running into the same issue on both. Here's the sitemap:

{"_id":"pagination-test","startUrl":["https://www.walmart.com/browse/cookware-tools/cookware-sets/4044_623679_133020_599265?povid=623679+|+2018-04-30+|+CookwareSetsFC"],"selectors":[{"id":"item","type":"SelectorLink","selector":"div.search-result-gridview-item a.product-title-link","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"product-title","type":"SelectorText","selector":"div.hide-content-max-m h1.prod-ProductTitle div","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"product-price","type":"SelectorText","selector":"span.hide-content span.price","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"product-description","type":"SelectorText","selector":"div.about-desc","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"product-brand","type":"SelectorText","selector":"tr:contains('Brand') div","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"product-images","type":"SelectorImage","selector":"img.prod-hero-image-image","parentSelectors":["item"],"multiple":false,"delay":0},{"id":"product-weight","type":"SelectorText","selector":"tr:contains('Assembled Product Weight') div","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"product-dimensions","type":"SelectorText","selector":"tr:contains('Assembled Product Dimensions (L x W x H)') div","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"pagination","type":"SelectorLink","selector":"ul.paginator-list li:nth-of-type(n+2) a","parentSelectors":["_root"],"multiple":true,"delay":0}]}

Hi there!

I think this post may be useful to you.

Where would I put the brackets in the sitemap? The start URL won't have anything to do with the pages.

Actually, a quick look suggests the opposite.

In a case like this, you can save time by setting up pagination with a bracketed range in the URL instead of a pagination selector.

You can always check whether the URL changes after you click, for example, the second page.

Let's take a look at the URL after clicking page №2:

https://www.walmart.com/browse/cookware-tools/cookware-sets/4044_623679_133020_599265?page=2&povid=623679+%7C+2018-04-30+%7C+CookwareSetsFC#searchProductResult

We see that the URL has changed. Now we know that changing the value after page= selects the page we need.
Based on that, we can add the pagination brackets [ ]. First we have to check how many pages there are, and we see that there are just 25 pages of results.

Next, we put the page range into the brackets, from page 1 to page 25.
Keep in mind that it works bottom-up, meaning it goes from the last page to the first one.

https://www.walmart.com/browse/cookware-tools/cookware-sets/4044_623679_133020_599265?page=[25-1]&povid=623679+|+2018-04-30+|+CookwareSetsFC#searchProductResult

All that's left is to replace the URL currently stored in your sitemap with the one above and try running the scrape. Don't forget to remove the pagination selector from your sitemap first, or it will click on the page numbers while going from page 1 to page 25.

Hope I helped this time.

Hmmm, I tried and it didn't work. The scraper immediately closes.

This is what the sitemap looks like now after changing the URL and removing the pagination selector:

{"_id":"test-walmart","startUrl":["https://www.walmart.com/browse/cookware-tools/cookware-sets/4044_623679_133020_599265?page=[25-1]&povid=623679+|+2018-04-30+|+CookwareSetsFC#searchProductResult"],"selectors":[{"id":"item","type":"SelectorLink","selector":"div.search-result-gridview-item a.product-title-link","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"Name","type":"SelectorText","selector":"div.hide-content-max-m h1.prod-ProductTitle div","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"Price","type":"SelectorText","selector":"span.hide-content span.price","parentSelectors":["item"],"multiple":false,"regex":"","delay":0}]}

I'm assuming it closes because the URL "https://www.walmart.com/browse/cookware-tools/cookware-sets/4044_623679_133020_599265?page=[25-1]&povid=623679+|+2018-04-30+|+CookwareSetsFC#searchProductResult" returns a 404, so there is nothing to select?

That's my mistake. It seems the range won't go 25-1, only 1-25:

{"_id":"test-walmart2","startUrl":["https://www.walmart.com/browse/cookware-tools/cookware-sets/4044_623679_133020_599265?page=[1-25]&povid=623679+|+2018-04-30+|+CookwareSetsFC%23searchProductResult"],"selectors":[{"id":"item","type":"SelectorLink","selector":"div.search-result-gridview-item a.product-title-link","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"Name","type":"SelectorText","selector":"div.hide-content-max-m h1.prod-ProductTitle div","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"Price","type":"SelectorText","selector":"span.hide-content span.price","parentSelectors":["item"],"multiple":false,"regex":"","delay":0}]}
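
To make the bracket idea concrete, here is a minimal Python sketch of how a [1-25] range in a start URL could be expanded into one URL per page. This is not Web Scraper's actual code; the helper name expand_range and the regular expression are made up for illustration. It treats a descending range such as [25-1] as empty, which is one possible way to picture why that attempt produced nothing to scrape.

```python
import re

# Hypothetical helper, not part of Web Scraper: matches one "[start-end]" range.
RANGE = re.compile(r"\[(\d+)-(\d+)\]")

def expand_range(url_template):
    """Return one URL per number in the first [start-end] range of the template."""
    match = RANGE.search(url_template)
    if match is None:
        # No range in the URL: it is a single start URL.
        return [url_template]
    start, end = int(match.group(1)), int(match.group(2))
    # Ascending ranges only; range(25, 2) is empty, so "[25-1]" yields no URLs.
    return [
        url_template[:match.start()] + str(page) + url_template[match.end():]
        for page in range(start, end + 1)
    ]

template = (
    "https://www.walmart.com/browse/cookware-tools/cookware-sets/"
    "4044_623679_133020_599265?page=[1-25]"
    "&povid=623679+|+2018-04-30+|+CookwareSetsFC#searchProductResult"
)
urls = expand_range(template)
print(len(urls))   # number of generated page URLs
print(urls[0])     # first page
print(urls[-1])    # last page
```

Each generated URL is then a separate listing page, which is why the pagination selector is no longer needed.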

Hi there, I was scraping using this sitemap:

{"_id":"houzz","startUrl":["https://www.houzz.com/professionals/home-builders/p/330"],"selectors":[{"id":"list","type":"SelectorLink","selector":"a.pro-title","parentSelectors":["_root","pagination"],"multiple":true,"delay":0},{"id":"compayny name","type":"SelectorText","selector":"a.profile-full-name","parentSelectors":["list"],"multiple":false,"regex":"","delay":0},{"id":"website","type":"SelectorLink","selector":"a.proWebsiteLink","parentSelectors":["list"],"multiple":false,"delay":0},{"id":"pagination","type":"SelectorLink","selector":"ul.pagination li:nth-of-type(n+9) a","parentSelectors":["_root"],"multiple":true,"delay":0}]}

But the website has no page number in the URL. It only shows the number 330 on page 23, and then on page 24 the URL shows 345, so it adds the number of items displayed (15) for each page.

So can you give me any idea how to solve this problem?
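
The arithmetic described here can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and it assumes the first page corresponds to 0 in the URL, which is consistent with the two data points in the post (page 23 shows 330, page 24 shows 345) but was not checked against the site.

```python
# From the post: the number in the URL grows by 15 for each page.
ITEMS_PER_PAGE = 15

# Base URL taken from the sitemap above, without the trailing number.
BASE_URL = "https://www.houzz.com/professionals/home-builders/p/"

def offset_for_page(page):
    """Number that appears in the URL for a 1-based page (assumes page 1 is 0)."""
    return (page - 1) * ITEMS_PER_PAGE

def page_url(page):
    """Build the listing URL for a 1-based page number."""
    return BASE_URL + str(offset_for_page(page))

print(page_url(23))  # URL ending in the offset for page 23
print(page_url(24))  # URL ending in the offset for page 24
```

So the URL does carry the paging information, just as an item offset rather than a page number. It may be worth checking whether the start URL range accepts a step as well (I believe the Web Scraper documentation describes a form like [0-100:10]); if it does, a stepped range could replace the pagination selector here too.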

That fixed it, thank you!!

How would you suggest I save the URL of each image in the carousel? I can't seem to find much information on setting up a selector for that.

I'm happy I've helped!

You can extract all the image URLs using the Element Attribute selector.
First, select one of the images in the carousel, then keep pressing the 'P' key until the selection covers the parent wrapper, so the attribute is extracted from all the images.
The attribute for this selector would be src.
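
For anyone wondering what "extract the src attribute from every matching image" amounts to, here is a rough standalone Python sketch using only the standard library. It is not how Web Scraper works internally; the class name ImageSrcCollector and the sample HTML are invented for illustration, and only the CSS class name comes from the sitemap in this thread.

```python
from html.parser import HTMLParser

class ImageSrcCollector(HTMLParser):
    """Collects the src of every <img> that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        # Keep only carousel images that actually have a src value.
        if self.css_class in classes and attributes.get("src"):
            self.sources.append(attributes["src"])

# Made-up markup standing in for a product page carousel.
sample_html = """
<div class="carousel">
  <img class="prod-alt-image-carousel-image" src="https://example.com/a.jpg">
  <img class="prod-alt-image-carousel-image" src="https://example.com/b.jpg">
  <img class="logo" src="https://example.com/logo.png">
</div>
"""

collector = ImageSrcCollector("prod-alt-image-carousel-image")
collector.feed(sample_html)
print(collector.sources)
```

The selector with "multiple" enabled plays the role of the loop here: one src value is returned per matching image instead of only the first.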

Updated sitemap:

{"_id":"test-walmart2","startUrl":["https://www.walmart.com/browse/cookware-tools/cookware-sets/4044_623679_133020_599265?page=[1-25]&povid=623679+|+2018-04-30+|+CookwareSetsFC#searchProductResult"],"selectors":[{"id":"item","type":"SelectorLink","selector":"div.search-result-gridview-item a.product-title-link","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"Name","type":"SelectorText","selector":"div.hide-content-max-m h1.prod-ProductTitle div","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"Price","type":"SelectorText","selector":"span.hide-content span.price","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"pics_links","type":"SelectorElementAttribute","selector":"img.prod-alt-image-carousel-image","parentSelectors":["item"],"multiple":true,"extractAttribute":"src","delay":0}]}

And some of the results:

P.S. Only one of the 25 pages of cookware sets is still available. Maybe it's just me, but the rest seem to be gone or disabled on the website.