Scraping catalog data with pagination - Fails after 1st page

Issue:
I'm trying to scrape the site below. I can easily get the first page of content, but pagination is not working. I have tried using the link type and the element link type, and neither works. Could I get an extra set of eyes to provide some insight into what I might be missing? I appreciate any help y'all can offer.

URL:
https://www.jamberry.com/us/en/shop/shop/for/search?pageSize=24

Sitemap:
{"_id":"jamberry-nailwraps","startUrl":["https://www.jamberry.com/us/en/shop/shop/for/search?pageSize=24"],"selectors":[{"id":"item","type":"SelectorLink","selector":"div.tile > a","parentSelectors":["_root","pagination"],"multiple":true,"delay":0},{"id":"pagination","type":"SelectorLink","selector":"ul.results li:nth-of-type(n+4) a","parentSelectors":["_root","pagination"],"multiple":true,"delay":0},{"id":"item-title","type":"SelectorText","selector":"h1","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-price","type":"SelectorText","selector":"div.price span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-details","type":"SelectorText","selector":"div.lead span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-hashtag","type":"SelectorText","selector":"div.hashtag","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-tags","type":"SelectorGroup","selector":"a.label","parentSelectors":["item"],"delay":0,"extractAttribute":""},{"id":"item-image-primary","type":"SelectorImage","selector":"div.ms-slide.ms-sl-selected img","parentSelectors":["item"],"multiple":false,"delay":0},{"id":"item-image-secondary","type":"SelectorImage","selector":"div.ms-slide.ms-sl-selected img","parentSelectors":["item"],"multiple":false,"delay":0}]}

I made a couple more tweaks and managed to get it to give me the first two pages. Still can't get the third page or any of the other pages after that.

{"_id":"jamberry-nailwraps","startUrl":["https://www.jamberry.com/us/en/shop/shop/for/search?pageSize=24"],"selectors":[{"id":"item","type":"SelectorLink","selector":"div.tile > a","parentSelectors":["_root","pagination"],"multiple":true,"delay":0},{"id":"pagination","type":"SelectorElementClick","selector":"ul.results li:nth-of-type(n+3) a","parentSelectors":["_root","item"],"multiple":true,"delay":0,"clickElementSelector":"ul.results li:nth-of-type(6) a","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"item-title","type":"SelectorText","selector":"h1","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-price","type":"SelectorText","selector":"div.price span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-details","type":"SelectorText","selector":"div.lead span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-hashtag","type":"SelectorText","selector":"div.hashtag","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-tags","type":"SelectorGroup","selector":"a.label","parentSelectors":["item"],"delay":0,"extractAttribute":""},{"id":"item-image-primary","type":"SelectorImage","selector":"div.ms-slide.ms-sl-selected img","parentSelectors":["item"],"multiple":false,"delay":0},{"id":"item-image-secondary","type":"SelectorImage","selector":"div.ms-slide.ms-sl-selected img","parentSelectors":["item"],"multiple":false,"delay":0}]}

After another few attempts, I somehow managed to grab the first page and the last. What am I doing wrong??

{"_id":"jamberry-nailwraps","startUrl":["https://www.jamberry.com/us/en/shop/shop/for/search?pageSize=24"],"selectors":[{"id":"item","type":"SelectorLink","selector":"div.tile > a","parentSelectors":["_root","pagination"],"multiple":true,"delay":0},{"id":"pagination","type":"SelectorElementClick","selector":"ul.results li:nth-of-type(n+4) a","parentSelectors":["_root","item"],"multiple":true,"delay":0,"clickElementSelector":"ul.results li:nth-of-type(5) a","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"item-title","type":"SelectorText","selector":"h1","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-price","type":"SelectorText","selector":"div.price span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-details","type":"SelectorText","selector":"div.lead span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-hashtag","type":"SelectorText","selector":"div.hashtag","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-tags","type":"SelectorGroup","selector":"a.label","parentSelectors":["item"],"delay":0,"extractAttribute":""},{"id":"item-image-primary","type":"SelectorImage","selector":"div.ms-slide.ms-sl-selected img","parentSelectors":["item"],"multiple":false,"delay":0},{"id":"item-image-secondary","type":"SelectorImage","selector":"div.ms-slide.ms-sl-selected img","parentSelectors":["item"],"multiple":false,"delay":0}]}

Hi!

There are two options for this particular website: Element click, or pagination with brackets [1-...]

If you want to use Element click, which is trickier than using brackets, you should select a wrapper element that includes the pagination buttons as well as the items. Then you can select the Next button as the element to click; it gets clicked until it becomes disabled (because all items have been shown).

Here's an example of pagination using Element click:

{"_id":"jamberry-nailwraps","startUrl":["https://www.jamberry.com/us/en/shop/shop/for/search?pageSize=24"],"selectors":[{"id":"item","type":"SelectorLink","selector":"div.tile > a","parentSelectors":["pagination"],"multiple":true,"delay":0},{"id":"pagination","type":"SelectorElementClick","selector":"section.catalog-grid","parentSelectors":["_root"],"multiple":true,"delay":"2000","clickElementSelector":"ul.results li:nth-of-type(6) a","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"item-title","type":"SelectorText","selector":"h1","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-price","type":"SelectorText","selector":"div.price span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-details","type":"SelectorText","selector":"div.lead span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-hashtag","type":"SelectorText","selector":"div.hashtag","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-tags","type":"SelectorGroup","selector":"a.label","parentSelectors":["item"],"delay":0,"extractAttribute":""},{"id":"item-image-primary","type":"SelectorImage","selector":"div.ms-slide.ms-sl-selected img","parentSelectors":["item"],"multiple":false,"delay":0},{"id":"item-image-secondary","type":"SelectorImage","selector":"div.ms-slide.ms-sl-selected img","parentSelectors":["item"],"multiple":false,"delay":0}]}

Now let's look into pagination brackets.
Brackets are useless if the website's URL does not contain the page number. In all other cases they are much easier to use than a Next button clicker.
The website you provided DOES put the page number in the URL: once you click page 2, you can see it there.
The URL also carries the number of items to show per page (96 at most), so all that is left to work out is the number of pages to scrape (it's 7).

Here's how your sitemap URL would look if you use pagination brackets:

https://www.jamberry.com/us/en/shop/shop/for/search?p=[1-7]&pageSize=96
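
To make the bracket notation concrete, here is a minimal Python sketch of how a start URL like the one above expands into one URL per page. This only illustrates the idea and is not the extension's own code; the function name expand_range_url is made up for this example, and it handles just the plain [start-end] form.

```python
import re


def expand_range_url(url: str) -> list[str]:
    """Expand the first [start-end] range in a URL into one URL per page.

    Hypothetical helper for illustration; handles only the plain
    [start-end] form and returns the URL unchanged if no range is found.
    """
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{page}{suffix}" for page in range(start, end + 1)]


urls = expand_range_url(
    "https://www.jamberry.com/us/en/shop/shop/for/search?p=[1-7]&pageSize=96"
)
for page_url in urls:
    print(page_url)
```

Each of the seven resulting URLs is treated as its own start page, so the item selector only needs _root as its parent and no pagination selector is required.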

Hope I've helped.

So with this method you don't need to make the pagination selector a child of itself?

First of all, you cannot make a selector a parent/child of itself. A bug in the current version (0.3.7) lets you set a selector as its own parent, but that causes recursion and will not work. It will be fixed in the next version.

I'd still prefer pagination brackets, since they save time and are easier to use, as long as the website URL has the page number in it.

You can read more about picking the right pagination selector here:

iconoclast..... you are a rockstar! I had been trying to get it to work for way too long. I made it a child because that is what the pagination video at http://webscraper.io/tutorials said to do. Perhaps I interpreted it incorrectly.

Needless to say, I can see the logic in your example.

I did realize that I also messed up grabbing the image links within the item selector. I thought I needed to create two selectors in order to grab the images. I removed both and created a single selector that grabs both links. The code below is what I used, and it's now scraping like a charm. Much appreciated for all the help.

{"_id":"jamberry-leech","startUrl":["https://www.jamberry.com/us/en/shop/shop/for/search?pageSize=24"],"selectors":[{"id":"item","type":"SelectorLink","selector":"div.tile > a","parentSelectors":["pagination"],"multiple":true,"delay":0},{"id":"pagination","type":"SelectorElementClick","selector":"section.catalog-grid","parentSelectors":["_root"],"multiple":true,"delay":"2000","clickElementSelector":"ul.results li:nth-of-type(6) a","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"item-title","type":"SelectorText","selector":"h1","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-price","type":"SelectorText","selector":"div.price span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-details","type":"SelectorText","selector":"div.lead span","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-hashtag","type":"SelectorText","selector":"div.hashtag","parentSelectors":["item"],"multiple":false,"regex":"","delay":0},{"id":"item-tags","type":"SelectorGroup","selector":"a.label","parentSelectors":["item"],"delay":0,"extractAttribute":""},{"id":"item-images","type":"SelectorImage","selector":"div.ms-slide img","parentSelectors":["item"],"multiple":true,"delay":0}]}

Actually, you can make a link or element selector a parent selector for itself, but it needs to have another parent selector as well; otherwise you won't be able to access it anymore.
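
One way to check this on an exported sitemap is to walk the selector tree from _root and list any selector that can never be reached. The following is a rough Python sketch under the assumption that the sitemap JSON has the "selectors" / "parentSelectors" layout shown in this thread; the function names and the two trimmed sitemaps are made up for illustration.

```python
from collections import deque


def reachable_selectors(sitemap: dict) -> set[str]:
    """Return the ids of all selectors reachable from _root."""
    # Map each parent id to the ids of the selectors that list it as a parent.
    children: dict[str, list[str]] = {}
    for selector in sitemap["selectors"]:
        for parent in selector["parentSelectors"]:
            children.setdefault(parent, []).append(selector["id"])

    # Breadth-first walk; the seen set stops self-parent loops from repeating.
    seen: set[str] = set()
    queue = deque(["_root"])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def unreachable_selectors(sitemap: dict) -> list[str]:
    """Return the sorted ids of selectors that cannot be reached from _root."""
    all_ids = {selector["id"] for selector in sitemap["selectors"]}
    return sorted(all_ids - reachable_selectors(sitemap))


# Hypothetical, trimmed sitemaps: only the fields this check needs.
with_root_parent = {"selectors": [
    {"id": "pagination", "parentSelectors": ["_root", "pagination"]},
    {"id": "item", "parentSelectors": ["_root", "pagination"]},
]}
only_self_parent = {"selectors": [
    {"id": "pagination", "parentSelectors": ["pagination"]},
    {"id": "item", "parentSelectors": ["pagination"]},
]}

print(unreachable_selectors(with_root_parent))  # []
print(unreachable_selectors(only_self_parent))  # ['item', 'pagination']
```

In the second sitemap, pagination has no parent other than itself, so neither it nor its child item can be reached from the start page.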

I tried your code but received a "No data scraped yet" message. Could you please confirm that it works? Maybe the website's structure has changed? Thank you.