Scraping this site: pagination, and it does not pull all links on the page

Hi, I am scraping this site and there are 50 listings per page, but when I scrape it, only 20 appear.

Secondly, how do I handle pagination, as it has the ...> at the end?

Thanks

https://shopee.sg/search?keyword=Hair%20Scalp%20Massager&page=0&sortBy=relevancy

{"_id":"shopee","startUrl":["https://shopee.sg/search?keyword=Hair%20Scalp%20Massager&page=0&sortBy=relevancy"],"selectors":[{"id":"element","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.search-page","multiple":false,"delay":0},{"id":"scrolldown","type":"SelectorElementScroll","parentSelectors":["element"],"selector":"div.shopee-search-item-result","multiple":false,"delay":"1000"},{"id":"link","type":"SelectorLink","parentSelectors":["scrolldown"],"selector":"a","multiple":true,"delay":"1000"},{"id":"title","type":"SelectorText","parentSelectors":["link"],"selector":"div.qaNIZv","multiple":false,"regex":"","delay":0},{"id":"soldmonthly","type":"SelectorText","parentSelectors":["link"],"selector":"div._22sp0A","multiple":false,"regex":"","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["link"],"selector":"div._3n5NQx","multiple":false,"regex":"","delay":0},{"id":"shipping","type":"SelectorText","parentSelectors":["link"],"selector":"div._29vRDg div.shopee-drawer div.flex","multiple":false,"regex":"","delay":0}]}

Check this out. See if you can spot what I changed and why it works. Happy to answer your questions.

{"_id":"shopee","startUrl":["https://shopee.sg/search?keyword=Hair%20Scalp%20Massager&page=0&sortBy=relevancy"],"selectors":[{"id":"element","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.col-xs-2-4","multiple":true,"delay":0,"clickElementSelector":"button.shopee-icon-button.shopee-icon-button--right","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueCSSSelector"},{"id":"link","type":"SelectorLink","parentSelectors":["element"],"selector":"a","multiple":false,"delay":""},{"id":"title","type":"SelectorText","parentSelectors":["link"],"selector":"div.qaNIZv","multiple":false,"regex":"","delay":0},{"id":"soldmonthly","type":"SelectorText","parentSelectors":["link"],"selector":"div._22sp0A","multiple":false,"regex":"","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["link"],"selector":"div._3n5NQx","multiple":false,"regex":"","delay":0},{"id":"shipping","type":"SelectorText","parentSelectors":["link"],"selector":"div._3djNyJ","multiple":false,"regex":"","delay":0}]}

That is very interesting.

You use the Element Click selector to create all the elements while clicking the right-arrow button.

I never figured out how to use it before. Are there any good tutorials out there for this?

With Element Click used and Multiple checked, it creates a separate element for each listing until there are no more pages.

After that, a Link selector finds the link in each element to extract the text from.

Wow, what was the thought process behind solving this scrape?

By the way, not sure if this is the correct place to ask: if I have some web scraping tasks that need to be built, do you or anyone else create the web scraping code for a fee?

I have one problem though. There are a total of 23 pages with 50 listings per page.

That is 1150 listings in all, but when I ran it, it stopped at 270 listings. What could be causing this problem?

Solved that issue. Increased the delay to allow the scraper to click all the way to the end.

I don’t know about tutorials, but I learned by going through this forum and reverse engineering some of the solutions that were provided.

You use the Element Click selector when the URL doesn’t change during pagination. If the URL changes, a Link selector made a child of itself (recursive) works.
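
For the second case, a recursive Link selector looks roughly like the sketch below. Everything site-specific here is a made-up placeholder and not taken from Shopee: the `example.com` start URL and the `a.next-page`, `div.listing` and `div.title` selectors. The part that matters is that `pagination` lists itself in its own `parentSelectors`, and that the item selector is a child of both `_root` and `pagination`, so it runs on the first page and on every page reached through the next-page link.

```json
{
  "_id": "recursive_pagination_sketch",
  "startUrl": ["https://example.com/search?page=1"],
  "selectors": [
    {
      "id": "pagination",
      "type": "SelectorLink",
      "parentSelectors": ["_root", "pagination"],
      "selector": "a.next-page",
      "multiple": true,
      "delay": 0
    },
    {
      "id": "item",
      "type": "SelectorElement",
      "parentSelectors": ["_root", "pagination"],
      "selector": "div.listing",
      "multiple": true,
      "delay": 0
    },
    {
      "id": "title",
      "type": "SelectorText",
      "parentSelectors": ["item"],
      "selector": "div.title",
      "multiple": false,
      "regex": "",
      "delay": 0
    }
  ]
}
```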

In terms of building you a sitemap, post what you need and someone will do it for free. If it’s an intense build, there are many sites that will do it for a cost, including this one. Maybe email them directly. You can also check out the parsehubs and mozendas of the world.

For some reason, even after increasing the delay, I am still only able to extract 350 to 370 listings from 24 pages of 50 listings per page. Does anyone have any idea why?

This is my sitemap.

{"_id":"shopeetest","startUrl":["https://shopee.sg/search?keyword=Hair%20Scalp%20Massager&page=0&sortBy=relevancy"],"selectors":[{"id":"element","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.col-xs-2-4","multiple":true,"delay":"1000","clickElementSelector":"button.shopee-button-outline.shopee-mini-page-controller__next-btn","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueCSSSelector"},{"id":"link","type":"SelectorLink","parentSelectors":["element"],"selector":"a","multiple":false,"delay":"200"},{"id":"title","type":"SelectorText","parentSelectors":["link"],"selector":"div.qaNIZv","multiple":false,"regex":"","delay":0},{"id":"soldmonthly","type":"SelectorText","parentSelectors":["link"],"selector":"div._22sp0A","multiple":false,"regex":"","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["link"],"selector":"div._3n5NQx","multiple":false,"regex":"","delay":0},{"id":"shipping","type":"SelectorText","parentSelectors":["link"],"selector":"div._39oJVk div.flex","multiple":false,"regex":"","delay":0},{"id":"ratings","type":"SelectorText","parentSelectors":["link"],"selector":"div.flex-auto div.flex div.flex:nth-of-type(2) div._3Oj5_n","multiple":false,"regex":"","delay":0}]}

I'm hitting a wall here too. 347 is the max I can get, and I tried a few different ways.

That is so weird, and I am not sure why. I am also wondering what happened. Can anyone else shed some light on this?

Hi all,
I found out the reason why only 1/3 or so of the actual data is being pulled.

On each page there are 10 rows with 5 items on each row.

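For some reason, only the first 3 rows of data are extracted from each page. That is 15 items per page, or roughly 345 to 360 over 23 to 24 pages, which is close to the counts reported above.

I think it is because Shopee only loads the rest of the data once you scroll to the bottom of the page.

Can anyone help with this?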

I have a solution to this; check out my sitemap below.

However, I ran into a speed bump again.

I used Element Scroll to scroll to the bottom of the page, then an Element Click selector to click the next page button.

However, once the next page button is clicked, how can I get it to scroll to the bottom of the new page before clicking the next page button again?

{"_id":"shopeetest","startUrl":["https://shopee.sg/search?keyword=hair%20massager%20professional"],"selectors":[{"id":"element","type":"SelectorElementClick","parentSelectors":["scrolldown"],"selector":"div.col-xs-2-4","multiple":true,"delay":"3000","clickElementSelector":"button.shopee-icon-button.shopee-icon-button--right","clickType":"clickMore","discardInitialElements":false,"clickElementUniquenessType":"uniqueCSSSelector"},{"id":"link","type":"SelectorLink","parentSelectors":["element"],"selector":"a","multiple":false,"delay":"1000"},{"id":"title","type":"SelectorText","parentSelectors":["link"],"selector":"div.qaNIZv","multiple":false,"regex":"","delay":0},{"id":"soldmonthly","type":"SelectorText","parentSelectors":["link"],"selector":"div._22sp0A","multiple":false,"regex":"","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["link"],"selector":"div._3n5NQx","multiple":false,"regex":"","delay":0},{"id":"shipping","type":"SelectorText","parentSelectors":["link"],"selector":"div._39oJVk div.flex","multiple":false,"regex":"","delay":0},{"id":"ratings","type":"SelectorText","parentSelectors":["link"],"selector":"div.flex-auto div.flex div.flex:nth-of-type(2) div._3Oj5_n","multiple":false,"regex":"","delay":0},{"id":"scrolldown","type":"SelectorElementScroll","parentSelectors":["_root"],"selector":"div.search-page","multiple":false,"delay":"1000"}]}
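
One possible way around the scroll-then-click ordering, untested against Shopee, is to drop the click pagination altogether and let a range start URL load every results page separately, with an Element Scroll selector picking up the listings on each one. This is only a sketch under several assumptions: that the `page=` parameter from the first post still loads each results page directly, that the 23 pages are numbered 0 to 22, and that a 2000 ms delay is enough for the lazy-loaded rows to appear (that value is a guess). The remaining text selectors from the sitemaps above would hang off `link` unchanged.

```json
{
  "_id": "shopee_range_sketch",
  "startUrl": ["https://shopee.sg/search?keyword=Hair%20Scalp%20Massager&page=[0-22]&sortBy=relevancy"],
  "selectors": [
    {
      "id": "element",
      "type": "SelectorElementScroll",
      "parentSelectors": ["_root"],
      "selector": "div.col-xs-2-4",
      "multiple": true,
      "delay": "2000"
    },
    {
      "id": "link",
      "type": "SelectorLink",
      "parentSelectors": ["element"],
      "selector": "a",
      "multiple": false,
      "delay": 0
    },
    {
      "id": "title",
      "type": "SelectorText",
      "parentSelectors": ["link"],
      "selector": "div.qaNIZv",
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "price",
      "type": "SelectorText",
      "parentSelectors": ["link"],
      "selector": "div._3n5NQx",
      "multiple": false,
      "regex": "",
      "delay": 0
    }
  ]
}
```

Because each page in the range is opened as its own start URL, the scroll happens fresh on every page and there is no next page button to coordinate with.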