Scraping Amazon product images

Continuing the discussion from Scrape all amazon products images?:

I can scrape most of the data I want from an Amazon product category, but I can only get the link to the first (main) image in good quality. I'm trying to get the links to the additional images from the thumbnails, but although I can see the scraper displaying them, the links don't appear when I browse the results, and they aren't saved in the CSV either.

Sitemap:

{"id":"portatiles","startUrl":["https://www.amazon.es/informatica/b/?ie=UTF8&node=667049031&ref=topnav_storetab_mega_sv_pc"],"selectors":[{"id":"categoria","type":"SelectorLink","parentSelectors":["root"],"selector":"a[title='Portátiles']","multiple":false,"delay":0},{"id":"producto","type":"SelectorElement","parentSelectors":["categoria","paginado"],"selector":".celwidget div.s-item-container","multiple":true,"delay":0},{"id":"paginado","type":"SelectorLink","parentSelectors":["categoria","paginado"],"selector":"a.pagnNext","multiple":true,"delay":0},{"id":"prodsingle","type":"SelectorLink","parentSelectors":["producto"],"selector":"a.s-access-detail-page","multiple":false,"delay":0},{"id":"precio","type":"SelectorText","parentSelectors":["producto"],"selector":"span.a-size-base","multiple":false,"regex":"","delay":0},{"id":"nombre","type":"SelectorText","parentSelectors":["producto"],"selector":"h2","multiple":false,"regex":"","delay":0},{"id":"imagen-principal","type":"SelectorImage","parentSelectors":["prodsingle"],"selector":"img.a-stretch-horizontal","multiple":false,"delay":0},{"id":"showall","type":"SelectorElementClick","parentSelectors":["prodsingle"],"selector":"li.image div.imgTagWrapper","multiple":true,"delay":"500","clickElementSelector":"li.a-spacing-small","clickType":"clickMore","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueCSSSelector"}]}

I wonder what the problem is here. I tried to adapt the code that seemed to work in that previous forum thread.

This will click all the images on a product page and get each img URL. Modify as needed. I used Page load delay: 5500.

{"_id":"amazon-es-get-images","startUrl":["https://www.amazon.es/Elite-8300-Ordenador-Reacondicionado-Certificado/dp/B0792TQ4XS","https://www.amazon.es/Logitech-MK270-teclado-inal%C3%A1mbrico-Windows/dp/B00CHHDY66"],"selectors":[{"id":"product title","type":"SelectorText","parentSelectors":["_root"],"selector":"span#productTitle","multiple":false,"regex":"","delay":0},{"id":"Click all images","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div#main-image-container ul > li[class*='selected']","multiple":true,"delay":"800","clickElementSelector":"ul> li[class*='imageThumbnail'][data-ux-click]","clickType":"clickOnce","discardInitialElements":"discard","clickElementUniquenessType":"uniqueHTMLText"},{"id":"Image Url","type":"SelectorElementAttribute","parentSelectors":["Click all images"],"selector":"span > div > img","multiple":false,"extractAttribute":"src","delay":0}]}


That works! The only problem is that with my sitemap I had every parameter of a product in one row with several columns, and now each image generates a new row for the same product :confused: I wonder if the CSV could be generated with all the additional images in the same product row as new columns rather than as new rows.
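If the sitemap stays as it is, the rows can also be reshaped after export. Here is a minimal sketch of that idea, assuming the exported rows carry the "product title" and "Image Url" columns from the sitemap above; the function name `images_to_columns` and the output column names "Image 1", "Image 2", ... are made up for illustration.

```python
def images_to_columns(rows, key="product title", image_field="Image Url"):
    """Collapse one-row-per-image records into one row per product.

    rows: list of dicts, e.g. what csv.DictReader yields for the export.
    key and image_field are assumed column names taken from the sitemap.
    """
    grouped = {}  # product key -> {"base": other columns, "images": [urls]}
    for row in rows:
        product = grouped.setdefault(
            row[key],
            {"base": {k: v for k, v in row.items() if k != image_field},
             "images": []},
        )
        url = row.get(image_field)
        if url and url not in product["images"]:
            product["images"].append(url)

    # Every output row gets the same number of image columns.
    widest = max((len(p["images"]) for p in grouped.values()), default=0)

    result = []
    for product in grouped.values():
        out = dict(product["base"])
        for i in range(widest):
            images = product["images"]
            out[f"Image {i + 1}"] = images[i] if i < len(images) else ""
        result.append(out)
    return result
```

Feed it the rows from `csv.DictReader` and write the result back with `csv.DictWriter`. Products with fewer images than the widest one get empty cells, and the non-image columns are taken from the first row seen for each product.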

Solved :slight_smile:

{"id":"portatiles-definitivo-pipi","startUrl":["https://www.amazon.es/informatica/b/?ie=UTF8&node=667049031&ref=topnav_storetab_mega_sv_pc"],"selectors":[{"id":"categoria","type":"SelectorLink","parentSelectors":["_root"],"selector":"a[title='Portátiles']","multiple":false,"delay":0},{"id":"producto","type":"SelectorElement","parentSelectors":["categoria","paginado"],"selector":".celwidget div.s-item-container","multiple":true,"delay":0},{"id":"paginado","type":"SelectorLink","parentSelectors":["categoria","paginado"],"selector":"a.pagnNext","multiple":true,"delay":0},{"id":"prodsingle","type":"SelectorLink","parentSelectors":["producto"],"selector":"a.s-access-detail-page","multiple":false,"delay":0},{"id":"precio","type":"SelectorText","parentSelectors":["producto"],"selector":"span.a-size-base","multiple":false,"regex":"","delay":0},{"id":"nombre","type":"SelectorText","parentSelectors":["producto"],"selector":"h2","multiple":false,"regex":"","delay":0},{"id":"Click all images","type":"SelectorElementClick","parentSelectors":["prodsingle"],"selector":"div#main-image-container ul > li[class*='selected']","multiple":true,"delay":"800","clickElementSelector":"ul> li[class*='imageThumbnail'][data-ux-click]","clickType":"clickOnce","discardInitialElements":"discard","clickElementUniquenessType":"uniqueHTMLText"},{"id":"Image 1","type":"SelectorElementAttribute","parentSelectors":["prodsingle"],"selector":"div#main-image-container ul > li.itemNo0 span > div > img","multiple":false,"extractAttribute":"src","delay":0},{"id":"Image 2","type":"SelectorElementAttribute","parentSelectors":["prodsingle"],"selector":"div#main-image-container ul > li.itemNo1 span > div > img","multiple":false,"extractAttribute":"src","delay":0},{"id":"Image 3","type":"SelectorElementAttribute","parentSelectors":["prodsingle"],"selector":"div#main-image-container ul > li.itemNo2 span > div > img","multiple":false,"extractAttribute":"src","delay":0},{"id":"Image 4","type":"SelectorElementAttribute","parentSelectors":["prodsingle"],"selector":"div#main-image-container ul > li.itemNo3 span > div > img","multiple":false,"extractAttribute":"src","delay":0},{"id":"Image 5","type":"SelectorElementAttribute","parentSelectors":["prodsingle"],"selector":"div#main-image-container ul > li.itemNo4 span > div > img","multiple":false,"extractAttribute":"src","delay":0}]}

Yeah, that's probably how I would do it :+1:


Hello, could you please help me, or tell me which option is correct in each box, so that I can extract all the images of this product? I have not been able to extract all of them; when I try, they download as thumbnails, which are of no use to me. Thank you very much for your help.

https://www.amazon.com/-/es/Computadora-portátil-pantalla-retroiluminado-A515-43-R19L/dp/B07RF1XD36/ref=sr_1_1?dchild=1&qid=1614653708&refinements=p_85%3A2470955011%2Cp_n_condition-type%3A2224371011&rnid=2224369011&rps=1&s=pc&sr=1-1

Hi @tiendarelojventas
The idea, in this case, is to extract all of the small image links using a grouped selector and later use a parser to edit the links. The selector defines the image itself, and the attribute "src" extracts the links. After you press Element preview, switch to the Elements tab, press Ctrl+F, and enter "ws-data"; you should find the line of code you have selected.
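For the "use a parser to edit the links" step, here is a rough sketch of one way it could look. It assumes the thumbnail URLs carry a size modifier of the form `._XXX_` just before the file extension, and that dropping that segment yields the full-size image. That pattern is an assumption about how these URLs usually look, not something guaranteed, and the function name `full_size_url` and the example URL are made up.

```python
import re

# Assumption: the size modifier is a "._..._" segment sitting directly
# before the file extension, e.g. "41abcDEF._AC_US40_.jpg".
_MODIFIER = re.compile(r"\._[^/.]+_(?=\.[A-Za-z0-9]+$)")


def full_size_url(thumbnail_url):
    """Return the URL with the assumed size modifier removed.

    URLs that do not match the assumed pattern are returned unchanged.
    """
    return _MODIFIER.sub("", thumbnail_url)


if __name__ == "__main__":
    # Hypothetical thumbnail URL, for illustration only.
    print(full_size_url(
        "https://m.media-amazon.com/images/I/41abcDEF._AC_US40_.jpg"
    ))
```

Run it over the "src" column of the export and check a few results in the browser before relying on it.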

Hello, I don't know what I am doing wrong: the images are not extracted. Could you help me, please? I am trying to download the information for several Amazon products (name, link, ASIN, description, all images).

https://www.amazon.com/s?k=apple+watch&i=electronics&rh=n%3A10048700011%2Cp_85%3A2470955011%2Cp_n_condition-type%3A2224371011%2Cp_89%3AApple&dc&language=es&__mk_es_US=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=25NDSPIP1D7Y9&qid=1614739167&rnid=2528832011&sprefix=apple+watch%2Caps%2C251&ref=sr_nr_p_89_1

The only thing I have not been able to extract is the full set of images; I would appreciate your help.

The hires URLs are actually stored elsewhere in the source, and you can get at the whole block with something like:

Type: HTML
Selector: script[type*='javascript']:contains('colorImages')

You'd need to post-process that javascript tho.
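A rough sketch of that post-processing, assuming the scraped script text lists each image as a JSON-like object with a "hiRes" key holding the URL. The key name and the layout are assumptions about the block, and `hires_urls` is a made-up helper name.

```python
import re

# Assumption: each image entry contains "hiRes":"<url>"; entries whose
# value is null (no hi-res version) are simply skipped by the pattern.
_HIRES = re.compile(r'"hiRes"\s*:\s*"(https?://[^"]+)"')


def hires_urls(script_text):
    """Return the distinct hi-res URLs found in the script text, in order."""
    urls = []
    for url in _HIRES.findall(script_text):
        if url not in urls:
            urls.append(url)
    return urls
```

Pass it the HTML selector's output for each product row. A regex is used here instead of a JSON parser because the block is JavaScript rather than strict JSON, so it may not parse cleanly as-is.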

Sir, please add Title, About, Price, All Image Links, Brand, etc. for the .com site and send me the sitemap, please.