Scrape all full-sized Magento images in the fotorama gallery

Has anyone managed to scrape all of the full-sized Magento images for each product? I have run into two problems:

  1. Only 3 full-size images are loaded at a time: the one in view and the two hidden ones flanking it on either side. So if you use the grouped selector, you can only scrape the 3 images that are initially loaded.
  2. I have tried, with mixed results, to use the Element click selector to cycle through the image gallery and scrape the image on the fotorama stage. The problem is that there is no way to collect the images as the carousel turns. Ideally I would like a grouped-attributes function that works in conjunction with Element click and collects the attribute data from all the clicks into a single JSON array.

{"_id":"agequipment2","startUrl":["https://www.agequipment.com.au/"],"selectors":[{"delay":0,"id":"top_category_links","multiple":true,"parentSelectors":["_root"],"selector":"#maincontent > div > div > div > div > div.column.main > div.home-page-1 > div.kg-shop-thousand > div > div > div.kg-shop-block > div > a:nth-child(1)","type":"SelectorLink"},{"delay":0,"id":"product_links","multiple":true,"parentSelectors":["top_category_links","next_page"],"selector":"#amasty-shopby-product-list > div.products.wrapper.grid.products-grid > ol > li > div > div.product.details.product-item-details > strong > a","type":"SelectorLink"},{"delay":0,"id":"title","multiple":false,"parentSelectors":["product_links"],"regex":"","selector":"#maincontent > div > div > div > div > div.column.main > div.product-info-main > div.page-title-wrapper.product > h1 > span","type":"SelectorText"},{"delay":0,"id":"mpn","multiple":false,"parentSelectors":["product_links"],"regex":"","selector":"#maincontent > div > div > div > div > div.column.main > div.product-info-main > div.product-info-price > div.product-info-stock-sku > div.product.attribute.sku > div","type":"SelectorText"},{"delay":0,"id":"rrp_price_inc","multiple":false,"parentSelectors":["product_links"],"regex":"(?<=\\$).*","selector":"div.price-box span[data-price-type=\"finalPrice\"] span","type":"SelectorText"},{"delay":0,"id":"stock","multiple":false,"parentSelectors":["product_links"],"regex":"","selector":"p.availability > span","type":"SelectorText"},{"delay":0,"id":"desc","multiple":false,"parentSelectors":["product_links"],"regex":"","selector":".description","type":"SelectorHTML"},{"delay":0,"extractAttribute":"href","id":"manual","multiple":true,"parentSelectors":["product_links"],"selector":"#amfile_attachment > div > div > a","type":"SelectorElementAttribute"},{"delay":0,"id":"breadcrumb_categories","multiple":false,"parentSelectors":["product_links"],"regex":"(?<=<\\/li>\\s)[\\s\\S]*$","selector":"body > div.page-wrapper > div.breadcrumbs > div > ul","type":"SelectorHTML"},{"delay":0,"id":"stock_commentary","multiple":false,"parentSelectors":["product_links"],"regex":"","selector":"#maincontent > div > div > div > div > div.column.main > div.product-info-main > div.product-info-price > div.kg_product-info-stock-status","type":"SelectorText"},{"delay":0,"id":"next_page","multiple":false,"parentSelectors":["top_category_links"],"selector":"#amasty-shopby-product-list > div:nth-child(3) > div.pages > ul > li.item.pages-item-next > a","type":"SelectorLink"},{"delay":0,"id":"rrp_price_ex","multiple":false,"parentSelectors":["product_links"],"regex":"","selector":"div.price-box span[data-price-type=\"basePrice\"] span","type":"SelectorText"},{"clickElementSelector":".fotorama__thumb","clickElementUniquenessType":"uniqueHTMLText","clickType":"clickOnce","delay":1000,"discardInitialElements":"do-not-discard","id":"img_carousel","multiple":true,"parentSelectors":["product_links"],"selector":".fotorama__active.fotorama__loaded--img","type":"SelectorElementClick"},{"delay":0,"extractAttribute":"src","id":"full_img","multiple":true,"parentSelectors":["img_carousel"],"selector":"img","type":"SelectorElementAttribute"},{"delay":0,"extractAttribute":"src","id":"grouped_src_3_loaded_only","parentSelectors":["product_links"],"selector":".fotorama__stage__frame .fotorama__img","type":"SelectorGroup"},{"delay":0,"id":"ETA_label","multiple":false,"parentSelectors":["product_links"],"regex":"","selector":".amasty-label-text","type":"SelectorText"},{"id":"pagination_gallery","paginationType":"auto","parentSelectors":["product_links","pagination_gallery"],"selector":".fotorama__nav__frame--thumb","type":"SelectorPagination"},{"delay":0,"extractAttribute":"src","id":"div_href_img","multiple":false,"parentSelectors":["pagination_gallery"],"selector":"#magnifier-item-0","type":"SelectorElementAttribute"}]}

I have included both approaches in the sitemap above.
If you have any ideas on how I can scrape all of the full-sized images, please let me know.
Note that the thumbnail images are not what I want. I want the full-sized images, which can be up to 1600px. There is also a medium-sized 600px image, which I could settle for, but I would prefer to scrape the src of the full-size 1600px image if possible.
If you find the above too complicated, just start from scratch: see if you can figure out a way to do it, and let me know how you did it.

Try this:

{"_id":"agequipment-demo","startUrl":["https://www.agequipment.com.au/5-tray-blast-chiller/"],"selectors":[{"id":"Title","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"h1 span.base","type":"SelectorText"},{"clickElementSelector":"div.fotorama__stage__frame","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","delay":3500,"discardInitialElements":"do-not-discard","id":"Click to load carousel separately","multiple":false,"parentSelectors":["_root"],"selector":"div.fotorama__stage__frame","type":"SelectorElementClick"},{"clickElementSelector":"div.fotorama__nav__shaft > div.fotorama__nav__frame--thumb","clickElementUniquenessType":"uniqueHTML","clickType":"clickOnce","delay":2100,"discardInitialElements":"discard-when-click-element-exists","id":"Click carousel thumbs","multiple":true,"parentSelectors":["_root"],"selector":"div.fotorama__stage__shaft > div.fotorama__loaded:first-of-type","type":"SelectorElementClick"},{"id":"Image","multiple":false,"parentSelectors":["Click carousel thumbs"],"selector":"img.fotorama__img--full","type":"SelectorImage"}]}


We're halfway there...
It does work insofar as it collects the images, albeit on separate rows rather than as a grouped data array. Having all the image links for a product grouped into one data cell is what I was ultimately after.
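
One way to get the rest of the way is to do the grouping after export rather than inside the scraper. Below is a minimal sketch in Python, assuming the scraped data is exported as CSV with one row per image, a `Title` column, and an `Image-src` column. Those column names are an assumption based on the selector ids in the demo sitemap, and the sample rows and the `group_images` helper are made up for illustration, so adjust them to match the real export.

```python
import csv
import io
import json


def group_images(rows, key_col="Title", image_col="Image-src"):
    """Collapse one-row-per-image records into one list of image URLs per product.

    key_col and image_col are assumed column names; change them to match the export.
    """
    grouped = {}
    for row in rows:
        product = (row.get(key_col) or "").strip()
        url = (row.get(image_col) or "").strip()
        images = grouped.setdefault(product, [])
        # Skip empty cells and URLs already collected for this product.
        if url and url not in images:
            images.append(url)
    return grouped


# Made-up sample rows standing in for a real export file.
SAMPLE_CSV = """Title,Image-src
Blast Chiller,https://example.com/media/a_1.jpg
Blast Chiller,https://example.com/media/a_2.jpg
Blast Chiller,https://example.com/media/a_1.jpg
Bench Fridge,https://example.com/media/b_1.jpg
"""

if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
    for product, images in group_images(rows).items():
        # Each product ends up with a single JSON array of its image URLs.
        print(product, json.dumps(images))
```

To run it against a real export, replace `io.StringIO(SAMPLE_CSV)` with an opened CSV file. Grouping on the title only works if titles are unique per product; a SKU or product URL column would be a safer key if the export has one.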