Two questions:
- Is it possible to scrape the images directly while scraping the text from the page? Looking at the response headers for one of the images on the page, I see "Cache-Control: public,max-age=31536000,immutable". Since the browser already has these images, it seems wasteful to run a separate script outside the browser to download them all, because that would send new requests to the server being scraped.
- Can the filename of each downloaded image be based on a unique field from the data row it belongs to? For the eBay example, it makes sense to name the image 1234567890.webp if the item's eBay ID is 1234567890. Ideally, I'll write a custom script for my use case that processes the downloaded images based on the text data from the same row, so having a unique reference field is critical.
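A possible post-processing workaround for the second question, sketched under assumptions: the exported CSV has one column with the item ID (or the item URL) and one with the image URL. The column names `item_id` and `image_url`, the `/itm/<id>` URL pattern, and the helper names are all hypothetical. Note that the download step still sends new requests to the server, which is exactly what the first question hopes to avoid.

```python
import csv
import io
import os
import re
import urllib.request
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical column names: change these to match your sitemap's selector ids.
ID_COLUMN = "item_id"
IMAGE_COLUMN = "image_url"


def ebay_item_id(item_url: str) -> Optional[str]:
    """Extract the numeric id from an item URL, assuming the form /itm/<id>
    (optionally /itm/<title-slug>/<id>). Returns None if nothing matches."""
    match = re.search(r"/itm/(?:[^/]+/)?(\d+)", urlparse(item_url).path)
    return match.group(1) if match else None


def image_filename(item_id: str, image_url: str, default_ext: str = ".webp") -> str:
    """Name the file after the item id, keeping the image URL's extension if it has one."""
    ext = os.path.splitext(urlparse(image_url).path)[1].lower()
    return f"{item_id}{ext or default_ext}"


def plan_downloads(csv_text: str) -> Dict[str, str]:
    """Map target filename -> image URL, skipping rows missing either field."""
    plan: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        item_id = (row.get(ID_COLUMN) or "").strip()
        url = (row.get(IMAGE_COLUMN) or "").strip()
        if item_id and url:
            plan[image_filename(item_id, url)] = url
    return plan


def download_all(plan: Dict[str, str], dest_dir: str) -> None:
    """Fetch each image into dest_dir. This issues one new request per image."""
    os.makedirs(dest_dir, exist_ok=True)
    for filename, url in plan.items():
        urllib.request.urlretrieve(url, os.path.join(dest_dir, filename))
```

Keeping the filename logic separate from the download step means the same `{item_id}` key can later be used to join each image file back to its text row.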
I searched but didn't find any posts covering this. If this belongs in Feature Requests, that's fine too, but I'd like to get this working soon, so if there's a creative workaround, I can write the code or do whatever else is needed to make it work.
Url: http://ebay.com
Sitemap:
{id:"sitemap code"}