Sitemap xml selector

Is it possible to add .xml.gz files somehow? Thanks!

Hi,

Do you have an example link?

Sure:
https://www.g2.com/sitemaps/sitemap_index.xml.gz
https://www.producthunt.com/sitemaps_v3/product_reviews_sitemap.xml.gz

The scraper will extract the links from an .xml.gz file as long as the file does not exceed the 25 MB per-file limit.

It worked with https://www.producthunt.com/sitemaps_v3/product_reviews_sitemap.xml.gz. However, https://www.g2.com/sitemaps/sitemap_index.xml.gz contains sub-files that exceed the limit.

Great, thanks!
Is there a workaround for the bigger files?

You could try adding a Found Url Regex to narrow down the results, for instance /de/. This would return only links containing /de/.
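
If you want to preview what a pattern like /de/ would keep before setting it in the scraper, here is a rough Python sketch. It is not how the scraper itself works internally; the function name `filter_sitemap_urls` and the example.com URLs are made up for illustration, and it assumes the regex is matched anywhere in the URL rather than against the whole string.

```python
import gzip
import re
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def filter_sitemap_urls(gz_bytes: bytes, pattern: str) -> list[str]:
    """Decompress a gzipped sitemap and return the <loc> URLs matching pattern.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(gzip.decompress(gz_bytes))
    regex = re.compile(pattern)
    urls = []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        url = (loc.text or "").strip()
        # Assumption: a URL is kept if the pattern occurs anywhere in it.
        if url and regex.search(url):
            urls.append(url)
    return urls


# Tiny in-memory sitemap standing in for a downloaded .xml.gz file.
sample_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/de/page-1</loc></url>
  <url><loc>https://example.com/en/page-1</loc></url>
  <url><loc>https://example.com/de/page-2</loc></url>
</urlset>"""
sample_gz = gzip.compress(sample_xml)

print(filter_sitemap_urls(sample_gz, r"/de/"))
```

With the sample data, only the two /de/ links are printed. For a real file you would pass in the downloaded bytes of the .xml.gz instead of `sample_gz`.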