When connecting to CouchDB, webscraper issues a DELETE request to purge the database. I am running the same scrape multiple times as the website data changes, and would like an option to keep the previous scrape results around in CouchDB.
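For reference, this is roughly the effect I'm describing against CouchDB's HTTP API (the URL and database name below are just placeholders, not taken from the extension's code):

```ts
// Illustration only: deleting a CouchDB database is a single HTTP DELETE on
// the database URL, which discards every document from the previous scrape.
const couchUrl = "http://localhost:5984"; // assumed local CouchDB instance
const dbName = "my-sitemap";              // assumed db named after the sitemap

fetch(`${couchUrl}/${dbName}`, { method: "DELETE" })
  .then((res) => console.log(`DELETE ${dbName}: ${res.status}`));
```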
I find the current behaviour fairly brittle, since clicking "scrape" effectively nukes the previous data.
I would suggest either:
- Create a new DB on each scrape, using a unique id (timestamp?) in combination with the sitemap name as the database name (rough sketch after this list).
- Add an option to the settings page to disable the DELETE request when starting a scrape.
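For the first option, something along these lines could work (a minimal sketch; the function name and naming scheme are my own, not from the codebase, and it assumes the sitemap id starts with a letter as CouchDB requires for database names):

```ts
// Hypothetical sketch: build a per-scrape database name from the sitemap id
// plus a timestamp, so each run writes to its own database and previous
// scrape results are never overwritten.
function scrapeDbName(sitemapId: string): string {
  // CouchDB db names must be lowercase and may only contain letters, digits
  // and _ $ ( ) + - / ; normalize anything else to "_".
  const base = sitemapId.toLowerCase().replace(/[^a-z0-9_$()+\/-]/g, "_");
  // ISO timestamp with ":" and "." replaced so it stays a valid db name.
  const stamp = new Date().toISOString().toLowerCase().replace(/[:.]/g, "-");
  return `${base}-${stamp}`;
}

// e.g. scrapeDbName("My Sitemap") -> "my_sitemap-2024-05-01t09-30-12-000z"
```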
Happy to hear your thoughts!
Thanks!