When connecting to CouchDB, webscraper issues a DELETE request to purge the database. I am running the same scrape multiple times as the website data changes, and would like an option to keep the previous scrape results around in CouchDB.
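For reference, this is roughly the effect I'm describing against CouchDB's HTTP API (the URL and database name below are just placeholders, not taken from the extension's code):

```ts
// Illustration only: deleting a CouchDB database is a single HTTP DELETE on
// the database URL, which discards every document from the previous scrape.
const couchUrl = "http://localhost:5984"; // assumed local CouchDB instance
const dbName = "my-sitemap";              // assumed db named after the sitemap

fetch(`${couchUrl}/${dbName}`, { method: "DELETE" })
  .then((res) => console.log(`DELETE ${dbName}: ${res.status}`));
```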
I find the current behaviour fairly brittle, since clicking "scrape" effectively nukes the previous data.
I would suggest either:
- Create a new DB on each scrape, using a unique id (timestamp?) in combination with the sitemap name as the database name (rough sketch after this list).
- Add an option to the settings page to disable the DELETE request when starting a scrape.
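For the first option, something along these lines could work (a minimal sketch; the function name and naming scheme are my own, not from the codebase, and it assumes the sitemap id starts with a letter as CouchDB requires for database names):

```ts
// Hypothetical sketch: build a per-scrape database name from the sitemap id
// plus a timestamp, so each run writes to its own database and previous
// scrape results are never overwritten.
function scrapeDbName(sitemapId: string): string {
  // CouchDB db names must be lowercase and may only contain letters, digits
  // and _ $ ( ) + - / ; normalize anything else to "_".
  const base = sitemapId.toLowerCase().replace(/[^a-z0-9_$()+\/-]/g, "_");
  // ISO timestamp with ":" and "." replaced so it stays a valid db name.
  const stamp = new Date().toISOString().toLowerCase().replace(/[:.]/g, "-");
  return `${base}-${stamp}`;
}

// e.g. scrapeDbName("My Sitemap") -> "my_sitemap-2024-05-01t09-30-12-000z"
```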
Happy to hear your thoughts!
Thanks!