Does exporting data before a scrape has finished interrupt the scraper?

Hi there,

I'm scraping a large amount of information. My scraper has been running 24 hours a day, for nearly two weeks. I have exported the data twice during this time to check how the scrape is going. Both times, this didn't seem to be a problem - the scraper continued to work; the second CSV was bigger than the first (as expected).

Today I exported the data for a third time. This time, the number of entries in the CSV was the same as when I exported it a week ago (19,597 rows), which implies the scraper hasn't collected any data since the last time I exported the CSV. However, the scraper still appears to be working in the pop-up window.

Is this a problem? Should my 3rd CSV have more entries than my 2nd CSV, if the scraper is working? I am expecting a final CSV of ~76,000 rows. Is this too much information to scrape?

Thank you in advance!

Hey, I ran into this as well. It isn't too much info - I've done 500,000 rows. If you export as CSV, the scrape will still run but won't pull data while the export happens, if that makes sense - just be patient. If you're worried about the speed of exporting to CSV because of the large volume, copy your sitemap, split the scrape into fourths, and run the parts in separate tabs. Since the scraper holds its data separately per Chrome instance (i.e. per tab), you can run them at the same time and they won't interfere.

Phew. Glad to hear it's not too much info. I was scared I'd have to start the whole thing again! And re: running the scraper in separate tabs - that's a great tip. Thank you very much.

Hi again,

I tried to split up my sitemap and run the scrape in separate tabs (while still running the original enormous scrape) but I can't seem to launch the new scrapes. When I hit "Start scraping", I immediately get a "Scraping finished!" message telling me the new scrape is done. (The original scrape is still going though.)

Any suggestions? I get the same result both in a separate tab and a separate window.

What I really want to do is speed things up. Do you know if I can change the page delay on the original scrape without interrupting the scraper? Or are there other ways to speed up a scrape that is already in progress?

Thanks in advance

Hey,
Your issue is probably that you aren't editing the metadata. For example, if your start URL has a range like [1-100000], then to run 4 tabs you need 4 separate sitemaps with different ranges - [1-25000], [25001-50000], etc. - one for each tab.

Hey, thanks for the reply! I should have said - I did set different ranges, just as you describe. There must be some other problem, but I have no idea what it is.

Do your new scrapers for the tabs with different ranges have different sitemap names? B/c if not, that could be your problem.

Ah thanks - I had changed the sitemap names too, so it wasn't that. I worked out that it was because I put the ranges in reverse order (which I'd read elsewhere you could do to run the scraper in the opposite order). It worked when I put the ranges back in ascending order.
Thanks for your help!