Max Results before you crash

Hi @iconoclast, @jeremyrem, and @KristapsWS

I had a friend ask me if webscraper could scrape 500MM results, and it got me thinking: at what point are you likely to crash your browser? What would be the max number of records (tested or assumed)? Does this number change if you're using CouchDB?

Curious if anyone has tested the limits.

Let's forget that Excel has a limit on how many rows it can hold, and I'm not sure what the CSV limits might be...

Hi Bret,

Do you mean 50,000,000 (50 million) records?

It will most likely depend on the max file size rather than the number of records.

Less than ~25-30 MB of data should work; more than that will most likely crash your browser.

Besides, since version 0.3.7 / 0.3.8 there is a limit of 1000 URLs per sitemap.
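
For anyone running into that cap, a rough Python sketch like the one below could split one long URL list into several importable sitemaps of up to 1000 start URLs each. It assumes the exported sitemap JSON uses the _id, startUrl, and selectors fields, so double-check it against your own export:

```python
# Sketch: split a long URL list into multiple sitemaps of <= 1000 start URLs.
# Assumes the exported sitemap JSON has "_id", "startUrl" and "selectors"
# fields (compare with a sitemap you exported from the extension yourself).
import json

URLS_PER_SITEMAP = 1000  # the per-sitemap limit mentioned above

def split_sitemap(base_sitemap_path, urls, out_prefix="sitemap_part"):
    with open(base_sitemap_path) as f:
        base = json.load(f)  # a sitemap exported from the extension

    for i in range(0, len(urls), URLS_PER_SITEMAP):
        part = dict(base)
        part["_id"] = f"{out_prefix}_{i // URLS_PER_SITEMAP + 1}"
        part["startUrl"] = urls[i:i + URLS_PER_SITEMAP]
        with open(f"{part['_id']}.json", "w") as out:
            json.dump(part, out)

# Example: 5000 hypothetical URLs become 5 importable sitemap files.
urls = [f"https://example.com/page/{n}" for n in range(5000)]
split_sitemap("exported_sitemap.json", urls)
```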

Hmm, interesting. And CouchDB doesn't impact that?

What about the cloud scraper? Does that allow for larger sizes?

It's not like it's storing all of the results in memory if you don't use CouchDB; it's still in a DB file.

If it's set to scrape slowly enough and you have enough disk space, it should scrape them all, as long as the browser doesn't crash.
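
If you do point the extension at CouchDB, a quick way to sanity-check that records are piling up on disk rather than in the browser is to hit CouchDB's HTTP API. This is just a sketch assuming a default local install at localhost:5984 and a made-up database name:

```python
# Sketch: check that scraped results are landing in CouchDB on disk.
# Assumes CouchDB is running at localhost:5984 and "scraper-results" is
# whatever database name you configured in the extension's storage settings.
# Add auth=("user", "password") to the requests if your CouchDB requires it.
import requests

COUCH = "http://localhost:5984"
DB = "scraper-results"  # hypothetical database name

# Create the database if it doesn't exist yet (CouchDB answers 412 if it does).
resp = requests.put(f"{COUCH}/{DB}")
print("create:", resp.status_code)

# The database info document reports how many records have been written so far.
info = requests.get(f"{COUCH}/{DB}").json()
print("documents stored:", info.get("doc_count"))
print("file size (bytes):", info.get("sizes", {}).get("file"))
```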

Exporting, on the other hand, is another issue.

Excel has a row limit (about 1 million rows per sheet), but there are other programs that do not.

One of my favorite ones to use when dealing with large CSVs like the NPIDB (5 GB) is Delimit.

You could also import it into a SQL server and work with it that way, but I imagine that's about the same as using CouchDB, just with more steps.
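
If anyone wants to skip a full SQL server, here is a rough sketch of the same idea with SQLite: stream the exported CSV in batches so nothing has to fit in memory. The file name, table name, and the assumption that the first row is a header are all illustrative:

```python
# Sketch: load a large CSV export into SQLite so it can be queried without
# opening it in Excel or holding it all in memory at once.
import csv
import sqlite3

def csv_to_sqlite(csv_path, db_path="scrape.db", table="results", batch=10_000):
    conn = sqlite3.connect(db_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes the first row holds column names
        cols = ", ".join(f'"{c}"' for c in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        rows = []
        for row in reader:
            rows.append(row)
            if len(rows) >= batch:  # insert in batches to keep memory flat
                conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
                conn.commit()
                rows.clear()
        if rows:
            conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
            conn.commit()
    print(conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0], "rows loaded")
    conn.close()

csv_to_sqlite("export.csv")  # hypothetical export file
```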