How to Go About Large Scale Scraping


I am a complete noob to programming and computers and whatnot (literally: I know nothing; I just found this website because it seems to provide the service I'm looking for, and I'm barely figuring it out). I'm looking to scrape a large number of pages from a certain website (about 439,000 pages). Does Webscraper offer special tools that can do that, or do I use regular pagination?

Please attach the website address and a test sitemap of yours.

  1. Create a sitemap for one page.
  2. Export the sitemap.
  3. Get or generate all the URLs.
  4. Split the URLs into batches of 5,000 or 10,000, depending on the number of selectors.
  5. Replace the start URL in the sitemap with one batch from the split list. To add the "," between the URLs, use the method from this post:
    Scraping from list of urls - #4 by iconoclast
  6. Import the new sitemap and run it.
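
If doing steps 4 and 5 by hand is too tedious, here is a rough Python sketch of the idea. It assumes the exported sitemap is JSON with `_id`, `startUrl` and `selectors` keys, and that `startUrl` can hold a list of URLs; check that against your own export. The names `chunk` and `build_batched_sitemaps`, the `example.com` URLs and the `-1`, `-2` suffix on the sitemap id are made up for illustration.

```python
import copy
import json


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_batched_sitemaps(sitemap, urls, batch_size=5000):
    """Return one copy of `sitemap` per batch of URLs.

    Each copy gets its own id (assumed naming scheme: "<id>-<n>") and
    has its start URL replaced by that batch.
    """
    sitemaps = []
    for n, batch in enumerate(chunk(urls, batch_size), start=1):
        batched = copy.deepcopy(sitemap)  # leave the original untouched
        batched["_id"] = f"{sitemap['_id']}-{n}"
        batched["startUrl"] = batch
        sitemaps.append(batched)
    return sitemaps


if __name__ == "__main__":
    # Hypothetical exported sitemap; paste your real export here instead.
    base = {
        "_id": "example-site",
        "startUrl": ["https://example.com/page/1"],
        "selectors": [],
    }
    # Hypothetical URL list; in practice, load your own list of pages.
    urls = [f"https://example.com/page/{i}" for i in range(1, 12001)]

    for batched in build_batched_sitemaps(base, urls, batch_size=5000):
        text = json.dumps(batched)  # json.dumps adds the "," between URLs
        print(batched["_id"], len(batched["startUrl"]), len(text) > 0)
```

Each `json.dumps(...)` result is the text you would paste into the sitemap import box, one import per batch.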

You can start several sessions or machines...