UNRELIABLE results

bp22 · May 17, 2023, 2:28pm

I realize that many variables can affect the success of Web Scraper and that web scraping in general can be highly unreliable. However, this tool almost never produces consistent, useful data for me, even when the same sitemap is used on the same site over and over. I'm using the browser extension for Firefox. What am I missing? Is it some sort of unpredictable issue on the site I'm trying to scrape? For example, the scrape chugs along fine, collecting data as expected, then all of a sudden stops on a certain page (inconsistently). I'm using pagination, and it will go through pages and pages of data as expected, then just stop randomly on, say, page 11 of 85. The next time I run it, it may stop on page x or y. It works fine for hundreds of pages, then just stops. I don't really understand what I am doing wrong, as no useful error information is produced. Essentially, every time I run the same sitemap on the same site (which is mostly static), it produces different data. It will randomly miss data from a page, unpredictably and repeatedly. Is this just how web scraping goes?

ViestursWS · May 18, 2023, 6:14am

@3HAT0K @bp22 Hello. Could you please provide the sitemap you are referring to?

Please note that if the scrape was run locally (in the extension), there are a number of factors we cannot affect, such as your location, browser version, OS version, and network performance.

For reference, please be sure to test this via Web Scraper Cloud as well: Web Scraper Cloud | Web Scraper Documentation

Scraping jobs in Web Scraper Cloud are launched within a virtual environment (based on a server) and do not depend on your own machine in any way.