Unable to export to Excel / CSV files are empty

Web Scraper version: 1.29.66
Chrome version: 116.0.5845.188 (64-bit)
OS: Windows 10

Sitemap:
{"_id":"scheda-cliente","startUrl":["https://privatewebsite/"],"selectors":[{"id":"Cliente Codice","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='1332:0']","type":"SelectorText"},{"id":"Provincia","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='34:554;a']","type":"SelectorText"},{"id":"Telefono","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".slds-truncate span.forceOutputPhone","type":"SelectorText"},{"id":"Cellulare","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='63:554;a']","type":"SelectorText"},{"id":"Finanziabile","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".slds-form-element__static span.lvm-grid-no-fade-out","type":"SelectorText"},{"id":"Note","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".slds-form-element_edit span.uiOutputTextArea","type":"SelectorText"},{"id":"Cognome","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='293:246;a']","type":"SelectorText"},{"id":"CF/PIVA","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='398:246;a']","type":"SelectorText"},{"id":"Ultimo Ordine","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='377:246;a']","type":"SelectorText"},{"id":"Ultima Rata","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='415:246;a']","type":"SelectorText"},{"id":"Email","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='522:246;a']","type":"SelectorText"},{"columns":[{"extract":true,"header":"Codice Opera","name":"Codice Opera"},{"extract":true,"header":"Nome Prodotto","name":"Nome Prodotto"},{"extract":true,"header":"Data Firma","name":"Data Firma"},{"extract":true,"header":"Qtà","name":"Qtà"},{"extract":true,"header":"N.Rate","name":"NRate"},{"extract":true,"header":"Codice Cliente","name":"Codice Cliente"},{"extract":true,"header":"Cliente","name":"Cliente"},{"extract":true,"header":"Numero Ordine","name":"Numero Ordine"}],"id":"Lista Opere","multiple":true,"parentSelectors":["_root"],"selector":".cT_ListaOpereAccount table","tableDataRowSelector":".slds-table--bordered tr","tableHeaderRowSelector":"tr.slds-text-title--caps","type":"SelectorTable"},{"id":"Via","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div[data-aura-rendered-by='675:246;a']","type":"SelectorText"},{"id":"Cap / Città","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div[data-aura-rendered-by='677:246;a']","type":"SelectorText"},{"id":"Ultima Telefonata","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='852:246;a']","type":"SelectorText"},{"id":"Data Ultima Telefonata","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='847:251;a']","type":"SelectorText"},{"id":"Ultimo Appuntamento","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='869:246;a']","type":"SelectorText"},{"id":"Data Ultimo Appuntamento","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='856:251;a']","type":"SelectorText"},{"id":"Esito","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='911:246;a']","type":"SelectorText"},{"id":"Funzionario","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"span[data-aura-rendered-by='948:246;a']","type":"SelectorText"},{"id":"Operatore","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div[data-aura-rendered-by='1450:246;a']","type":"SelectorText"},{"id":"Ultima Modifica","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div[data-aura-rendered-by='1502:246;a']","type":"SelectorText"}]}
Error Message:

Exported Excel / CSV are empty

I can see the data in preview mode when I click the "Data preview" button, but I'm unable to export it!
Also, how can I add exported data to an existing file instead of saving a separate file for every exported page? Would it be possible to add this option, something like "append to existing file", or a button for faster export that grabs the data on the page I'm viewing and appends it to an existing file?
Thanks
Yes, because it is a private Salesforce site and you need to be logged in, so I've removed the URL.
The problem is that the Excel (or CSV) file is created, but it contains only the column headers and no data. On screen I can see that the data has been scraped successfully, so I can cut and paste it into Excel, for example, but doing that manually is tedious.

I hope the "append to file" feature will be created so I can export faster without creating a file for every page, which is absurd when there are 500,000 pages to extract.
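Until such a feature exists, the per-page CSV exports can be merged outside the extension. Below is a minimal sketch, not an extension feature: the function name `append_csv` and the file layout are hypothetical, and it assumes every export has the same columns in the same order and is UTF-8 encoded (with or without a BOM).

```python
import csv
from pathlib import Path


def append_csv(source: Path, target: Path) -> int:
    """Append the data rows of `source` to `target` and return how many were added.

    Assumption: both files share the same header row. The header is written
    only when `target` does not exist yet or is still empty.
    """
    # "utf-8-sig" reads files with or without a leading byte-order mark.
    with source.open(newline="", encoding="utf-8-sig") as src:
        rows = list(csv.reader(src))
    if not rows:
        return 0
    header, data = rows[0], rows[1:]

    is_new = not target.exists() or target.stat().st_size == 0
    with target.open("a", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        if is_new:
            writer.writerow(header)
        writer.writerows(data)
    return len(data)
```

Calling `append_csv(Path("page-0001.csv"), Path("all-pages.csv"))` once per exported file (both file names are placeholders) builds up a single combined file. Keep the combined file outside the folder you loop over, otherwise it would be appended to itself.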

I've tried with Chrome and Firefox and get the same issue. It seems the data is being blocked somewhere, but I'm logged in as an admin user on that PC.

On another PC it works fine, but I need to use it on that PC, so I can debug this if someone can guide me on how to do it.

Hard to diagnose without site access but I'm guessing your selectors are specific to one page only 'cos they seem to contain a lot of random numbers, e.g.

span[data-aura-rendered-by='852:246;a']
span[data-aura-rendered-by='847:251;a']

These numbers would probably change with every new page, so your scraper won't work.
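As a rough way to see how many selectors in a sitemap are affected, the exported sitemap JSON can be scanned for this attribute. This is only a sketch: the function name `volatile_selectors` is made up for illustration, and it assumes that any selector keyed on `data-aura-rendered-by` is page-specific, as guessed above.

```python
import json
import re

# Assumption: selectors that match on this attribute carry page-specific IDs.
VOLATILE_ATTR = re.compile(r"\[data-aura-rendered-by=")


def volatile_selectors(sitemap_json: str) -> list[str]:
    """Return the ids of selectors that rely on data-aura-rendered-by."""
    sitemap = json.loads(sitemap_json)
    return [
        entry["id"]
        for entry in sitemap.get("selectors", [])
        if VOLATILE_ATTR.search(entry.get("selector", ""))
    ]


# Two selectors taken from the sitemap in the first post, shortened.
example = json.dumps({
    "_id": "scheda-cliente",
    "selectors": [
        {"id": "Telefono", "selector": ".slds-truncate span.forceOutputPhone"},
        {"id": "Cellulare", "selector": "span[data-aura-rendered-by='63:554;a']"},
    ],
})
print(volatile_selectors(example))  # prints ['Cellulare']
```

Selectors that are not flagged, such as the class-based one for "Telefono", are more likely to survive from page to page.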

Exactly, Ieemeng. I suppose this is a security measure of Salesforce CRM (https://www.salesforce.com).

...so basically I'm unable to grab data since selectors change every time...
(I've asked on another post if there is another way to select data...)

Hi there,
I have the same issue. I let the extension run for about 30 hours to scrape over 100k pages on a site. I can now preview the data with no problems, but the .XLSX and .CSV export buttons don't seem to do anything.

There is no error in the console and nothing seems to happen in the background tasks (I was thinking that it only would take some time like for the data preview)....

This is very strange because, before the latest Firefox versions, it was working like a charm.

I'm using WebScraper Extension 1.72.9 on Firefox 122.0.1.

Thanks for confirming it... So I'm not mad hehe... :crazy_face:

@larsen Hello, that most likely happens because it takes a longer time to load huge datasets. Can you replicate this issue on smaller datasets?

For extracting data from such a huge amount of pages we would recommend using our Cloud solution: https://cloud.webscraper.io/

If the scrape was run locally (in the extension), the issue you described can also be affected by your location, browser, or computer performance.

I've had this happen after some long scraping jobs (50K+ lines, more than 24 hours). I'm guessing it is caused by browser memory leaks which will slow down the computer. You might be able to fix this by just restarting your computer (i.e. don't try to export immediately after a scrape).

I am currently experiencing the exact same issue under similar conditions. :smiling_face_with_tear:

Hi, thank you for your quick answer!

Yes, I've tried on smaller datasets and I have the very same issue.

Sadly I can't see any log error that would tell me where to look more precisely.

Try to scrape another website to check whether the problem persists...

I've tried 6 different sites so far, all with the same issue. It does not seem to be a site issue.

@tadao Are you using the browser extension version for Chrome, Mozilla, or Edge? What is the approximate amount of records you are trying to download?

I have 3 PCs set up:

  • 2 of which use the Chrome extension version
  • 1 of which uses the Firefox extension version

They were trying to scrape around:

  • 2500 rows,
  • 10000 rows, and
  • 30000 rows respectively.

Each is equipped with 16 GB of RAM, but the CPU generations vary.

I have tried the "restarting the PC before downloading the result" to no avail :frowning:

@tadao Understood. Are you perhaps using any other extensions or running any parallel processes in the background?

If you could provide the sitemap, I will run the tests on my end to see whether the same issue can be replicated.

I have made a new discovery about the bug!

After clicking the button to export data (either XLSX or CSV), I had to wait from 15 minutes to over an hour before the file actually downloaded. There is no indication of loading or processing; it just behaves this way.

  • Unfortunately, in the case of large data amounts (above 1000 rows), even after downloading the exported data, the xlsx file appears to be corrupted.
  • Fortunately, the csv file appears to be okay.

In the case of the XLSX file, a pop-up message appears when I try to open the exported data. It states that Excel found a problem with the content and will attempt to repair it. Upon clicking "Yes", another pop-up appears:

image

Excel then shows an empty workbook.

I tested the issue through multiple scenarios and have come to the following conclusions:

  • The long wait before the download appears in all scenarios. It occurred on all 3 of my PCs and with every sitemap, regardless of the amount of scraped data. Smaller datasets are faster, but even at 200 rows it took at least 15 minutes.
  • It's unlikely to be a hardware bottleneck. The PCs are desktops with Ryzen 5 3600 CPUs (or equivalents), 16 GB of RAM, and SSDs. PC-1 was scraping while being used for browsing, PC-2 was scraping while running a light Python script, and PC-3 was dedicated to scraping only.
  • It's definitely not a site-specific issue. I have tested 6 different sites, all of which behave the same way.
  • Large amounts of scraped data (above 1000 rows) yield a corrupted XLSX file. The corrupted file's size varies in proportion to the expected amount of scraped data (2 MB, 10 MB, 50 MB, etc.). However, when opened, every XLSX file shows an empty workbook after the pop-ups mentioned above.
  • Despite the long wait before the download, the CSV file is intact and usable.
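For anyone who wants to narrow down where the XLSX export goes wrong, one quick check is whether the file is still a readable ZIP package, since .xlsx is a ZIP container of XML parts. The sketch below is only a first-pass diagnostic under that assumption: the function name `check_xlsx` is hypothetical, it inspects the container and not the worksheet XML, and a file that passes can still be rejected by Excel.

```python
import zipfile


def check_xlsx(path: str) -> str:
    """Return a short verdict on the ZIP container of an .xlsx file."""
    if not zipfile.is_zipfile(path):
        return "not a ZIP container"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first damaged member, or None.
            bad_member = archive.testzip()
            if bad_member is not None:
                return f"damaged member: {bad_member}"
            # Every Office Open XML package carries this part at its root.
            if "[Content_Types].xml" not in archive.namelist():
                return "missing [Content_Types].xml"
    except zipfile.BadZipFile:
        return "unreadable ZIP container"
    return "container looks intact"
```

If the verdict is "not a ZIP container" or "unreadable ZIP container", the download itself was probably cut short or mangled. If the container looks intact, the problem is more likely in the XML written inside it.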

Here is an example of a large dataset sitemap that I have. I apologize for the lengthy, detailed reply. I hope to provide as much as I can to help solve the issue, and to help others who may face a similar one. I deeply appreciate your efforts in helping. In the meantime, I am very grateful that at least the CSV file format is working :smiling_face_with_tear:

@tadao Hi, as of now we are not able to replicate the issue you have described.

Have you tried to reinstall the extension and run the extraction process on a freshly installed extension? We would appreciate it if you could provide us with a recording.

!IMPORTANT
Please note, that uninstalling will also remove any previously created sitemaps, so any sitemaps that you wish to keep should be backed up by either copying them from the extension's "Export Sitemap" view to a file on your computer or by uploading them to your Web Scraper Cloud account using the sitemap sync functionality (available after connecting your account using the "Sign in to Cloud" option in the top right of the extension panel): Sitemap sync | Web Scraper Documentation

Please, make sure to follow the steps described in the following post: How to submit a video bug report - #2

I had the same issue of nothing happening when I click export to XLS or CSV, but your suggestion to wait 15+ minutes worked! Thanks for the workaround.

30,000 rows of data
Brave browser