Images are not being scraped!

shehan · June 5, 2018, 5:03am

Images are not being scraped. Why?

{"_id":"bikroy","startUrl":["https://bikroy.com/en/ads/bangladesh/property"],"selectors":[{"id":"link","type":"SelectorLink","selector":"div.ui-item:nth-of-type(n+5) a.item-title","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"name","type":"SelectorText","selector":"div.item-top h1","parentSelectors":["link"],"multiple":false,"regex":"","delay":0},{"id":"details","type":"SelectorText","selector":"div.col-12 div.row:nth-of-type(2) div.col-12.lg-8","parentSelectors":["link"],"multiple":true,"regex":"","delay":0},{"id":"image","type":"SelectorImage","selector":"div.col-12 div.gallery-item.is-current img","parentSelectors":["link"],"multiple":false,"delay":0}]}

iconoclast · June 5, 2018, 10:24pm

Hi!

Do you want the images shown as links, or do you want to download them?

amscraper · June 28, 2019, 6:10am

Open a command prompt (CMD).

Change directory to the folder with your .csv file of scraped data:

cd C:\Users\YOURURLPATH\Desktop\YOURFOLDER\image-downloader-master

Change NAMEOFCSV to the name of your .csv file, paste the command into CMD, and press Enter:

python image-downloader.py NAMEOFCSV.csv encoding="ascii" errors="surrogateescape" errors='ignore'

The additional parameters avoid download errors that stop the script from running.

eric · July 22, 2019, 5:13pm

I'm not sure that helps. It stops after downloading some images, and I can see this in the command line:

Traceback (most recent call last):
  File "image-downloader.py", line 131, in <module>
    main(sys.argv)
  File "image-downloader.py", line 123, in main
    download_csv_file_images(csv_filename)
  File "image-downloader.py", line 116, in download_csv_file_images
    download_csv_row_images(row, dest_dir)
  File "image-downloader.py", line 53, in download_csv_row_images
    if key.endswith("-src"):
AttributeError: 'NoneType' object has no attribute 'endswith'

Honestly, I don't know what it means /:

amscraper · August 20, 2019, 11:54pm

The extra parameters after the CSV filename ignore errors and stop the script from failing.
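
As for what the AttributeError in eric's traceback means: Python's `csv.DictReader` stores the surplus fields of any row that is longer than the header under the key `None` (its default `restkey`), and `None` has no `endswith` method. Assuming the script loops over the keys of a `DictReader` row, as the traceback line `if key.endswith("-src")` suggests, a minimal sketch of the failure and a possible guard could look like this. `src_columns` is a hypothetical helper, not part of image-downloader.py, and the sample data is made up.

```python
import csv
import io

# Header has two columns, but the second data row carries a third, surplus field.
data = (
    "name,image-src\n"
    "flat,http://example.com/a.jpg\n"
    "house,http://example.com/b.jpg,extra\n"
)

def src_columns(row):
    """Return the values of columns whose name ends with '-src'.

    Non-string keys are skipped. csv.DictReader uses the key None
    for surplus fields when a row is longer than the header.
    """
    return [value for key, value in row.items()
            if isinstance(key, str) and key.endswith("-src")]

rows = list(csv.DictReader(io.StringIO(data)))
urls = [src_columns(row) for row in rows]
```

If this is the cause, the underlying problem is a malformed row in the CSV (for example an unquoted comma), so checking the file itself is worth doing as well.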

mezkal · May 22, 2020, 10:14am

Hello,
If the file processing stops before the end and you are using MS Excel, you must also check that the CSV file is in UTF-8 format. It is best to use Open Office to manipulate CSV files.

Patrik_Neto · November 13, 2020, 1:25pm

Hello there,

I've come across a similar issue, but none of the comments above or below helped. Could someone kindly help me out? Thank you!

Console output

amscraper · November 13, 2020, 1:40pm

When you call the Python script:

python image-downloader.py images.csv encoding="ascii" errors="surrogateescape" errors='ignore'

make sure you add the following; it then bypasses the errors and continues the download:

encoding="ascii" errors="surrogateescape" errors='ignore'

amscraper · November 13, 2020, 1:46pm

Stack Overflow
Save your CSV as UTF-8.

Open it in Notepad++ and save it as UTF-8

OR

When saving the CSV from Excel, select Tools > Web Options > Encoding > UTF-8.

NB: The Tools option is a little dropdown arrow next to Save.

The file in question is not using the CP1252 encoding. It's using another encoding. Which one you have to figure out yourself. Common ones are Latin-1 and UTF-8. Since 0x90 doesn't actually mean anything in Latin-1, UTF-8 (where 0x90 is a continuation byte) is more likely.

You specify the encoding when you open the file:

file = open(filename, encoding="utf8")
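
Building on the `open(filename, encoding="utf8")` line above, here is a minimal sketch of reading a scraped CSV with an explicit encoding instead of the Windows default (cp1252). The `errors="replace"` argument is not from the quoted answer; it is an assumption added here so that undecodable bytes become U+FFFD instead of raising `UnicodeDecodeError`. `read_rows` and the sample file are hypothetical.

```python
import csv
import os
import tempfile

def read_rows(filename, encoding="utf-8"):
    """Read a CSV into a list of dicts, decoding with an explicit encoding."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(filename, encoding=encoding, errors="replace", newline="") as f:
        return list(csv.DictReader(f))

# Write a small UTF-8 sample that contains a non-ASCII character.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("name,image-src\nCafé,http://example.com/a.jpg\n".encode("utf-8"))
    sample_path = tmp.name

rows = read_rows(sample_path)
os.remove(sample_path)
```

Whether image-downloader.py can be changed this way depends on how it opens the file, which this thread does not show.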

wsdc · January 10, 2021, 1:59am

The problem persists, including with http URLs. There appears to be a version incompatibility with the Python bundled with Inkscape.

WARNING:Image download error. <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1108)>
Traceback (most recent call last):
  File "image-downloader.py", line 131, in <module>
    main(sys.argv)
  File "image-downloader.py", line 123, in main
    download_csv_file_images(csv_filename)
  File "image-downloader.py", line 115, in download_csv_file_images
    for row in csvreader:
  File "C:\Program Files\Inkscape\lib\python3.8\csv.py", line 111, in __next__
    row = next(self.reader)
  File "C:\Program Files\Inkscape\lib\python3.8\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 510: character maps to <undefined>
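
The `CERTIFICATE_VERIFY_FAILED` warning is a separate problem from the decoding error: it typically means Python could not find a CA certificate that vouches for the HTTPS server. One possible approach, sketched under the assumption that a CA bundle file is available on your machine, is to pass an explicit SSL context to `urllib.request.urlopen`. `make_context` and `fetch` are hypothetical helpers, not part of image-downloader.py, and the thread does not show how the script downloads images.

```python
import ssl
import urllib.request

def make_context(cafile=None):
    """Build a certificate-verifying SSL context.

    cafile: optional path to a PEM bundle of CA certificates;
    when None, Python loads the system's default trust store.
    """
    return ssl.create_default_context(cafile=cafile)

def fetch(url, cafile=None, timeout=30):
    """Download a URL and return its bytes, verifying the server certificate."""
    context = make_context(cafile)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read()

context = make_context()
```

Keeping verification on and supplying a bundle is generally preferable to disabling certificate checks.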

ScrapeSite · February 25, 2021, 3:36am

Can anyone share some sample CSV test data?

ScrapeSite · April 17, 2021, 7:33am

Ohh.. got it..
Mine was extracted with an element attribute selector using meta, so the column had a different name.
After renaming it to image-src,
it's downloading...
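
This fix matches the traceback earlier in the thread, where the script checks `key.endswith("-src")`: it appears to take image URLs only from columns whose names end in `-src`. A minimal sketch of renaming a header column without opening a spreadsheet follows; `rename_column` and the column name `meta-image` are hypothetical.

```python
import csv
import io

def rename_column(csv_text, old_name, new_name):
    """Return csv_text with one header column renamed; data rows are untouched."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    header = rows[0]
    # Raises ValueError if old_name is not in the header.
    header[header.index(old_name)] = new_name
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

original = "name,meta-image\nflat,http://example.com/a.jpg\n"
renamed = rename_column(original, "meta-image", "image-src")
```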
