Images are not being scraped!

Images are not being scraped. Why?

{"_id":"bikroy","startUrl":["https://bikroy.com/en/ads/bangladesh/property"],"selectors":[{"id":"link","type":"SelectorLink","selector":"div.ui-item:nth-of-type(n+5) a.item-title","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"name","type":"SelectorText","selector":"div.item-top h1","parentSelectors":["link"],"multiple":false,"regex":"","delay":0},{"id":"details","type":"SelectorText","selector":"div.col-12 div.row:nth-of-type(2) div.col-12.lg-8","parentSelectors":["link"],"multiple":true,"regex":"","delay":0},{"id":"image","type":"SelectorImage","selector":"div.col-12 div.gallery-item.is-current img","parentSelectors":["link"],"multiple":false,"delay":0}]}

Hi!

Do you want images to be shown as links or download them?

Open the command line (CMD).

Change directory to the folder that contains the .csv file of your scraped data:

cd C:\Users\YOURURLPATH\Desktop\YOURFOLDER\image-downloader-master

Change NAMEOFCSV to the name of your .csv file, paste the command into CMD, and press Enter:

python image-downloader.py NAMEOFCSV.csv encoding="ascii" errors="surrogateescape" errors='ignore'

The additional parameters avoid download errors that would otherwise stop the script from running.

I'm not sure if it helps. It stops after some downloading and I can see this in command line:

Traceback (most recent call last):
File "image-downloader.py", line 131, in <module>
main(sys.argv)
File "image-downloader.py", line 123, in main
download_csv_file_images(csv_filename)
File "image-downloader.py", line 116, in download_csv_file_images
download_csv_row_images(row, dest_dir)
File "image-downloader.py", line 53, in download_csv_row_images
if key.endswith("-src"):
AttributeError: 'NoneType' object has no attribute 'endswith'

Honestly, I don't know what it means /:
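
A possible explanation for that traceback, not a confirmed diagnosis: if image-downloader.py reads the file with Python's csv.DictReader (an assumption, since the script itself is not shown in this thread), any row with more fields than the header gets its surplus values stored under the key None, and None has no endswith method. The sketch below reproduces that situation and shows one way to guard against it. src_columns is a hypothetical helper, not part of the script, and the example.com URLs are placeholders.

```python
import csv
import io

# A header with two columns, but the second data row has three fields.
SAMPLE = (
    "name,image-src\n"
    "bag,http://example.com/a.jpg\n"
    "shoe,http://example.com/b.jpg,extra\n"
)


def src_columns(row):
    """Return the values of columns whose name ends with '-src'.

    Hypothetical helper: it skips keys that are not strings, such as the
    None key that csv.DictReader uses for surplus fields.
    """
    return [
        value
        for key, value in row.items()
        if isinstance(key, str) and key.endswith("-src")
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
has_none_key = None in rows[1]  # the surplus field landed under the key None
urls = [src_columns(row) for row in rows]
```

If this is what is happening, the underlying fix is in the CSV itself: a row with more fields than the header usually points to an unquoted comma or a broken line somewhere in the data.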

The extra parameters (encoding="ascii" errors="surrogateescape" errors='ignore') ignore errors and stop the script from failing.

Hello,
If the file processing stops before the end and you are using MS Excel, you must also check that the CSV file is in UTF-8 format. It is best to use Open Office to manipulate CSV files.

Hello there,

I've come across a similar issue, but none of the suggestions in this thread helped. Could someone kindly help me out? Thank you!

Console output

C:\Users\Patrik\Downloads\image-downloader-0.1.1>python image-downloader.py test.csv encoding"ascii" errors="surrogateescape" errors='ignore'
INFO:executed by python 3
INFO:importing data from test.csv
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-superman-kabelka-cerna-vbs2u803-001-31.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-superman-kabelka-cerna-vbs2u803-001-32.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-superman-kabelka-cerna-vbs2u803-001-31.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-carillon-kabelka-pres-rameno-cervena-vbs3ma02-003-31.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-carillon-kabelka-pres-rameno-cervena-vbs3ma02-003-32.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-carillon-kabelka-pres-rameno-cervena-vbs3ma02-003-31.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-carillon-kabelka-pres-rameno-cerna-vbs3ma02-001-31.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-carillon-kabelka-pres-rameno-cerna-vbs3ma02-001-32.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-carillon-kabelka-pres-rameno-cerna-vbs3ma02-001-31.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-superman-kabelka-pres-rameno-cerna-vbs2u807-001-31.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-superman-kabelka-pres-rameno-cerna-vbs2u807-001-32.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-superman-kabelka-pres-rameno-cerna-vbs2u807-001-31.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-falcor-kabelka-pres-rameno-oranzova-vbs3tp03-048-31.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-falcor-kabelka-pres-rameno-oranzova-vbs3tp03-048-32.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-falcor-kabelka-pres-rameno-oranzova-vbs3tp03-048-31.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-audrey-kabelka-pres-rameno-tmavomodra-vbs3n103c-002-31.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-audrey-kabelka-pres-rameno-tmavomodra-vbs3n103c-002-32.jpg
INFO:downloading image https://www.wardow.com/images/1000x/valentino-by-mario-valentino-audrey-kabelka-pres-rameno-tmavomodra-vbs3n103c-002-31.jpg
Traceback (most recent call last):
File "C:\Users\Patrik\Downloads\image-downloader-0.1.1\image-downloader.py", line 131, in <module>
main(sys.argv)
File "C:\Users\Patrik\Downloads\image-downloader-0.1.1\image-downloader.py", line 123, in main
download_csv_file_images(csv_filename)
File "C:\Users\Patrik\Downloads\image-downloader-0.1.1\image-downloader.py", line 115, in download_csv_file_images
for row in csvreader:
File "C:\Users\Patrik\AppData\Local\Programs\Python\Python39\lib\csv.py", line 111, in __next__
row = next(self.reader)
File "C:\Users\Patrik\AppData\Local\Programs\Python\Python39\lib\encodings\cp1250.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x88 in position 4374: character maps to <undefined>

When you call the Python script:

python image-downloader.py images.csv encoding="ascii" errors="surrogateescape" errors='ignore'

Make sure you add the following. It then bypasses the errors and continues the download:

encoding="ascii" errors="surrogateescape" errors='ignore'
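
For reference, here is a small sketch of what those error handlers do in Python's own decoding, independent of the script. Whether image-downloader.py actually reads these options from the command line is not something this example can confirm. If it does not, the equivalent would be to pass them to open() inside the script. The byte string is made up for illustration.

```python
# Bytes containing one non-ASCII byte (0x88), the same value reported
# in the UnicodeDecodeError earlier in this thread.
raw = b"caf\x88 bag"

# errors="ignore" silently drops bytes that cannot be decoded.
ignored = raw.decode("ascii", errors="ignore")

# errors="surrogateescape" keeps them as lone surrogate code points,
# so the original bytes can be recovered when encoding again.
escaped = raw.decode("ascii", errors="surrogateescape")
round_trip = escaped.encode("ascii", errors="surrogateescape")

# The default handler (errors="strict") raises instead.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Note that "ignore" loses data: any non-ASCII character in a name or URL simply disappears, which can matter if the image URLs themselves contain such characters.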


Stack Overflow
Save your CSV as UTF-8.

Open it in Notepad++ and save it as UTF-8.

OR

When saving the CSV from Excel, select:
Tools
Web Options
Encoding
UTF-8

NB: The Tools option is a small dropdown arrow next to Save.

The file in question is not using the CP1252 encoding. It's using another encoding. Which one you have to figure out yourself. Common ones are Latin-1 and UTF-8. Since 0x90 doesn't actually mean anything in Latin-1, UTF-8 (where 0x90 is a continuation byte) is more likely.

You specify the encoding when you open the file:

file = open(filename, encoding="utf8")
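
Building on that one-liner, here is a minimal sketch of reading a CSV with an explicit encoding and falling back to a second one if decoding fails. read_rows and the default list of encodings are assumptions for illustration, not something taken from image-downloader.py. The demo file and its contents are made up.

```python
import csv
import os
import tempfile


def read_rows(path, encodings=("utf-8-sig", "cp1250")):
    """Read a CSV into a list of dicts, trying each encoding in turn.

    Hypothetical helper. "utf-8-sig" also strips the byte order mark
    that some editors add when saving a file as UTF-8.
    """
    last_error = None
    for encoding in encodings:
        try:
            with open(path, newline="", encoding=encoding) as handle:
                return list(csv.DictReader(handle))
        except UnicodeDecodeError as error:
            last_error = error
    raise last_error or ValueError("no encodings given")


# Demo: a UTF-8 file with a byte order mark and a non-ASCII character.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.csv")
    with open(demo_path, "w", newline="", encoding="utf-8-sig") as handle:
        handle.write("name,image-src\nkabelka černá,http://example.com/a.jpg\n")
    demo_rows = read_rows(demo_path)
```

Trying UTF-8 first is deliberate: text in a single-byte code page rarely decodes as valid UTF-8 by accident, whereas a code page such as cp1250 accepts almost any byte and would silently produce wrong characters.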

The problem persists, even with http URLs. It appears to be a version incompatibility with the Python that ships with Inkscape.

WARNING:Image download error. <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1108)>
Traceback (most recent call last):
File "image-downloader.py", line 131, in <module>
main(sys.argv)
File "image-downloader.py", line 123, in main
download_csv_file_images(csv_filename)
File "image-downloader.py", line 115, in download_csv_file_images
for row in csvreader:
File "C:\Program Files\Inkscape\lib\python3.8\csv.py", line 111, in __next__
row = next(self.reader)
File "C:\Program Files\Inkscape\lib\python3.8\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 510: character maps to <undefined>
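
On the CERTIFICATE_VERIFY_FAILED warning in that output: "unable to get local issuer certificate" generally means Python could not find a CA certificate to validate the server's certificate chain. One guess, not verified here, is that the Python bundled with Inkscape has no usable CA bundle. The sketch below shows how a download could be given an explicit CA bundle using only the standard library. make_context and fetch are hypothetical helpers, and the path to a PEM bundle is something you would have to supply yourself.

```python
import ssl
import urllib.request


def make_context(cafile=None):
    """Build a verifying TLS context.

    cafile is an optional path to a CA bundle in PEM format. With None,
    Python falls back to the system's default trust store.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch(url, context, timeout=30):
    """Download url and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read()


# Certificate and hostname checks stay enabled in this context.
context = make_context()
```

Running the script with a regular Python installation instead of the one bundled with Inkscape may be the simpler route, since the traceback shows it is picking up C:\Program Files\Inkscape\lib\python3.8.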

Can anyone share a sample CSV with test data?

Oh, got it.
My image URLs were extracted with an Element attribute selector from a meta tag, so the column had a different name.
After renaming the column to image-src, it's downloading.
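
For anyone hitting the same thing: the traceback earlier in this thread shows the script checking key.endswith("-src"), so it only seems to pick up columns whose name ends in -src. Here is a minimal sketch of renaming a column in an existing CSV without opening it in a spreadsheet. rename_column, the file names, and the meta-image column name are made up for illustration.

```python
import csv
import os
import tempfile


def rename_column(src_path, dst_path, old_name, new_name, encoding="utf-8-sig"):
    """Copy a CSV, renaming one header column (hypothetical helper)."""
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        # Only the header row changes; data rows are copied unchanged.
        writer.writerow([new_name if name == old_name else name for name in header])
        writer.writerows(reader)


# Demo with a throwaway file.
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "scraped.csv")
    renamed = os.path.join(folder, "renamed.csv")
    with open(original, "w", newline="", encoding="utf-8") as handle:
        handle.write("name,meta-image\nbag,http://example.com/a.jpg\n")
    rename_column(original, renamed, "meta-image", "image-src")
    with open(renamed, newline="", encoding="utf-8") as handle:
        result = list(csv.reader(handle))
```

Renaming the selector id in the sitemap so that the exported column already ends in -src avoids this step for future scrapes.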