How to handle umlauts and special characters

yadnesh.haldankar · April 2, 2019, 1:01pm

I'm trying to scrape a German e-commerce website, and the resulting CSV has weird characters instead of the umlauts, e.g. name: Haglöfs shows up in the sheet as name: HaglÃ¶fs.
How do we handle the UTF-8 issues?
Thanks
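
For reference, "ö" turning into "Ã¶" is the pattern you get when UTF-8 bytes are read as Latin-1 (or Windows-1252), so the CSV itself may be fine and only the program opening it may be guessing the encoding wrong. Below is a minimal Python sketch under that assumption; it has not been confirmed for this particular site. `add_bom` is a hypothetical helper name, and whether a byte order mark helps depends on the spreadsheet program used.

```python
# Reproduce the garbling: encode as UTF-8, then wrongly decode as Latin-1.
original = "Haglöfs"
garbled = original.encode("utf-8").decode("latin-1")

# Undo it by reversing the two steps.
repaired = garbled.encode("latin-1").decode("utf-8")


def add_bom(csv_bytes: bytes) -> bytes:
    """Hypothetical helper: re-encode UTF-8 CSV bytes with a leading BOM.

    Decoding with "utf-8-sig" drops an existing BOM (if any), so the
    result always carries exactly one.
    """
    return csv_bytes.decode("utf-8-sig").encode("utf-8-sig")


if __name__ == "__main__":
    print(garbled)
    print(repaired)
```

If the file is opened in a spreadsheet program, importing it with the encoding explicitly set to UTF-8, or saving a copy with a BOM as in `add_bom`, may be enough to make the umlauts display correctly.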

mihapeople · September 14, 2019, 4:44pm

Hi,

I think my problem is related to this topic.

One more example: here is the web link http://gosjkh.ru/houses/novgorodskaya-oblast/velikij-novgorod
I need to extract tabulated data.
Web Scraper can't handle the table header column "№".
What is interesting: if I keep this column on, I get the message "Invalid format";
if I tick this column off and keep the value empty, I get the message "must not be empty".

Please help, if possible, with handling the special character "№".

BR,
Mikhail

leemeng · September 14, 2019, 11:42pm

Hi miha, this is actually the result of another WS limitation, not a UTF-8 issue. You can't use a Result key (AKA header name) that is shorter than 3 characters. I don't know why this limit exists. So if you change № to something longer, like номер, it will work.
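
If you want to spot such keys up front, here is a minimal Python sketch. It assumes the 3-character minimum described above, which comes from observed behaviour and not from official documentation; `short_keys` is a hypothetical helper name, not part of WS.

```python
def short_keys(headers, min_len=3):
    """Return the header names that are shorter than min_len characters.

    min_len=3 is an assumption based on the behaviour described in this thread.
    """
    return [h for h in headers if len(h.strip()) < min_len]


if __name__ == "__main__":
    # "№" is a single character, so it is flagged; "номер" has five.
    print(short_keys(["№", "номер", "A", "name"]))
```

Any header it returns would need to be renamed to something longer before being used as a Result key.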

gutmach · April 10, 2020, 10:22pm

OK, so if this 3-character limit is the problem, why not display "Invalid format: result key must be 3 or more chars" instead of the vague "Invalid format" error message currently shown below each result key field? When you are trying to scrape a read-only Google Sheet, it usually just populates the single column letter as the result key. I was going crazy trying to figure out what was wrong until I found this post.