Issue
According to the documentation, the following command
'Brückenspinne'.encode("utf-8",errors='replace')
should give me the byte sequence b'Br??ckenspinne'. However, Unicode characters are not replaced but encoded nevertheless:
b'Br\xc3\xbcckenspinne'
Can you tell me how I actually eliminate Unicode characters? (I use 'replace' for testing purposes; I intend to use 'xmlcharrefreplace' later. To be totally honest, I want to convert the Unicode characters to their XML character references, keeping everything as a string.)
Solution
The utf-8 encoding can represent the character ü; there is no error, so no replacement occurs.
To see errors='replace' in action, use another encoding that cannot represent the character, for example ascii:
>>> 'Brückenspinne'.encode("ascii", errors='replace')
b'Br?ckenspinne'
>>> 'Brückenspinne'.encode("ascii", errors='xmlcharrefreplace')
b'Br&#252;ckenspinne'
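Since the question asks to keep everything as a string, one possible follow-up is to decode the ASCII bytes back to str after the replacement. This is a minimal sketch, not part of the original answer; the helper name to_xmlcharref is hypothetical and used only for illustration.

```python
def to_xmlcharref(text: str) -> str:
    """Return text with every non-ASCII character replaced by an XML
    numeric character reference, as a str (hypothetical helper)."""
    # Encoding to ASCII with xmlcharrefreplace yields bytes such as b'&#252;'
    # for characters ASCII cannot represent; decoding those pure-ASCII bytes
    # turns the result back into a str.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


result = to_xmlcharref("Brückenspinne")
print(result)  # Br&#252;ckenspinne
```

Text that is already pure ASCII passes through unchanged, because the error handler is only invoked for characters the target encoding cannot represent.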
Answered By - falsetru