Issue
I had a script I was trying to port from Python 2 to Python 3.
I did read through the porting documentation, https://docs.python.org/3/howto/pyporting.html.
My original Python 2 script used open('filename.txt'). In porting to Python 3 I updated it to io.open('filename.txt'). Now when running the script under either Python 2 or Python 3 with the same input files, I get errors like UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 793: invalid start byte.
Does Python 2's open have less strict error checking than io.open, or does it use a different default encoding? Does Python 3 have an equivalent way to call io.open that matches Python 2's built-in open?
Currently I've started using f = io.open('filename.txt', mode='r', errors='replace'), which works. Comparing the output to the original Python 2 version, no important data was lost to the replaced characters.
Solution
First, in Python 3, io.open is open; there's no need to stop using open directly.
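A quick way to confirm this under Python 3 is to compare the two names directly. This is only a sanity check, and the variable name same_object is made up for illustration:

```python
import io

# In Python 3, io.open and the built-in open are the same function object.
# (In Python 2 they were different: io.open decoded text, the built-in did not.)
same_object = io.open is open
print(same_object)
```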
The issue is that your call to open is assuming the file is UTF-8-encoded when it is not. You'll have to supply the correct encoding explicitly.
open('filename.txt', encoding='iso-8859-1')  # for example
(Note that the default encoding is platform-specific, but your error indicates that you are, in fact, defaulting to UTF-8.)
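As a rough illustration, the sketch below reproduces the error and then reads the same bytes with an explicit encoding. The sample bytes and the choice of cp1252 are assumptions: byte 0x93 is a left curly quote in Windows-1252, which is a common source of this particular error, but the actual encoding of your files may differ.

```python
import os
import tempfile

# Hypothetical sample data: 0x93 and 0x94 are curly quotes in cp1252,
# but 0x93 is not a valid start byte in UTF-8.
raw = b"He said \x93hello\x94\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filename.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Decoding as UTF-8 fails in the same way as in the question.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Naming the encoding the file was actually written in succeeds.
    with open(path, encoding="cp1252") as f:
        text = f.read()
```

If the encoding is genuinely unknown, errors='replace' (as in the question) avoids the exception, at the cost of substituting U+FFFD for every byte that cannot be decoded.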
In Python 2, no attempt was made to decode non-ASCII files; reading from a file returned a str value consisting of whatever bytes were actually stored in the file.
This is part of the overall shift in Python 3 from using the old str type as sometimes text, sometimes bytes, to using str exclusively for Unicode text and bytes for any particular encoding of that text.
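The closest Python 3 equivalent of the old behaviour is binary mode, which skips decoding and returns bytes. A minimal sketch, assuming a small ISO-8859-1 sample file created just for the demonstration:

```python
import os
import tempfile

# Hypothetical sample data: "café" encoded as ISO-8859-1 (0xE9 is "é").
raw = b"caf\xe9\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filename.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Binary mode: no decoding happens, so this returns the stored bytes,
    # much like reading from Python 2's built-in open.
    with open(path, "rb") as f:
        data = f.read()

    # Text mode with an explicit encoding: returns decoded Unicode text.
    with open(path, encoding="iso-8859-1") as f:
        text = f.read()

# Decoding the bytes by hand gives the same str as text mode did.
decoded = data.decode("iso-8859-1")
```

Binary mode never raises UnicodeDecodeError, but any code that later treats data as text has to decode it explicitly.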
Answered By - chepner