Issue
>>> b'potato {} potato'.format(u'potato') # return value matches the template
'potato potato potato'
>>> b'potato %s potato' % u'potato' # return value is coerced
u'potato potato potato'
In str.format the template controls the return type, but in str.__mod__ the template gets 'promoted' to unicode.
- Is that documented / reliable behaviour?
- How can I do a percent-style substitution such that the return type matches the template?
The obvious guess doesn't work:
>>> b'potato %b potato' % u'potato'
ValueError: unsupported format character 'b' (0x62) at index 8
I'm not interested in solutions which do type-checking and/or an explicit decode/encode call. Ideally I want the templating to raise UnicodeEncodeError if the template variable is a unicode object and it can't be encoded to ASCII.
Solution
This is documented both at the start of the String Formatting Operations section and in its conversion table. Specifically, the beginning of the section states:
Given format % values ... if format is a Unicode object, or if any of the objects being converted using the %s conversion are Unicode objects, the result will also be a Unicode object.
so that's a given.
A solution that doesn't involve .encode is unattainable, from what I understand (and repr-ing with %r isn't an option either). str.__mod__ is a fast operation that doesn't handle much for you; .format has the courtesy (or audacity) of calling .encode for you while also providing other goodies (hence why it exists).
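Since an explicit .encode appears to be unavoidable, here is a minimal sketch of what such a wrapper could look like. The name percent_format_bytes is hypothetical and not part of any library, and it does the type-checking the question hoped to avoid. It assumes every text argument should be encoded as ASCII before substitution, so a non-ASCII value raises UnicodeEncodeError instead of silently promoting the result.

```python
def percent_format_bytes(template, *args):
    """Percent-format a byte-string template without promotion to text.

    Hypothetical helper for illustration only: text arguments are
    encoded to ASCII first, so the result keeps the template's type.
    """
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    encoded = tuple(
        # .encode('ascii') raises UnicodeEncodeError on non-ASCII text
        arg.encode('ascii') if isinstance(arg, text_type) else arg
        for arg in args
    )
    return template % encoded


# Usage: the result is a byte string, like the template
result = percent_format_bytes(b'potato %s potato', u'potato')
```

Passing u'pota\xe9to' instead would raise UnicodeEncodeError, which is the failure mode the question asked for.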
Aside: if anyone was wondering, it is also documented in the specification of PEP 3101 that, for .format, the type of the format string will determine the outcome.
Answered By - Dimitris Fasarakis Hilliard