Issue
I'm using Python (2.7) and Requests to grab data from the Facebook API, then using Pandas to report on the output via IPython. Somewhere along the way I'm hitting Unicode/ASCII errors and I'm stumped about what to change.
Hoping the solution will be obvious to someone well versed in the area.
First, I'm using Requests to grab the API data using a helper module I've created.
import requests

_current_request = "https://graph.facebook.com/officialstackoverflow/feed?access_token=[Redacted access token]"
response = requests.get(_current_request)
response.json() fails straight away due to encoding, so I've been using the following instead:
# Excuse the verbosity; just trying to be clear on my thought process,
# and hoping it will help debugging.
import json

encoded = response.content.encode("utf-8")
json_response = json.loads(encoded)
response_list = list()
response_list += json_response["data"]
(The "data"
key is the actual contents from the FB API. It's a list of individual post objects)
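For reference, here is a minimal sketch of that fetch-and-parse flow wrapped in one helper; fetch_feed is a hypothetical name, and it assumes the Graph API returns a UTF-8 encoded JSON body:

# Minimal sketch; fetch_feed is a hypothetical helper name and the access
# token is assumed to be passed in by the caller.
import json
import requests

GRAPH_URL = "https://graph.facebook.com/officialstackoverflow/feed"

def fetch_feed(access_token):
    response = requests.get(GRAPH_URL, params={"access_token": access_token})
    # Decode the raw bytes explicitly (assuming UTF-8) so every parsed string
    # comes back as a unicode object rather than a byte string.
    payload = json.loads(response.content.decode("utf-8"))
    return payload["data"]  # the "data" key is the list of post dicts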
I'm then passing the response_list object back into the IPython notebook to manipulate.
In [1]: pd.DataFrame(response_list)
Traceback:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-290-0613adf928ec> in <module>()
/Users/Shared/Sites/Virtualenv/api_manager/lib/python2.7/site-packages/IPython/core/displayhook.pyc in __call__(self, result)
236 self.write_format_data(format_dict, md_dict)
237 self.log_output(format_dict)
--> 238 self.finish_displayhook()
239
240 def cull_cache(self):
/Users/Shared/Sites/Virtualenv/api_manager/lib/python2.7/site-packages/IPython/kernel/zmq/displayhook.pyc in finish_displayhook(self)
70 sys.stderr.flush()
71 if self.msg['content']['data']:
---> 72 self.session.send(self.pub_socket, self.msg, ident=self.topic)
73 self.msg = None
74
/Users/Shared/Sites/Virtualenv/api_manager/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in send(self, stream, msg_or_type, content, parent, ident, buffers, track, header, metadata)
647 if self.adapt_version:
648 msg = adapt(msg, self.adapt_version)
--> 649 to_send = self.serialize(msg, ident)
650 to_send.extend(buffers)
651 longest = max([ len(s) for s in to_send ])
/Users/Shared/Sites/Virtualenv/api_manager/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in serialize(self, msg, ident)
551 content = self.none
552 elif isinstance(content, dict):
--> 553 content = self.pack(content)
554 elif isinstance(content, bytes):
555 # content is already packed, as in a relayed message
/Users/Shared/Sites/Virtualenv/api_manager/lib/python2.7/site-packages/IPython/kernel/zmq/session.pyc in <lambda>(obj)
83 # disallow nan, because it's not actually valid JSON
84 json_packer = lambda obj: jsonapi.dumps(obj, default=date_default,
---> 85 ensure_ascii=False, allow_nan=False,
86 )
87 json_unpacker = lambda s: jsonapi.loads(s)
/Users/Shared/Sites/Virtualenv/api_manager/lib/python2.7/site-packages/zmq/utils/jsonapi.pyc in dumps(o, **kwargs)
38 kwargs['separators'] = (',', ':')
39
---> 40 s = jsonmod.dumps(o, **kwargs)
41
42 if isinstance(s, unicode):
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, sort_keys, **kw)
248 check_circular=check_circular, allow_nan=allow_nan, indent=indent,
249 separators=separators, encoding=encoding, default=default,
--> 250 sort_keys=sort_keys, **kw).encode(obj)
251
252
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.pyc in encode(self, o)
208 if not isinstance(chunks, (list, tuple)):
209 chunks = list(chunks)
--> 210 return ''.join(chunks)
211
212 def iterencode(self, o, _one_shot=False):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 17915: ordinal not in range(128)
Clearly there's an issue somewhere along the path with the encoding/decoding of Unicode objects going into the DataFrame, but what's confusing me is that Pandas handles Unicode natively, so I'm not sure why an ASCII conversion is taking place at all.
Thanks in advance for any help, and please ask if I need to add any further info.
Additional info
I've looked into the data types for each dictionary key and confirmed that it's a mix of sub-dicts and Unicode objects:
Key: picture, <type 'unicode'>
Key: story, <type 'unicode'>
Key: likes, <type 'dict'>
Key: from, <type 'dict'>
Key: comments, <type 'dict'>
Key: message_tags, <type 'dict'>
Key: privacy, <type 'dict'>
Key: actions, <type 'list'>
Key: updated_time, <type 'unicode'>
Key: to, <type 'dict'>
Key: link, <type 'unicode'>
Key: object_id, <type 'unicode'>
Key: story_tags, <type 'dict'>
Key: created_time, <type 'unicode'>
Key: message, <type 'unicode'>
Key: type, <type 'unicode'>
Key: id, <type 'unicode'>
Key: status_type, <type 'unicode'>
Key: icon, <type 'unicode'>
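Roughly how that listing was generated (a quick sketch, assuming response_list holds at least one post):

# Print the type of every value in the first post dict.
first_post = response_list[0]
for key, value in first_post.items():
    print("Key: %s, %s" % (key, type(value)))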
I've tried re-encoding each of these into str, but that's not helping either, and it also doesn't seem like it should be necessary, as Pandas can handle Unicode anyway.
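The re-encoding attempt looked roughly like this (a reconstruction, not the exact helper code):

# Rough reconstruction of the "re-encode everything to str" attempt.
for post in response_list:
    for key, value in post.items():
        if isinstance(value, unicode):
            post[key] = value.encode("utf-8")  # unicode -> UTF-8 byte string (str)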
Solution
Reposting as an answer:
This is a known issue in at least IPython 3.0, and probably older versions. A fix has been merged, and will be in IPython 3.1.
The issue only affects Python 2.
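As a quick check that you're on an affected combination, you can print the running IPython version; anything before 3.1 on Python 2 would still hit this:

# Check the running IPython version; the fix is expected in 3.1.
import IPython
print(IPython.__version__)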
Answered By - Thomas K