Issue
I have a set of strings formatted like BBB below, and I need to extract the value corresponding to the "text" key (in the example below, it's "My life is amazing").
BBB = str({
    "id": "18976",
    "episode_done": False,
    "text": "My life is amazing",
    "text_candidates": [
        "My life is amazing",
        "I am worried about global warming"
    ],
    "metrics": {
        "clen": AverageMetric(12),
        "ctrunc": AverageMetric(0),
        "ctrunclen": AverageMetric(0)
    }
})
I tried converting BBB into a string and then into a dictionary using json.load and ast.literal_eval, but I get error messages in both cases. I suppose this is because the metrics key has a dictionary as its value.
How would you suggest solving this?
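A quick check, using two made-up miniature strings rather than the real data, suggests that the nested dictionary alone would not trip ast.literal_eval(); the AverageMetric(...) call is what it rejects:

```python
import ast

# Hypothetical miniature stand-ins for BBB: literals only vs. a function call.
nested_only = '{"metrics": {"clen": 12, "ctrunc": 0}}'
with_call = '{"metrics": {"clen": AverageMetric(12)}}'

# Nested dictionaries are ordinary literals, so this parses fine.
print(ast.literal_eval(nested_only))

# A function call is not a literal, so literal_eval raises ValueError.
try:
    ast.literal_eval(with_call)
except ValueError as exc:
    print("ValueError:", exc)
```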
Solution
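The nested dictionary isn't the problem; ast.literal_eval() handles nested dicts fine. What it rejects are the AverageMetric(...) calls, because they aren't literals. You could adapt the source of ast.literal_eval() into something that also accepts function calls (and other non-literals), but turns them into strings: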
import ast
from pprint import pprint

BBB = """
{"id": "18976", "episode_done": False, "text": "My life is amazing",
"text_candidates": ["My life is amazing", "I am worried about global warming"],
"metrics": {"clen": AverageMetric(12), "ctrunc": AverageMetric(0),
"ctrunclen": AverageMetric(0)}}
""".strip()

def literal_eval_with_function_calls(source):
    # Adapted from `ast.literal_eval`; needs Python 3.8+ (ast.get_source_segment)
    def _convert(node):
        if isinstance(node, list):
            return [_convert(arg) for arg in node]
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Tuple):
            return tuple(map(_convert, node.elts))
        if isinstance(node, ast.List):
            return list(map(_convert, node.elts))
        if isinstance(node, ast.Set):
            return set(map(_convert, node.elts))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == 'set' and node.args == node.keywords == []:
            return set()
        if isinstance(node, ast.Dict):
            return dict(zip(map(_convert, node.keys), map(_convert, node.values)))
        if isinstance(node, ast.Expression):
            return _convert(node.body)
        # Anything else (e.g. the call AverageMetric(12)) is kept as its source
        # text, keyed by the AST node type: {'$Call': 'AverageMetric(12)'}
        return {
            f'${node.__class__.__name__}': ast.get_source_segment(source, node),
        }
    return _convert(ast.parse(source, mode='eval'))

# pprint sorts the keys, which is why the output below is in alphabetical order
pprint(literal_eval_with_function_calls(BBB))
This outputs
{'episode_done': False,
'id': '18976',
'metrics': {'clen': {'$Call': 'AverageMetric(12)'},
'ctrunc': {'$Call': 'AverageMetric(0)'},
'ctrunclen': {'$Call': 'AverageMetric(0)'}},
'text': 'My life is amazing',
'text_candidates': ['My life is amazing', 'I am worried about global warming']}
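If all you need is the "text" value, a possible variant (a sketch, not AKX's code; the names CallsToSource and parse_with_calls_as_strings are made up here) rewrites every call node into a string constant holding its source text and then passes the tree to ast.literal_eval(), which accepts an already-parsed ast.Expression as well as a string. That way the stock function still validates everything else:

```python
import ast

BBB = """
{"id": "18976", "episode_done": False, "text": "My life is amazing",
"text_candidates": ["My life is amazing", "I am worried about global warming"],
"metrics": {"clen": AverageMetric(12), "ctrunc": AverageMetric(0),
"ctrunclen": AverageMetric(0)}}
""".strip()


class CallsToSource(ast.NodeTransformer):
    """Hypothetical helper: replace each call such as AverageMetric(12)
    with a string constant containing its original source text."""

    def __init__(self, source):
        self.source = source

    def visit_Call(self, node):
        text = ast.get_source_segment(self.source, node)
        return ast.copy_location(ast.Constant(value=text), node)


def parse_with_calls_as_strings(source):
    tree = ast.parse(source, mode="eval")
    tree = CallsToSource(source).visit(tree)
    # literal_eval accepts the ast.Expression node and still rejects
    # anything else that is not a literal (names, operators, ...).
    return ast.literal_eval(tree)


data = parse_with_calls_as_strings(BBB)
print(data["text"])             # My life is amazing
print(data["metrics"]["clen"])  # AverageMetric(12)
```

One trade-off compared with the answer's version: calls become plain strings rather than {'$Call': ...} markers, so afterwards you can't tell them apart from genuine strings (even set() ends up as the string "set()").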
However, it would be better not to have the data in a non-parseable format in the first place.
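If you control the code that produces these strings, one way to follow that advice is to write JSON with json.dumps() and a default= hook that turns non-serializable objects into plain values, so the consumer can simply call json.loads(). This is a sketch under assumptions: AverageMetric below is a hypothetical stand-in class, and its value attribute is invented for illustration; the real class may expose its number differently.

```python
import json


class AverageMetric:
    """Hypothetical stand-in for the real metric class."""

    def __init__(self, value):
        self.value = value


def to_jsonable(obj):
    # json.dumps calls this only for objects it can't serialize natively.
    if isinstance(obj, AverageMetric):
        return obj.value
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {
    "id": "18976",
    "episode_done": False,
    "text": "My life is amazing",
    "text_candidates": ["My life is amazing", "I am worried about global warming"],
    "metrics": {
        "clen": AverageMetric(12),
        "ctrunc": AverageMetric(0),
        "ctrunclen": AverageMetric(0),
    },
}

serialized = json.dumps(record, default=to_jsonable)
parsed = json.loads(serialized)
print(parsed["text"])     # My life is amazing
print(parsed["metrics"])  # {'clen': 12, 'ctrunc': 0, 'ctrunclen': 0}
```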
Answered By - AKX