Issue
I have a CSV file dumped from a DataFrame that had array fields, which looks like the following:
col1,col2,col3
"[454.40145382, 254.9620727, 0.9147141790444601]",2.1299914162447497,-29.85771596074138
When I read this back with pd.read_csv, col1 is parsed as a string by default. Is there a standard way to specify that such fields should be parsed into numpy arrays whenever possible?
Alternatively, is there a numpy built-in that handles string arrays? I did not have much luck with np.fromstring, which complains with:
ValueError: string size must be a multiple of element size
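(A likely explanation, as a minimal sketch: without a sep argument, np.fromstring interprets the text as raw binary data, which is what produces that error; its text-parsing mode with sep=',' handles the numbers once the brackets are stripped.)
import numpy as np

s = "[454.40145382, 254.9620727, 0.9147141790444601]"
# Without sep, np.fromstring reads the string as raw bytes and raises the
# "string size must be a multiple of element size" error for text like this.
# With sep=',' it parses the comma-separated numbers as text instead.
arr = np.fromstring(s.strip("[]"), sep=",")
print(arr)  # a 1-D float64 array with the three values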
My workaround is simply doing it myself with something like the snippet below (it could be made recursive to handle arrays with more than one dimension):
data["col1_parsed"] = data["col1"].apply(lambda x: [float(e.strip().replace("[","").replace("]","")) for e in x.split(",")])
But this feels too simple and tedious not to be handled by some built-in already.
Solution
You can use eval if you trust the origin of the data, or ast.literal_eval (which is safer):
import ast
import pandas as pd

# ast.literal_eval safely parses each bracketed string into a Python list.
# df = pd.read_csv('data2.csv', converters={'col1': eval})
df = pd.read_csv('data2.csv', converters={'col1': ast.literal_eval})
Output:
>>> df
col1 col2 col3
0 [454.40145382, 254.9620727, 0.9147141790444601] 2.129991 -29.857716
>>> df.loc[0, 'col1']
[454.40145382, 254.9620727, 0.9147141790444601]
>>> type(df.loc[0, 'col1'])
<class 'list'>
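If numpy arrays are wanted rather than lists (as the question asks), one possible variant, sketched here with the same file name, is to wrap ast.literal_eval with np.array inside the converter:
import ast
import numpy as np
import pandas as pd

# Same converter idea, but each parsed list is wrapped in np.array so the
# cells come back as numpy arrays instead of Python lists.
df = pd.read_csv('data2.csv',
                 converters={'col1': lambda s: np.array(ast.literal_eval(s))})
print(type(df.loc[0, 'col1']))  # <class 'numpy.ndarray'>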
Answered By - Corralien