Issue
I have a table like this:
col1 col2
ben US-US-Uk
Man Uk-NL-DE
bee CA-CO-MX-MX
How can I keep only the unique values in col2 for each row, so that the table looks like this?
col1 col2
ben US-Uk
Man Uk-NL-DE
bee CA-CO-MX
I have tried this:
a.cc.str.split('-').unique()
but get the following error:
TypeError: unhashable type: 'list'
Does anybody know how to do this?
Solution
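You can use apply to call a lambda function that splits the string and then joins the unique values. Your attempt fails because str.split returns a Series of lists, and unique() needs hashable values, which lists are not. Note that set does not preserve order, so the values may come out in a different order than in the original string (for example Uk-US rather than US-Uk):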
In [10]:
df['col2'] = df['col2'].apply(lambda x: '-'.join(set(x.split('-'))))
df
Out[10]:
col1 col2
0 ben Uk-US
1 Man Uk-NL-DE
2 bee CA-CO-MX
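The set-based call above drops duplicates but does not keep the original order: row 0 comes out as Uk-US, while the question asks for US-Uk. If the order matters, a minimal sketch is to deduplicate with dict.fromkeys instead, assuming Python 3.7 or later, where dicts keep insertion order. The helper name dedupe_keep_order is made up for this illustration and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["ben", "Man", "bee"],
    "col2": ["US-US-Uk", "Uk-NL-DE", "CA-CO-MX-MX"],
})

def dedupe_keep_order(value, sep="-"):
    """Hypothetical helper: drop repeated tokens, keeping first occurrences."""
    # dict.fromkeys keeps one key per distinct token, in insertion order
    # (guaranteed from Python 3.7 onwards).
    return sep.join(dict.fromkeys(value.split(sep)))

df["col2"] = df["col2"].apply(dedupe_keep_order)
print(df)
```

This is still a row-by-row apply, so it should behave much like the set version in terms of speed; the only intended difference is that the first occurrence of each value stays in place.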
Another method:
In [22]:
df['col2'].str.split('-').apply(lambda x: '-'.join(set(x)))
Out[22]:
0 Uk-US
1 Uk-NL-DE
2 CA-CO-MX
Name: col2, dtype: object
Timings:
In [24]:
%timeit df['col2'].str.split('-').apply(lambda x: '-'.join(set(x)))
%timeit df['col2'] = df['col2'].apply(lambda x: '-'.join(set(x.split('-'))))
1000 loops, best of 3: 418 µs per loop
1000 loops, best of 3: 246 µs per loop
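The timings above come from the three-row example frame, so they mostly measure fixed overhead. To check how the two variants compare on your own data, something like the following sketch could be used outside IPython. The 3,000-row frame, the function names and the repeat count are arbitrary choices for illustration, and no particular result is implied.

```python
import timeit

import pandas as pd

# Assumption: a larger frame built by repeating the question's sample values.
big = pd.DataFrame({"col2": ["US-US-Uk", "Uk-NL-DE", "CA-CO-MX-MX"] * 1000})

def split_then_apply(s):
    # Vectorised str.split first, then join the unique tokens per row.
    return s.str.split("-").apply(lambda x: "-".join(set(x)))

def apply_then_split(s):
    # Split and join inside a single apply call.
    return s.apply(lambda x: "-".join(set(x.split("-"))))

runs = 20  # arbitrary number of repetitions
for fn in (split_then_apply, apply_then_split):
    seconds = timeit.timeit(lambda: fn(big["col2"]), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```

Unlike the second %timeit line above, neither function assigns the result back to the frame, so both are measured on the same unmodified input.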
Answered By - EdChum