Issue
I have a data frame like this:
Name 1 2 3 4 5
Alex 10 40 20 11 50
Alex 10 60 20 11 60
Sam 30 15 50 15 60
Sam 30 12 50 15 43
John 50 18 100 8 32
John 50 15 100 8 21
I want to keep only the columns whose values are identical within every group of rows that share a Name. In this example, that means keeping columns 1, 3 and 4, because each name's two rows hold the same value in those columns. A column should be kept only if this holds for EVERY name, so the whole column must consist of pairs of equal values. Any ideas on how to do that?
Solution
Using a simple list inside agg:
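# Collect each name's values into a list per column, then flag a cell True
# when that list contains a repeated value.
# Note: DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
cond = df.groupby('Name').agg(list).applymap(lambda x: len(x) != len(set(x)))
# Keep the columns that are flagged True for every name.
dupe_cols = cond.columns[cond.all()]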
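For anyone who wants to try this on the sample data from the question, here is a minimal self-contained sketch. It swaps in a different check from the answer above: it counts distinct values per group with nunique rather than building lists. For groups of exactly two rows the two checks agree. The column labels are assumed to be strings, and value_cols, same_within_group and result are just illustrative names.

```python
import pandas as pd

# Sample data from the question (column labels assumed to be strings).
df = pd.DataFrame({
    "Name": ["Alex", "Alex", "Sam", "Sam", "John", "John"],
    "1": [10, 10, 30, 30, 50, 50],
    "2": [40, 60, 15, 12, 18, 15],
    "3": [20, 20, 50, 50, 100, 100],
    "4": [11, 11, 15, 15, 8, 8],
    "5": [50, 60, 60, 43, 32, 21],
})

# Every column except the grouping key.
value_cols = df.columns.drop("Name")

# True where a name has exactly one distinct value in that column.
same_within_group = df.groupby("Name")[value_cols].nunique().eq(1)

# Columns where that holds for every name.
dupe_cols = same_within_group.columns[same_within_group.all()]

# Keep the key column plus the qualifying columns.
result = df[["Name", *dupe_cols]]
print(list(dupe_cols))  # ['1', '3', '4']
```

If a name can have more than two rows, the two checks differ: the list-based version accepts a column as soon as any value repeats within the group, while nunique requires all of the group's values to be equal.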
Answered By - Nuri Taş