Issue
Here is an example of my dataframe:
import pandas as pd

df = pd.DataFrame([['In', 'Age', 'Nat.'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jorginho', 31, 'Italy'],
                   ['Out', 'Age', 'Nat.'],
                   ['Jhon Durán', 19, 'Colombia'],
                   ['In', 'Age', 'Nat.'],
                   ['Jhon Durán', 19, 'Colombia'],
                   ['Álex Moreno', 29, 'Spain'],
                   ['Out', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jorginho', 31, 'Italy'],
                   ['In', 'Age', 'Nat.'],
                   ['Out', 'Age', 'Nat.'],
                   ['In', 'Age', 'Nat.'],
                   ], columns=['Player', 'Age', 'Nat.'])
My desired output is a dataframe with duplicate rows removed when the nearest header row above them (not necessarily directly above) has the value 'Out' in the 'Player' column.
For example, the desired output would remove the first "Jhon Durán" row, and the second "Leandro Trossard" and "Jorginho" rows, since these are the rows with "Out" above them and not "In".
Is this possible to achieve with pandas?
Solution
You could use the pandas shift method to help achieve this. It copies each row's preceding 'Player' value into a new column:
df['previousPlayer'] = df['Player'].shift(1)
df
              Player  Age      Nat.    previousPlayer
0                 In  Age      Nat.               NaN
1       Jakub Kiwior   22    Poland                In
2   Leandro Trossard   28   Belgium      Jakub Kiwior
3           Jorginho   31     Italy  Leandro Trossard
4                Out  Age      Nat.          Jorginho
5         Jhon Durán   19  Colombia               Out
6                 In  Age      Nat.        Jhon Durán
7         Jhon Durán   19  Colombia                In
8        Álex Moreno   29     Spain        Jhon Durán
9                Out  Age      Nat.       Álex Moreno
10  Leandro Trossard   28   Belgium               Out
11          Jorginho   31     Italy  Leandro Trossard
12                In  Age      Nat.          Jorginho
13               Out  Age      Nat.                In
14                In  Age      Nat.               Out
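Then filter out the rows whose value in the new column is the word of your choice. Because shift(1) looks only at the row directly above, this drops just the row immediately following each 'Out':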
df = df[df.previousPlayer != 'Out'].drop('previousPlayer', axis=1)
print(df)
              Player  Age      Nat.
0                 In  Age      Nat.
1       Jakub Kiwior   22    Poland
2   Leandro Trossard   28   Belgium
3           Jorginho   31     Italy
4                Out  Age      Nat.
6                 In  Age      Nat.
7         Jhon Durán   19  Colombia
8        Álex Moreno   29     Spain
9                Out  Age      Nat.
11          Jorginho   31     Italy
12                In  Age      Nat.
13               Out  Age      Nat.
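The question asks about an 'Out' header that is "not necessarily directly above", which a one-row shift cannot see. A possible alternative, offered only as a sketch and not as part of the original answer, is to forward-fill the header labels so every row knows which section it belongs to, then drop duplicated player rows in the 'Out' sections. The names is_header, section, is_duplicate and result are illustrative, and the sketch assumes that 'In' and 'Out' are the only header values:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['In', 'Age', 'Nat.'],
        ['Jakub Kiwior', 22, 'Poland'],
        ['Leandro Trossard', 28, 'Belgium'],
        ['Jorginho', 31, 'Italy'],
        ['Out', 'Age', 'Nat.'],
        ['Jhon Durán', 19, 'Colombia'],
        ['In', 'Age', 'Nat.'],
        ['Jhon Durán', 19, 'Colombia'],
        ['Álex Moreno', 29, 'Spain'],
        ['Out', 'Age', 'Nat.'],
        ['Leandro Trossard', 28, 'Belgium'],
        ['Jorginho', 31, 'Italy'],
        ['In', 'Age', 'Nat.'],
        ['Out', 'Age', 'Nat.'],
        ['In', 'Age', 'Nat.'],
    ],
    columns=['Player', 'Age', 'Nat.'],
)

# Assumption: a row is a section header when 'Player' is 'In' or 'Out'.
is_header = df['Player'].isin(['In', 'Out'])

# Keep only the header labels, then carry each one down to the rows below it.
section = df['Player'].where(is_header).ffill()

# A player row is a duplicate if the same full row appears more than once.
is_duplicate = df.duplicated(keep=False) & ~is_header

# Drop duplicated player rows that sit anywhere under an 'Out' header.
result = df[~(is_duplicate & (section == 'Out'))]
print(result)
```

On the sample data this removes rows 5, 10 and 11 (the 'Out' copies of Jhon Durán, Leandro Trossard and Jorginho) and leaves every header row in place.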
Answered By - straka86