Issue
My DataFrame is:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [20, 9, 31, 40],
        'b': [1, 10, 17, 30],
    }
)
Expected output: creating columns c and name
    a   b   c name
0  20   1  20  NaN
1   9  10  20  NaN
2  31  17  17  NaN
3  40  30  40    a
Steps:
a) c is created by df['c'] = np.fmax(df['a'].shift().bfill(), df['b'])
b) For the last row, c is instead the maximum of a and b in that row: df.loc[df.index[-1], ['a', 'b']].max(). Since a > b in the last row, 40 is chosen.
c) Get the name of the column (a or b) that holds the maximum value in the last row.
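To see why step a) alone is not enough, it may help to print the intermediate series. This is a minimal sketch using the same df as above; the names prev_a and c are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [20, 9, 31, 40], 'b': [1, 10, 17, 30]})

# Shift 'a' down by one row; the first row becomes NaN and is
# back-filled with the next valid value (the original first 'a').
prev_a = df['a'].shift().bfill()
print(prev_a.tolist())  # [20.0, 20.0, 9.0, 31.0]

# Element-wise maximum of the shifted 'a' and 'b'.
c = np.fmax(prev_a, df['b'])
print(c.tolist())       # [20.0, 20.0, 17.0, 31.0]
```

The last value is 31 rather than the expected 40, which is why steps b) and c) treat the last row separately.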
My attempt:
df['c'] = np.fmax(df['a'].shift().bfill(), df['b'])
df.loc[df.index[-1], 'c'] = df.loc[df.index[-1], ['a', 'b']].max()
df.loc[df.index[-1], 'name'] = df.loc[df.index[-1], ['a', 'b']].idxmax()
Is this the cleanest way / the best approach?
Solution
I don't know how much of an improvement it is, but you can combine the last two lines of code into a single line if you use agg().
df['c'] = np.fmax(df['a'].shift().bfill(), df['b'])
idx = df.index[-1]
df.loc[idx, ['c', 'name']] = df.loc[idx, ['a', 'b']].agg(['max', 'idxmax']).to_numpy()
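As a rough illustration of what the agg() call produces before it is assigned, here is a small sketch on the same data; the names last and result are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [20, 9, 31, 40], 'b': [1, 10, 17, 30]})

# The last row restricted to 'a' and 'b' is a Series indexed by column name.
last = df.loc[df.index[-1], ['a', 'b']]

# agg() with a list of function names returns a Series indexed by those
# names: 'max' is the larger value, 'idxmax' is the label holding it.
result = last.agg(['max', 'idxmax'])
print(result['max'], result['idxmax'])  # 40 a
```

Calling to_numpy() on this result drops the 'max'/'idxmax' labels, so the two values are assigned to 'c' and 'name' by position.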
To create a copy instead of modifying df in place, we could define a mask that flags the last row and use assign() to create the "c" and "name" columns.
msk = df.index == df.index[-1]
df1 = df.assign(
    c=np.fmax(df['a'].shift().bfill().mask(msk, df['a']), df['b']),
    name=df[['a', 'b']].idxmax(axis=1).where(msk)
)
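The two pieces of the assign() call rely on mask() and where() doing opposite things with the same boolean mask. A minimal sketch of each piece in isolation, on the same data (the names shifted and names are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [20, 9, 31, 40], 'b': [1, 10, 17, 30]})
msk = df.index == df.index[-1]  # True only for the last row

# mask() replaces values where the condition is True, so the last
# shifted value (31) is swapped for the last row's own 'a' (40).
shifted = df['a'].shift().bfill().mask(msk, df['a'])
print(shifted.tolist())  # [20.0, 20.0, 9.0, 40.0]

# where() keeps values where the condition is True and blanks the rest,
# so only the last row keeps its idxmax label.
names = df[['a', 'b']].idxmax(axis=1).where(msk)
print(names.iloc[-1])    # a
```

With the last shifted value replaced by 40, np.fmax against 'b' gives 40 for the last row, matching the expected output.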
Answered By - cottontail