Issue
Apologies, I'm a newbie to coding/pandas/python.
I am trying to sum the column only if there is a "1" located in the "buy" and "sell" neighboring columns
Date | Buy | Sell | Value | Cumulative Sum |
---|---|---|---|---|
01/01/2023 | 1 | |||
01/02/2023 | 1 | 5 | 5 | |
01/03/23 | 1 | 6 | ||
01/04/23 | 1 | 7 | ||
01/05/23 | 1 | 1 | 8 | |
01/06/23 | 5 |
I saw that there is a formula called **cumsum(axis=None, skipna=True, *args, kwargs), that might be able to do this. but unfortunately, I'm not too sure how to include the conditional on this based on the buy and sell columns
I'm not sure how to do this with pandas formulas. Also would i maybe need to utilze a loop for this?
Any suggestions or tips would be greatly appreciated!!
Solution
Try to avoid loops. I think this is what you are looking for:
df1 = pd.DataFrame({
'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12'],
'buy': [None,1,None,None,None,None,None,1,None,None,None,None],
'sell':[None,None,None,None,1,None,None,None,None,None,1,None],
'value':[1,5,1,1,1,5,1,5,1,1,1,5]
}
)
# make sell negative so it can be discerned from buy
df1['sell'] = df1['sell'] * -1
print(df1)
date buy sell value
0 2023-01-01 NaN NaN 1
1 2023-01-02 1.0 NaN 5
2 2023-01-03 NaN NaN 1
3 2023-01-04 NaN NaN 1
4 2023-01-05 NaN -1.0 1
5 2023-01-06 NaN NaN 5
6 2023-01-07 NaN NaN 1
7 2023-01-08 1.0 NaN 5
8 2023-01-09 NaN NaN 1
9 2023-01-10 NaN NaN 1
10 2023-01-11 NaN -1.0 1
11 2023-01-12 NaN NaN 5
# create 'buysell' column which is combination of 'buy' and 'sell' columns
df1['buysell'] = df1['buy'].combine_first(df1['sell'])
# use ffill() to fill buysell from 1 until it is not 1
df1.loc[df1['buysell'].ffill() == 1, 'buysell'] = 1
# create a mask for where 'buysell' is not NaN
mask = ~df1['buysell'].isna()
# use the mask to create a 'buysellvalue' column with the contents of 'value' column for rows where the mask is true
df1.loc[mask, 'buysellvalue'] = df1.loc[mask, 'value']
# use cumsum()
df1['cumbuysellvalue'] = df1['buysellvalue'].cumsum()
print(df1)
date buy sell value buysell buysellval cumbuysellval
0 2023-01-01 NaN NaN 1 NaN NaN NaN
1 2023-01-02 1.0 NaN 5 1.0 5.0 5.0
2 2023-01-03 NaN NaN 1 1.0 1.0 6.0
3 2023-01-04 NaN NaN 1 1.0 1.0 7.0
4 2023-01-05 NaN -1.0 1 -1.0 1.0 8.0
5 2023-01-06 NaN NaN 5 NaN NaN NaN
6 2023-01-07 NaN NaN 1 NaN NaN NaN
7 2023-01-08 1.0 NaN 5 1.0 5.0 13.0
8 2023-01-09 NaN NaN 1 1.0 1.0 14.0
9 2023-01-10 NaN NaN 1 1.0 1.0 15.0
10 2023-01-11 NaN -1.0 1 -1.0 1.0 16.0
11 2023-01-12 NaN NaN 5 NaN NaN NaN
Answered By - misterhuge
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.