Wednesday, December 20, 2023

[FIXED] cumulative sum based on conditions of 2 neighboring colum

December 20, 2023 jupyter, pandas, python No comments

Issue

Apologies, I'm a newbie to coding/pandas/python.

I am trying to sum the column only if there is a "1" located in the "buy" and "sell" neighboring columns

Date	Buy	Sell	Value	Cumulative Sum
01/01/2023			1
01/02/2023	1		5	5
01/03/23			1	6
01/04/23			1	7
01/05/23		1	1	8
01/06/23			5

I saw that there is a formula called **cumsum(axis=None, skipna=True, *args, kwargs), that might be able to do this. but unfortunately, I'm not too sure how to include the conditional on this based on the buy and sell columns

I'm not sure how to do this with pandas formulas. Also would i maybe need to utilze a loop for this?

Any suggestions or tips would be greatly appreciated!!

Solution

Try to avoid loops. I think this is what you are looking for:

df1 = pd.DataFrame({
                    'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12'], 
                    'buy': [None,1,None,None,None,None,None,1,None,None,None,None],
                    'sell':[None,None,None,None,1,None,None,None,None,None,1,None],
                    'value':[1,5,1,1,1,5,1,5,1,1,1,5]
                    }
                  )

# make sell negative so it can be discerned from buy
df1['sell'] = df1['sell'] * -1
        
print(df1)
          date  buy  sell  value
0   2023-01-01  NaN   NaN      1
1   2023-01-02  1.0   NaN      5
2   2023-01-03  NaN   NaN      1
3   2023-01-04  NaN   NaN      1
4   2023-01-05  NaN  -1.0      1
5   2023-01-06  NaN   NaN      5
6   2023-01-07  NaN   NaN      1
7   2023-01-08  1.0   NaN      5
8   2023-01-09  NaN   NaN      1
9   2023-01-10  NaN   NaN      1
10  2023-01-11  NaN  -1.0      1
11  2023-01-12  NaN   NaN      5
        
# create 'buysell' column which is combination of 'buy' and 'sell' columns
        
df1['buysell'] = df1['buy'].combine_first(df1['sell'])
        
# use ffill() to fill buysell from 1 until it is not 1
df1.loc[df1['buysell'].ffill() == 1, 'buysell'] = 1
        
        
# create a mask for where 'buysell' is not NaN
mask = ~df1['buysell'].isna()
            
# use the mask to create a 'buysellvalue' column with the contents of 'value' column for rows where the mask is true
df1.loc[mask, 'buysellvalue'] = df1.loc[mask, 'value']
            
# use cumsum()
df1['cumbuysellvalue'] = df1['buysellvalue'].cumsum()
            
print(df1)
          date  buy  sell  value  buysell  buysellval  cumbuysellval
0   2023-01-01  NaN   NaN      1      NaN         NaN            NaN
1   2023-01-02  1.0   NaN      5      1.0         5.0            5.0
2   2023-01-03  NaN   NaN      1      1.0         1.0            6.0
3   2023-01-04  NaN   NaN      1      1.0         1.0            7.0
4   2023-01-05  NaN  -1.0      1     -1.0         1.0            8.0
5   2023-01-06  NaN   NaN      5      NaN         NaN            NaN
6   2023-01-07  NaN   NaN      1      NaN         NaN            NaN
7   2023-01-08  1.0   NaN      5      1.0         5.0           13.0
8   2023-01-09  NaN   NaN      1      1.0         1.0           14.0
9   2023-01-10  NaN   NaN      1      1.0         1.0           15.0
10  2023-01-11  NaN  -1.0      1     -1.0         1.0           16.0
11  2023-01-12  NaN   NaN      5      NaN         NaN            NaN

Answered By - misterhuge

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Wednesday, December 20, 2023

[FIXED] cumulative sum based on conditions of 2 neighboring colum

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels