Issue
I seem to misunderstand and misuse pd.Series.rolling.mean(). I have a toy df here:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'a': np.random.choice(['x', 'y'], 8),
'b': np.random.choice(['r', 's'], 8),
'c': np.arange(1, 8 + 1)
})
a b c
0 y s 1
1 y r 2
2 y s 3
3 y r 4
4 y s 5
5 x r 6
6 y r 7
7 x r 8
I do this grouping operation:
df['ROLLING_MEAN'] = df.groupby(['a', 'b'])['c'].rolling(3).mean()#.values
That doesn't work. I get:
TypeError: incompatible index of inserted column with frame index
For some reason, when I uncomment the .values attribute, it runs, but if I isolate one group, the result is not what I intended.
df[
(df['a'] == 'x') &
(df['b'] == 'r')
]
a b c ROLLING_MEAN
0 x r 1 NaN
2 x r 3 2.666667
3 x r 4 4.000000
4 x r 5 5.666667
7 x r 8 NaN
How can there be a rolling mean value of 5.666 when no number that high has even been seen yet?
Here is my expected output:
a b c ROLLING_MEAN
0 x r 1 NaN
2 x r 3 NaN
3 x r 4 ((1 + 3 + 4) / 3)
4 x r 5 ((3 + 4 + 5) / 3)
7 x r 8 ((4 + 5 + 8) / 3)
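The expected values above can be checked on the isolated group alone. This is only a sketch: the Series below is a hand-typed copy of the c column and row labels from the expected output, not something taken from the randomly generated df.

```python
import pandas as pd

# Hand-typed copy of the isolated group: row labels 0, 2, 3, 4, 7 and their c values.
group = pd.Series([1, 3, 4, 5, 8], index=[0, 2, 3, 4, 7], name='c')

# A window of 3 needs three observations, so the first two rows are NaN.
expected = group.rolling(3).mean()
print(expected)
```

The third, fourth and fifth rows are the means of (1, 3, 4), (3, 4, 5) and (4, 5, 8), each value sitting on the last row of its window.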
Solution
If you check the output of df.groupby(['a', 'b'])['c'].rolling(3).mean(), you get:
a  b
x  r  3         NaN
      4         NaN
      6    5.333333
   s  1         NaN
y  r  2         NaN
      5         NaN
   s  0         NaN
      7         NaN
Name: c, dtype: float64
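The extra index levels (a and b) make it incompatible with the index of the original df. They also explain the odd result with .values: the rows come back ordered by group, not in the frame's original row order, so assigning the bare array puts values on the wrong rows.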
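A minimal sketch of the mismatch, assuming a fixed copy of the toy frame printed in the question (the random calls are replaced with literal values so the result is reproducible). The names labels_in_result_order, positional and aligned are only illustrative.

```python
import pandas as pd

# Fixed copy of the toy frame printed in the question (no randomness).
df = pd.DataFrame({
    'a': list('yyyyyxyx'),
    'b': list('srsrsrrr'),
    'c': range(1, 9),
})

rolled = df.groupby(['a', 'b'])['c'].rolling(3).mean()

# The result index carries the two group keys plus the original row label.
print(rolled.index.nlevels)

# Original row labels in the order the result returns them (grouped by key).
labels_in_result_order = rolled.index.get_level_values(-1).tolist()
print(labels_in_result_order)

# Assigning the bare array is positional, so values can land on the wrong rows.
positional = pd.Series(rolled.to_numpy(), index=df.index)
aligned = rolled.droplevel(['a', 'b']).sort_index()
print(positional.equals(aligned))
```

The last line compares what positional assignment would store with the label-aligned result. For this frame the two differ, which is the same effect seen in the question.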
You can use droplevel to remove the group levels, so that the result aligns with df on the original row index:
df['ROLLING_MEAN'] = (
    df.groupby(['a', 'b'])['c']
      .rolling(3).mean()
      .droplevel(['a', 'b'])  # keep only the original row labels
)
Output:
a b c ROLLING_MEAN
0 y s 1 NaN
1 y r 2 NaN
2 y s 3 NaN
3 y r 4 NaN
4 y s 5 3.000000
5 x r 6 NaN
6 y r 7 4.333333
7 x r 8 NaN
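As an alternative that is not part of the answer above, groupby(...).transform returns a result indexed like the original frame, so no index levels need to be dropped. The sketch below assumes the same fixed copy of the toy frame and compares both approaches. The names via_droplevel and via_transform are illustrative.

```python
import pandas as pd

# Fixed copy of the toy frame printed in the question (no randomness).
df = pd.DataFrame({
    'a': list('yyyyyxyx'),
    'b': list('srsrsrrr'),
    'c': range(1, 9),
})

# Approach from the answer: rolling mean per group, then drop the group levels.
via_droplevel = (
    df.groupby(['a', 'b'])['c']
      .rolling(3).mean()
      .droplevel(['a', 'b'])
)

# Alternative: transform applies the rolling mean per group and keeps df's index.
via_transform = (
    df.groupby(['a', 'b'])['c']
      .transform(lambda s: s.rolling(3).mean())
)

# Assignment aligns on the index labels, so either result can be assigned.
df['ROLLING_MEAN'] = via_transform
print(df)
```

The transform version calls a Python function once per group, so it may be slower on frames with many groups. The droplevel version stays on the built-in grouped rolling path.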
Answered By - mozway