Friday, January 5, 2024

[FIXED] Attempting to get the rolling mean per group, getting wrong values and "TypeError: incompatible index of inserted column with frame index"

Issue

I seem to misunderstand and misuse pd.Series.rolling.mean(). I have a toy df here:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'a': np.random.choice(['x', 'y'], 8),
    'b': np.random.choice(['r', 's'], 8),
    'c': np.arange(1, 8 + 1)
})

I do this grouping operation:

df['ROLLING_MEAN'] = df.groupby(['a', 'b'])['c'].rolling(3).mean()#.values

That doesn't work. I get:

TypeError: incompatible index of inserted column with frame index

For some reason, when I uncomment .values (an attribute, not a method), the assignment runs, but when I isolate one group, the values are not what I intended.

df[
    (df['a'] == 'x') &
    (df['b'] == 'r')
]

   a  b  c  ROLLING_MEAN
0  x  r  1           NaN
2  x  r  3      2.666667
3  x  r  4      4.000000
4  x  r  5      5.666667
7  x  r  8           NaN

How can there be a rolling mean value of 5.666 while no number that high has even been seen yet?

Here is my expected output:

   a  b  c           ROLLING_MEAN
0  x  r  1                    NaN
2  x  r  3                    NaN
3  x  r  4      ((1 + 3 + 4) / 3)
4  x  r  5      ((3 + 4 + 5) / 3)
7  x  r  8      ((4 + 5 + 8) / 3)

Solution

If you check the output of df.groupby(['a', 'b'])['c'].rolling(3).mean(), you get something like this (the data is random, so the groups differ from run to run):

a  b   
x  r  3         NaN
      4         NaN
      6    5.333333
   s  1         NaN
y  r  2         NaN
      5         NaN
   s  0         NaN
      7         NaN
Name: c, dtype: float64

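The extra index levels (a and b) make it incompatible with the index of the original df. The rows also come back ordered group by group rather than in the original row order, which is why assigning .values positionally puts the numbers on the wrong rows.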
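To make the misalignment concrete, here is a minimal sketch with fixed data instead of random data. The rows of group (x, r) are copied from the isolated group shown in the question; the group labels for rows 1, 5 and 6 are assumptions, and the column name WRONG is only illustrative.

```python
import pandas as pd

# Fixed toy data. Rows 0, 2, 3, 4, 7 form group (x, r) as in the question;
# the labels of rows 1, 5 and 6 are assumed for illustration.
df = pd.DataFrame({
    'a': ['x', 'y', 'x', 'x', 'x', 'y', 'y', 'x'],
    'b': ['r', 's', 'r', 'r', 'r', 'r', 's', 'r'],
    'c': range(1, 9),
})

rolled = df.groupby(['a', 'b'])['c'].rolling(3).mean()

# The result is indexed by the group keys plus the original row label.
print(list(rolled.index.names))

# The last index level holds the original row labels, listed group by group.
original_labels = rolled.index.get_level_values(-1)
print(list(original_labels))

# Assigning .values ignores those labels and fills rows 0..7 in that order.
df['WRONG'] = rolled.values
print(df[(df['a'] == 'x') & (df['b'] == 'r')])
```

With this data the last print reproduces the table from the question: the 5.666667 that belongs to row 7 lands on row 4, because it is the fifth value in group order.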

You can use droplevel to remove those levels, so the result aligns with df on its original index:

df['ROLLING_MEAN'] = (
    df.groupby(['a', 'b'])['c']
      .rolling(3).mean()
      .droplevel(['a', 'b'])
)

Output:

   a  b  c  ROLLING_MEAN
0  y  s  1           NaN
1  y  r  2           NaN
2  y  s  3           NaN
3  y  r  4           NaN
4  y  s  5      3.000000
5  x  r  6           NaN
6  y  r  7      4.333333
7  x  r  8           NaN
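As a check against the expected output in the question, here is a small sketch on fixed data (the labels of rows 1, 5 and 6 are assumptions). It also shows an alternative that is not part of the original answer: groupby(...).transform with a lambda, which keeps the original index and so needs no droplevel. The column name ROLLING_MEAN_T is only illustrative.

```python
import pandas as pd

# Fixed toy data. Group (x, r) matches the question's isolated group;
# the labels of rows 1, 5 and 6 are assumed for illustration.
df = pd.DataFrame({
    'a': ['x', 'y', 'x', 'x', 'x', 'y', 'y', 'x'],
    'b': ['r', 's', 'r', 'r', 'r', 'r', 's', 'r'],
    'c': range(1, 9),
})

# Approach from the answer: drop the group levels so the index aligns with df.
df['ROLLING_MEAN'] = (
    df.groupby(['a', 'b'])['c']
      .rolling(3).mean()
      .droplevel(['a', 'b'])
)

# Alternative sketch: transform returns a result already aligned to df.index.
df['ROLLING_MEAN_T'] = (
    df.groupby(['a', 'b'])['c']
      .transform(lambda s: s.rolling(3).mean())
)

print(df[(df['a'] == 'x') & (df['b'] == 'r')])
```

For group (x, r), rows 3, 4 and 7 get (1 + 3 + 4) / 3, (3 + 4 + 5) / 3 and (4 + 5 + 8) / 3, as in the expected output. The transform version calls a Python function once per group, so it may be slower on frames with many groups.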

Answered By - mozway

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
