Saturday, January 6, 2024

[FIXED] How to get the average of values for one column based on another column value in python (pandas, jupyter)

January 06, 2024 jupyter-notebook, pandas, python No comments

Issue

the image shows the test dataset I am using to verify if the right averages are being calculated.

I want to be able to get the average of the corresponding values in the 'G' column based on the filtered values in the 'T' column.

So I set the values for the 'T' column based on which I want to sum the values in the 'G' column and then divide the total by the count to get an average, which is appended to a variable. however the average is not correctly calculated. see below screenshot

total=0
g_avg=[]
output=[]
counter=0
for i, row in df_new.iterrows():
    if (row['T'] > 2):
        counter+=1
        total+=row['G']
    if (counter != 0 and row['T']==10):
        g_avg.append(total/counter)
        counter = 0
        total = 0
        
print(g_avg)

below is a better set of data as there is repetition in the 'T' values so I would need a counter in order to get my average for the G values when the T value is in a certain range i.e. from 2am to 10 am etc sorry it wont allow me to just paste the dataset so ive took a snippy of it

Solution

If you want the average of column G values when T is between 2 and 7:

df_new.loc[(df_new['T']>2) & (df_new['T']<7), 'G'].mean()

Update

It's difficult to know exactly what you want without any expected output. If you have some data that looks like this:

And you want something like this:

Then you could use boolean indexing and DataFrame.loc:

avg = df.loc[(df['T']>2) & (df['T']<7), 'G'].mean()
df.loc[(df['T']>2) & (df['T']<7), 'G'] = avg

Update 2

If you have some sample data:

Method 1: To simply get a list of those means, you could create groups for your interval and filter on m:

m = df['T'].between(0,5,inclusive=False)
g = m.ne(m.shift()).cumsum()[m]
lst = df.groupby(g).mean()['G'].tolist()

print(lst)                                                                              
[2.0, 5.0]

Method 2: If you want to include these means at their respective T values, then you could do this instead:

m = df['T'].between(0,5,inclusive=False)
g = m.ne(m.shift()).cumsum()
df['G_new'] = df.groupby(g)['G'].transform('mean')

print(df)                                                                               
    T  G  G_new
0   0  1      1
1   2  2      2
2   3  3      2
3   3  1      2
4   3  2      2
5  10  4      4
6   2  5      5
7   2  5      5
8   2  5      5
9  10  5      5

Answered By - dkhara

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Saturday, January 6, 2024

[FIXED] How to get the average of values for one column based on another column value in python (pandas, jupyter)

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels