Issue
Assuming we have a dataset df (which can be downloaded from this link), I want to create some features based on the mean of y for the same month over the past several years, for example y_avg_last2, y_avg_last3, y_avg_last4, etc. For September 2022, y_avg_last2 is the mean of September 2021 and September 2020, and y_avg_last3 is the mean of September 2021, September 2020, and September 2019.
The code I currently use is shown below; it works, but it is repetitive:
df['y_shift12'] = df['y'].shift(12)
df['y_shift24'] = df['y'].shift(24)
df['y_shift36'] = df['y'].shift(36)
df['y_avg_last2'] = df.loc[:, 'y_shift12': 'y_shift24'].mean(axis=1)
df['y_avg_last3'] = df.loc[:, 'y_shift12': 'y_shift36'].mean(axis=1)
df.drop(['y_shift12', 'y_shift24', 'y_shift36'], axis=1, inplace=True)
df
How can the desired result be achieved more concisely?
Reference:
Pandas, how to calculate mean values of the past n years for every month
Solution
You can build the shifted columns in a separate object, so you don't have to drop them from the DataFrame afterwards. Combine that with a loop for conciseness:
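import numpy as np
# Columns of `shifted` are y lagged by 12, 24 and 36 months
shifted = np.array([df["y"].shift(i) for i in [12, 24, 36]]).T
for i in range(2, 4):
    # Average the first i lags, i.e. the same month in the last i years
    df[f"y_avg_last{i}"] = shifted[:, :i].mean(axis=1)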
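One difference from the code in the question is worth noting: a NumPy array's mean returns NaN for any row that contains a NaN, whereas DataFrame.mean skips NaN by default, so the two versions disagree on early rows where only some of the lags exist. If the original behaviour matters, a pandas-only variant along the following lines might be used. This is only a sketch: the sample data, the helper name add_same_month_averages, and its parameters are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data standing in for the question's df
# (an assumption; the real dataset is not reproduced here).
dates = pd.date_range("2015-01-01", periods=96, freq="MS")
df = pd.DataFrame({"date": dates, "y": np.arange(96, dtype=float)})


def add_same_month_averages(frame, col="y", max_years=3, period=12):
    """Return a copy of frame with {col}_avg_last{n} columns for n = 2..max_years."""
    out = frame.copy()
    years = range(1, max_years + 1)
    # One lagged Series per past year, kept outside the result frame
    # so there is nothing to drop afterwards.
    shifted = pd.concat(
        [out[col].shift(period * k) for k in years], axis=1, keys=list(years)
    )
    for n in range(2, max_years + 1):
        # DataFrame.mean skips NaN by default, like the question's code.
        out[f"{col}_avg_last{n}"] = shifted.iloc[:, :n].mean(axis=1)
    return out


result = add_same_month_averages(df)
```

Raising max_years to 4 would also add y_avg_last4, provided the data goes back far enough. Like the question's code, this assumes one row per month with no gaps, since the lags are positional rather than date-based.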
Answered By - Code Different