Issue
I have the dataframe.
data={"ID":[1,1,1,1,1,1,1,1,1,2,2,2],
"Year":[2000,2001,2002,2003,2004,1997,1998,2003,2004,1997,1998,2005],
"Firm":["A","A","B","B","A","A","A","A","B","B","A","A"],
"Count":[0,1,0,0,0,0,0,0,0,0,0,0]}
df1=pd.DataFrame(data)
The expected output is this.
data={"ID":[1,1,1,1,1,1,1,1,1,2,2,2],
"Year":[2000,2001,2002,2003,2004,1997,1998,2003,2004,1997,1998,2005],
"Firm":["A","A","B","B","A","A","A","A","B","B","A","A"],
"Count":[0,1,0,0,0,0,0,0,0,0,0,0],
"Count_1":[0,1,1,1,1,0,0,1,1,0,0,0]}
df2=pd.DataFrame(data)
I can achieve the expected output by my code.
df_1=df1.sort_values(by=["ID","Year"],ascending=True)
df_1["Count_1"]=np.where(df_1["Count"]==1,1,np.NaN)
df_1["Count_1"]=df_1.groupby(["ID"],as_index=None)["Count_1"].ffill()
df_1.drop(columns=["Count"],inplace=True)
df_1.fillna(0)
However, I am looking for a shorter and cleaner code.
Solution
Try:
df1 = df1.sort_values(by=["ID", "Year"])
df1["Count_1"] = (df1.groupby("ID")["Count"].cumsum() > 0).astype(int)
print(df1)
Prints:
ID Year Firm Count Count_1
5 1 1997 A 0 0
6 1 1998 A 0 0
0 1 2000 A 0 0
1 1 2001 A 1 1
2 1 2002 B 0 1
3 1 2003 B 0 1
7 1 2003 A 0 1
4 1 2004 A 0 1
8 1 2004 B 0 1
9 2 1997 B 0 0
10 2 1998 A 0 0
11 2 2005 A 0 0
Answered By - Andrej Kesely
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.