Issue
Given the following data, of which users were active each day, I would like to compute the number of new users for each day, using Pandas.
Data:
Day | UserID
----------
1 | A
1 | B
1 | C
1 | C
----------
2 | A
2 | B
2 | D
2 | A
2 | E
----------
3 | B
3 | D
3 | F
Result:
Day | New Users
---------------
1 | 3
2 | 2
3 | 1
The way I see it, the steps are:
- Compute the max UserID per day: df.groupby('Day').UserID.max()
- Filter the data with the max UserID of the previous day (initialize at 0): here I have no clue how to do this using Pandas
- Compute the number of signs in from unique users over the filtered data set. df.filtered.groupby('Day').UserID.nunique()
Is there a clean way to achieve this?
Solution
This build as a table giving the day each ID is first seen, groups by the day, and then counts the corresponding rows.
df = pd.DataFrame([(1, "A"), (1, "B"),
(1, "C"), (1, "C"),
(2, "A"), (2, "B"),
(2, "D"), (2, "A"),
(2, "E"), (3, "B"),
(3, "D"), (3, "F")],
columns=["day", "userid"])
(df
.sort_values('day')
.groupby('userid')
.first()
.rename(columns={"day": "first_seen"})
.groupby('first_seen').size()
)
Answered By - Tim Hopper
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.