Issue
I have a DataFrame (df) of user engagement data, as follows:
time_stamp | user_id |
---|---|
2013-01-01 10:05:23 | 1 |
2013-01-03 16:35:23 | 1 |
2013-01-06 11:06:35 | 1 |
2013-01-10 12:05:43 | 1 |
2013-01-11 13:32:12 | 2 |
2013-01-04 16:26:34 | 3 |
2013-01-05 14:02:51 | 3 |
2013-01-11 18:35:53 | 3 |
2013-01-04 12:26:34 | 4 |
2013-01-05 13:31:11 | 4 |
2013-01-12 17:35:52 | 4 |
Each row is a single login to the system. An adopted user is a user who has logged into the product on three separate days in at least one seven-day period. How do I find the user_ids of all adopted users?
The expected output is a list of user_ids for the adopted users:
user_list = ['1', '3']
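To pin the rule down, here is a minimal reference check in plain Python. It is only a sketch: the `logins` dictionary and the `is_adopted` helper are hypothetical names introduced for illustration, and it assumes that two days exactly 7 days apart still count as the same seven-day period, which is what the expected output implies for user 3 (2013-01-04 to 2013-01-11).

```python
from datetime import date, timedelta

# Login days per user, copied from the sample table (time of day dropped).
logins = {
    1: [date(2013, 1, 1), date(2013, 1, 3), date(2013, 1, 6), date(2013, 1, 10)],
    2: [date(2013, 1, 11)],
    3: [date(2013, 1, 4), date(2013, 1, 5), date(2013, 1, 11)],
    4: [date(2013, 1, 4), date(2013, 1, 5), date(2013, 1, 12)],
}


def is_adopted(days, window=timedelta(days=7)):
    """Return True if any three distinct days fall within `window` (assumed inclusive)."""
    unique_days = sorted(set(days))
    # In sorted order, three days fit in the window exactly when some day and
    # the day two positions after it are at most `window` apart.
    return any(
        later - earlier <= window
        for earlier, later in zip(unique_days, unique_days[2:])
    )


adopted = [user for user, days in logins.items() if is_adopted(days)]
print(adopted)  # [1, 3]
```

Note that this returns integer ids, whereas the expected output above shows them as strings.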
Solution
First use floor to truncate each timestamp to its day, then use groupby with rolling over every 3 rows. Because rolling needs numeric data, the floored datetimes are first converted to Unix times (integer nanoseconds since the epoch):
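import numpy as np
import pandas as pd

# floor to days, then convert to integer nanoseconds since the epoch
df['time_stamp'] = df['time_stamp'].dt.floor('D').astype(np.int64)
# sort and remove duplicated days per user
df = df.sort_values(['user_id', 'time_stamp']).drop_duplicates()
# rolling window over every 3 distinct login days per user
a = df.groupby('user_id')['time_stamp'].rolling(window=3)
# span in days between the first and last day of each window
b = pd.to_timedelta((a.max() - a.min())).dt.days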
print (b)
user_id
1        0     NaN
         1     NaN
         2     5.0
         3     7.0
2        4     NaN
3        5     NaN
         6     NaN
         7     7.0
4        8     NaN
         9     NaN
         10    8.0
Name: time_stamp, dtype: float64
# keep windows spanning at most 7 days (b == 7 alone would miss tighter windows)
c = b[b <= 7].index.get_level_values('user_id').unique().tolist()
print (c)
[1, 3]
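As an alternative sketch, not the method used above, the same check can be written without converting to integers: once each user's distinct login days are sorted, compare every day with the day two rows earlier using `groupby(...).shift(2)`. The `adopted_users` function, its `max_span_days` parameter, and the `day` column are hypothetical names introduced here, and the inclusive 7-day threshold is an assumption taken from the expected output.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "time_stamp": pd.to_datetime([
        "2013-01-01 10:05:23", "2013-01-03 16:35:23", "2013-01-06 11:06:35",
        "2013-01-10 12:05:43", "2013-01-11 13:32:12", "2013-01-04 16:26:34",
        "2013-01-05 14:02:51", "2013-01-11 18:35:53", "2013-01-04 12:26:34",
        "2013-01-05 13:31:11", "2013-01-12 17:35:52",
    ]),
    "user_id": [1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4],
})


def adopted_users(frame, max_span_days=7):
    # One row per (user, calendar day), sorted by day within each user.
    days = (
        frame.assign(day=frame["time_stamp"].dt.floor("D"))
        .drop_duplicates(["user_id", "day"])
        .sort_values(["user_id", "day"])
    )
    # Time between each login day and the same user's login day two rows earlier;
    # NaT where a user has fewer than two earlier days.
    span = days["day"] - days.groupby("user_id")["day"].shift(2)
    # Comparisons against NaT are False, so those rows drop out here.
    hit = span <= pd.Timedelta(days=max_span_days)
    return sorted(days.loc[hit, "user_id"].unique().tolist())


print(adopted_users(df))  # [1, 3]
```

Passing `max_span_days=6` gives the stricter reading, where the three days must fit inside seven calendar days counted inclusively; on the sample data that leaves only user 1.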
Answered By - jezrael