Issue
How can I perform a pandas groupby on a dataframe in 24-hour blocks, ignoring the date of the timestamp? I've tried with pandas.Grouper
with the freq
argument set to 24H
. However, it stubbornly groups by the date of the timestamp. I want it to group every 24 hours beginning at the hour of the first timestamp. For example, if it starts at hour 17, I want it to group up to hour 16 of the following day and continue the groups like this through the dataframe. How can I get a grouper to ignore the date?
Here is an example of what I get:
df_groupby = df_tmp.groupby(pd.Grouper(key='created_at', freq='24H'))
for idx, (_, group_df) in enumerate(df_groupby):
print(f"Group {idx}:")
print(group_df[['created_at', 'moisture']])
print('-------------------')
Output:
Group 0:
created_at moisture
0 2023-11-18 12:13:12.256927 535
1 2023-11-18 13:13:20.147592 535
2 2023-11-18 14:13:27.097894 535
3 2023-11-18 15:13:25.985715 535
4 2023-11-18 16:13:32.408867 534
5 2023-11-18 17:13:36.298025 534
6 2023-11-18 18:13:43.110883 534
7 2023-11-18 19:13:43.369302 534
8 2023-11-18 20:13:51.632185 534
9 2023-11-18 21:13:59.169360 534
10 2023-11-18 22:14:07.972644 534
11 2023-11-18 23:14:16.297836 533
-------------------
Group 1:
created_at moisture
12 2023-11-19 00:14:26.031192 534
13 2023-11-19 01:14:39.796824 532
14 2023-11-19 02:14:44.249399 533
15 2023-11-19 03:14:53.479820 534
16 2023-11-19 04:14:55.977077 534
17 2023-11-19 05:15:07.361019 534
18 2023-11-19 06:15:11.359716 534
19 2023-11-19 07:15:18.545638 534
20 2023-11-19 08:15:21.923445 534
21 2023-11-19 09:15:44.108795 534
22 2023-11-19 10:15:57.429115 533
23 2023-11-19 11:16:06.081096 534
24 2023-11-19 12:16:13.537274 534
25 2023-11-19 13:16:20.113579 534
26 2023-11-19 14:16:27.592455 534
27 2023-11-19 15:16:40.897159 533
28 2023-11-19 16:16:46.088631 534
29 2023-11-19 17:16:56.356958 533
30 2023-11-19 18:17:07.224618 534
31 2023-11-19 19:17:16.432370 533
32 2023-11-19 20:17:26.111768 533
33 2023-11-19 21:17:34.791802 533
34 2023-11-19 22:17:43.986497 531
35 2023-11-19 23:17:52.805056 532
-------------------
Group 2:
created_at moisture
36 2023-11-20 00:17:54.619490 533
37 2023-11-20 01:18:01.641144 532
38 2023-11-20 02:18:04.765892 532
39 2023-11-20 03:18:12.408328 532
40 2023-11-20 04:18:19.164817 532
41 2023-11-20 05:18:25.246881 532
42 2023-11-20 06:18:30.721551 532
43 2023-11-20 07:18:37.083176 532
44 2023-11-20 08:18:50.404909 532
45 2023-11-20 09:18:57.631082 532
46 2023-11-20 10:19:16.824349 532
47 2023-11-20 11:19:33.221346 532
48 2023-11-20 12:19:42.545535 532
49 2023-11-20 13:19:49.395201 532
50 2023-11-20 14:19:57.886827 532
51 2023-11-20 15:20:06.339089 532
52 2023-11-20 16:20:10.990417 532
53 2023-11-20 17:20:17.104666 532
54 2023-11-20 18:20:22.420334 532
55 2023-11-20 19:20:23.556865 533
56 2023-11-20 20:20:31.798930 531
57 2023-11-20 21:20:35.866586 531
58 2023-11-20 22:20:45.136861 531
59 2023-11-20 23:20:55.456402 531
-------------------
Solution
You missed the origin
parameter of pd.Grouper
:
# origin='start_day' (default)
>>> df.groupby(pd.Grouper(key='created_at', freq='24H')).size()
created_at
2023-12-01 7 # from 2023-12-01 17:00:00 to 2023-12-01 23:00:00
2023-12-02 18
Freq: 24H, dtype: int64
# origin='start' (what you expect)
>>> df.groupby(pd.Grouper(key='created_at', freq='24H', origin='start')).size()
created_at
2023-12-01 17:00:00 24 # from 2023-12-01 17:00:00 to 2023-12-02 16:00:00
2023-12-02 17:00:00 1
Freq: 24H, dtype: int64
# same with resample
>>> df.resample(rule='24H', on='created_at', origin='start').size()
created_at
2023-12-01 17:00:00 24
2023-12-02 17:00:00 1
Freq: 24H, dtype: int64
Minimal Working Example
dti = pd.date_range('2023-12-01 17:00', '2023-12-02 17:00', freq='1H')
df = pd.DataFrame({'created_at': dti , 'moisture': 1})
print(df)
# Output
created_at moisture
0 2023-12-01 17:00:00 1
1 2023-12-01 18:00:00 1
2 2023-12-01 19:00:00 1
3 2023-12-01 20:00:00 1
4 2023-12-01 21:00:00 1
5 2023-12-01 22:00:00 1
6 2023-12-01 23:00:00 1 # <- end when origin='start_day'
7 2023-12-02 00:00:00 1
8 2023-12-02 01:00:00 1
9 2023-12-02 02:00:00 1
10 2023-12-02 03:00:00 1
11 2023-12-02 04:00:00 1
12 2023-12-02 05:00:00 1
13 2023-12-02 06:00:00 1
14 2023-12-02 07:00:00 1
15 2023-12-02 08:00:00 1
16 2023-12-02 09:00:00 1
17 2023-12-02 10:00:00 1
18 2023-12-02 11:00:00 1
19 2023-12-02 12:00:00 1
20 2023-12-02 13:00:00 1
21 2023-12-02 14:00:00 1
22 2023-12-02 15:00:00 1
23 2023-12-02 16:00:00 1 # <- end when origin='start'
24 2023-12-02 17:00:00 1
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.