Issue
This is my dataframe
import pandas as pd
data=pd.DataFrame({'vehicle':['car','car','car','car','car','car','bus','bus','bus','bus','bus','bus','car','car','car','car','car','car','bus','bus','bus','bus','bus','bus'],
'expecteddate':['2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022','2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022','2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022','2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022'],'range':[240,240,240,240,240,240,300,300,300,300,300,300,240,240,240,240,240,240,300,300,300,300,300,300],'color':['blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red'],'discount':[70,80,90,60,40,50,120,110,130,140,80,90,60,40,50,30,70,45,130,100,140,120,90,30],'date':['2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022']})
print(data)
data in dataframe:
vehicle expecteddate range color discount date
0 car 2/24/2022 240 blue 70 2/18/2022
1 car 2/24/2022 240 red 80 2/18/2022
2 car 3/15/2022 240 blue 90 2/18/2022
3 car 3/15/2022 240 red 60 2/18/2022
4 car 4/20/2022 240 blue 40 2/18/2022
5 car 4/20/2022 240 red 50 2/18/2022
6 bus 2/24/2022 300 blue 120 2/18/2022
7 bus 2/24/2022 300 red 110 2/18/2022
8 bus 3/15/2022 300 blue 130 2/18/2022
9 bus 3/15/2022 300 red 140 2/18/2022
10 bus 4/20/2022 300 blue 80 2/18/2022
11 bus 4/20/2022 300 red 90 2/18/2022
12 car 2/24/2022 240 blue 60 2/17/2022
13 car 2/24/2022 240 red 40 2/17/2022
14 car 3/15/2022 240 blue 50 2/17/2022
15 car 3/15/2022 240 red 30 2/17/2022
16 car 4/20/2022 240 blue 70 2/17/2022
17 car 4/20/2022 240 red 45 2/17/2022
18 bus 2/24/2022 300 blue 130 2/17/2022
19 bus 2/24/2022 300 red 100 2/17/2022
20 bus 3/15/2022 300 blue 140 2/17/2022
21 bus 3/15/2022 300 red 120 2/17/2022
22 bus 4/20/2022 300 blue 90 2/17/2022
23 bus 4/20/2022 300 red 30 2/17/2022
from this dataframe we have two vehicles ,three expecteddates ,range,two colors ,discount and date. we have to find min value in discount and date at which we got min value in discount column ,into seperate two new columns that is mindisc column and mindate and that should be save in new column based on latest date ,this should filtered based on vehicle,expecteddate,range,color and date
we have to find min value in discount column in two dates(all dates as we have many dates not limited to two dates) 2/18/2022,2/17/2022 based on same color,range,expecteddate and vehicle
finally this min to added to mindisc column at latest date and corresponding date at which min date appeared to mindate column at latest date row
output should look like
country expecteddate range color discount date mindisc mindate
0 car 2/24/2022 240 blue 70 2/18/2022 60 2/17/2022
1 car 2/24/2022 240 red 80 2/18/2022 40 2/17/2022
2 car 3/15/2022 240 blue 90 2/18/2022 50 2/17/2022
3 car 3/15/2022 240 red 60 2/18/2022 30 2/17/2022
4 car 4/20/2022 240 blue 40 2/18/2022 40 2/18/2022
5 car 4/20/2022 240 red 50 2/18/2022 45 2/17/2022
6 bus 2/24/2022 300 blue 120 2/18/2022 120 2/18/2022
7 bus 2/24/2022 300 red 110 2/18/2022 100 2/17/2022
8 bus 3/15/2022 300 blue 130 2/18/2022 130 2/18/2022
9 bus 3/15/2022 300 red 140 2/18/2022 120 2/17/2022
10 bus 4/20/2022 300 blue 80 2/18/2022 80 2/18/2022
11 bus 4/20/2022 300 red 90 2/18/2022 30 2/17/2022
12 car 2/24/2022 240 blue 60 2/17/2022
13 car 2/24/2022 240 red 40 2/17/2022
14 car 3/15/2022 240 blue 50 2/17/2022
15 car 3/15/2022 240 red 30 2/17/2022
16 car 4/20/2022 240 blue 70 2/17/2022
17 car 4/20/2022 240 red 45 2/17/2022
18 bus 2/24/2022 300 blue 130 2/17/2022
19 bus 2/24/2022 300 red 100 2/17/2022
20 bus 3/15/2022 300 blue 140 2/17/2022
21 bus 3/15/2022 300 red 120 2/17/2022
22 bus 4/20/2022 300 blue 90 2/17/2022
23 bus 4/20/2022 300 red 30 2/17/2022
vehicles are not limited to two like car and bus ,it has many vehicles and data is not always have equal rows in vehicle and range columns and date is not limited to two dates
Solution
This is my approach. Let's begin by treating the "date"
column as a datetime object:
df["date"] = pd.to_datetime(df["date"])
Now, we can group the data as you described to find the rows where the minimum discount is reached:
common_groupby = df.groupby(["color", "range", "expecteddate", "vehicle"])
Now, let's find the rows where the minimum discount and the maximum date happens for each group:
source_idx = common_groupby["discount"].idxmin()
target_idx = common_groupby["date"].idxmax()
# Use df.loc[idx] to see the rows where the minimum discount is reached
Now, we can add the values of the minimum discount and the respective date (from source_idx
rows) to the correct rows (from target_src
) by using .loc
:
df.loc[target_idx, "mindisc"] = df.loc[source_idx, "discount"].values
df.loc[target_idx, "mindate"] = df.loc[source_idx, "date"].values
As you can see, we're only changing the values in the rows where the minimum discount was reached (idx
). This is the output of these operations:
vehicle expecteddate range color discount date mindisc mindate
0 car 2/24/2022 240 blue 70 2022-02-18 60.0 2022-02-17
1 car 2/24/2022 240 red 80 2022-02-18 40.0 2022-02-17
2 car 3/15/2022 240 blue 90 2022-02-18 50.0 2022-02-17
3 car 3/15/2022 240 red 60 2022-02-18 30.0 2022-02-17
4 car 4/20/2022 240 blue 40 2022-02-18 40.0 2022-02-18
5 car 4/20/2022 240 red 50 2022-02-18 45.0 2022-02-17
6 bus 2/24/2022 300 blue 120 2022-02-18 120.0 2022-02-18
7 bus 2/24/2022 300 red 110 2022-02-18 100.0 2022-02-17
8 bus 3/15/2022 300 blue 130 2022-02-18 130.0 2022-02-18
9 bus 3/15/2022 300 red 140 2022-02-18 120.0 2022-02-17
10 bus 4/20/2022 300 blue 80 2022-02-18 80.0 2022-02-18
11 bus 4/20/2022 300 red 90 2022-02-18 30.0 2022-02-17
12 car 2/24/2022 240 blue 60 2022-02-17 NaN NaT
13 car 2/24/2022 240 red 40 2022-02-17 NaN NaT
14 car 3/15/2022 240 blue 50 2022-02-17 NaN NaT
15 car 3/15/2022 240 red 30 2022-02-17 NaN NaT
16 car 4/20/2022 240 blue 70 2022-02-17 NaN NaT
17 car 4/20/2022 240 red 45 2022-02-17 NaN NaT
18 bus 2/24/2022 300 blue 130 2022-02-17 NaN NaT
19 bus 2/24/2022 300 red 100 2022-02-17 NaN NaT
20 bus 3/15/2022 300 blue 140 2022-02-17 NaN NaT
21 bus 3/15/2022 300 red 120 2022-02-17 NaN NaT
22 bus 4/20/2022 300 blue 90 2022-02-17 NaN NaT
23 bus 4/20/2022 300 red 30 2022-02-17 NaN NaT
Another possible solution trying to improve perormance by sorting and then droping duplicates twice (inspired by this question: Select the max row per group - pandas performance issue):
# Min discount rows
source_df = df.sort_values(by=["discount"], ascending=True, kind='mergesort').drop_duplicates(["color", "range", "expecteddate", "vehicle"])
source_df = source_df.rename(columns={"date": "mindate", "discount": "mindisc"})
# Max date rows
target_df = df.reset_index().sort_values(by=["date"], ascending=False, kind="mergesort").drop_duplicates(["color", "range", "expecteddate", "vehicle"])
# Put min discount values into max date rows
df.loc[target_df["index"], ["mindisc", "mindate"]] = source_df[["mindisc", "mindate"]].values
Answered By - aaossa
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.