Issue
I have the pandas DataFrame below, which holds employees' sales data for the month of October.
   Employee        Timerange    Dials  Conn  Conv  Mtg Bkd  Talk    Dial
0  Ricky Ponting   10/3 - 10/7  1,869  102   60.0  2.0      3h:08m  5h:23m
1  Adam Gilchrist  10/3 - 10/7  1,336  53    30.0  1.0      1h:10m  3h:58m
2  Michael Clarke  10/3 - 10/7  1,960  74    42.0  1.0      2h:02m  5h:28m
3  Shane Warne     10/3 - 10/7  1,478  62    45.0  1.0      1h:55m  4h:07m
Schema -
#    Column     Non-Null Count  Dtype
---  ------     --------------  -----
1    Timerange  40 non-null     object
2    Dials      40 non-null     object
3    Conn       40 non-null     int64
4    Conv       39 non-null     float64
5    Mtg Bkd    39 non-null     float64
6    Talk       40 non-null     object
7    Dial       40 non-null     object
I want to compute the whole team's dials-to-connection and dials-to-conversation rates for the month. Example output:
Month    Dials  Conn  Dials -> Conn  Dials -> Conv
October  60517  2702  0.045          0.026
I tried to pull the month out of the column with pd.DatetimeIndex(df['Timerange']).month, but it raises dateutil.parser._parser.ParserError: Unknown string format: 10/3 - 10/7. How can I get the result above?
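For context on the error: each Timerange value holds two dates joined by " - ", so the parser cannot read it as a single date. One way around this, sketched below, is to split the range first and parse only the start date. The second sample row and the `bounds` and `start` names are made up for illustration, and since the data carries no year, pandas fills in its default of 1900, which does not affect the month name.

```python
import pandas as pd

# Hypothetical two-row sample mirroring the Timerange column in the question
df = pd.DataFrame({"Timerange": ["10/3 - 10/7", "10/10 - 10/14"]})

# Split "start - end" into two separate string columns
bounds = df["Timerange"].str.split(" - ", expand=True)
bounds.columns = ["start", "end"]

# Parse only the start date; with no year in the data, pandas defaults to 1900
start = pd.to_datetime(bounds["start"], format="%m/%d")

# Turn the parsed date into a month name to group on later
df["Month"] = start.dt.month_name()
print(df)
```

Both rows end up with "October" in the Month column, which can then serve as the grouping key.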
Solution
Here is one approach using pandas.DataFrame.groupby and pandas.DataFrame.apply:
import pandas as pd

# Extract the month number from the start date and convert it to a month name
df["Month"] = pd.to_datetime(
    df["Timerange"].str.extract(r"(\d+)/\d+", expand=False), format="%m"
).dt.month_name()

# Convert comma-separated strings such as "1,869" to numbers
df["Dials"] = df["Dials"].str.replace(",", "").astype(float)

# Sum per month, then derive both rates from the monthly totals
out = (
    df.groupby("Month", as_index=False)
    .apply(lambda x: pd.Series({
        "Dials": x["Dials"].sum(),
        "Conn": x["Conn"].sum(),
        "Dials -> Conn": x["Conn"].sum() / x["Dials"].sum(),
        "Dials -> Conv": x["Conv"].sum() / x["Dials"].sum(),
    }))
)
print(out)

# Output (for the four sample rows shown in the question):
     Month   Dials   Conn  Dials -> Conn  Dials -> Conv
0  October  6643.0  291.0       0.043806       0.026645
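As a possible variation (a sketch, not the answerer's code), the same totals can be obtained without apply by summing the count columns per month and then dividing the summed columns. The DataFrame literal below simply re-enters the four sample rows from the question so the snippet runs on its own, and Dials is cast to int here rather than float.

```python
import pandas as pd

# The four sample rows from the question, re-entered so the example is self-contained
df = pd.DataFrame({
    "Timerange": ["10/3 - 10/7"] * 4,
    "Dials": ["1,869", "1,336", "1,960", "1,478"],
    "Conn": [102, 53, 74, 62],
    "Conv": [60.0, 30.0, 42.0, 45.0],
})

# Month name from the start date, as in the answer above
df["Month"] = pd.to_datetime(
    df["Timerange"].str.extract(r"(\d+)/\d+", expand=False), format="%m"
).dt.month_name()

# Strip thousands separators before converting to integers
df["Dials"] = df["Dials"].str.replace(",", "").astype(int)

# Sum the raw counts per month first, then derive the rates from the totals
out = df.groupby("Month", as_index=False)[["Dials", "Conn", "Conv"]].sum()
out["Dials -> Conn"] = out["Conn"] / out["Dials"]
out["Dials -> Conv"] = out["Conv"] / out["Dials"]
out = out.drop(columns="Conv")
print(out)
```

Dividing the monthly totals gives the team-wide rate. Averaging each employee's individual rate would weight every employee equally regardless of how many dials they made, which is a different number.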
Answered By - abokey