Thursday, February 1, 2024

[FIXED] Merge two Pandas Dataframes based on two IDs

February 01, 2024 dataframe, pandas, python No comments

Issue

I have two tables, one is called 'result_df':

tourney_name	tourney_year	tourney_month	tourney_day	player1_id	player2_id	predicted_outcome	predicted_probability	actual_outcome
3	2024	1	1	1001	1002	0	0.028771	0
3	2024	1	1	1007	1008	1	0.647801	1
3	2024	1	1	1013	1014	1	0.652244	1

And the other one is called 'betting_lines_24':

winner_id	winner_name	loser_name	loser_id	B365W	B365L	tourney_year	tourney_month	tourney_day
1013	Dimitrov G.	Murray A.	1014	1.33	3.40	2024	1	1
1001	Safiullin R.	Shelton B.	1002	2.30	1.62	2024	1	1
1007	Rune H.	Purcell M.	1008	1.25	4.00	2024	1	1

And now I need to merge them, so that the resulting table will be like 'result_df', but adding the name of each player (based on the IDs) and also the betting lines for each player. The resulting dataframe should be something like this:

tourney_name	tourney_year	tourney_month	tourney_day	player1_id	player1_name	player2_name	player2_id	B365W	B365L	predicted_probability	predicted_outcome	actual_outcome
3	2024	1	1	1002	Shelton B.	Safiullin R.	1001	1.62	2.30	0.028771	0	0
3	2024	1	1	1007	Rune H.	Purcell M.	1008	1.25	4.00	0.647801	1	1
3	2024	1	1	1013	Dimitrov G.	Murray A.	1014	1.33	3.40	0.652244	1	1

The IDs in one 'result_df' may be shuffled on the 'betting_lines_24', not always the player1_id equals to the winner_id, it may happen that player1_id is in the loser_id column in the 'betting_lines_24' dataframe. Also the matches in 'result_df' are not in the same order as in 'betting_lines_24'.

How could i do it?

Solution

Code

col1 = ['player1_id', 'player1_name', 'player2_name', 'player2_id', 'B365W', 'B365L', 'tourney_year', 'tourney_month', 'tourney_day']
col2 = ['player2_id', 'player2_name', 'player1_name', 'player1_id', 'B365L', 'B365W', 'tourney_year', 'tourney_month', 'tourney_day']
tmp = pd.concat([betting_lines_24.set_axis(col1, axis=1), betting_lines_24.set_axis(col2, axis=1)])
col3 = ['tourney_name', 'tourney_year', 'tourney_month', 'tourney_day', 'player1_id', 'player1_name', 
        'player2_name', 'player2_id', 'B365W', 'B365L', 'predicted_probability', 'predicted_outcome',
        'actual_outcome']
out = result_df.merge(tmp, how='left')[col3]

out

   tourney_name  tourney_year  tourney_month  tourney_day  player1_id  \
0             3          2024              1            1        1001   
1             3          2024              1            1        1007   
2             3          2024              1            1        1013   

   player1_name player2_name  player2_id  B365W  B365L  predicted_probability  \
0  Safiullin R.   Shelton B.        1002   2.30   1.62               0.028771   
1       Rune H.   Purcell M.        1008   1.25   4.00               0.647801   
2   Dimitrov G.    Murray A.        1014   1.33   3.40               0.652244   

   predicted_outcome  actual_outcome  
0                  0               0  
1                  1               1  
2                  1               1

plz read notification before ask question : minimal-reproducible-example

Answered By - Panda Kim

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Thursday, February 1, 2024

[FIXED] Merge two Pandas Dataframes based on two IDs

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels