Issue
I have two tables, one is called 'result_df':
tourney_name | tourney_year | tourney_month | tourney_day | player1_id | player2_id | predicted_outcome | predicted_probability | actual_outcome |
---|---|---|---|---|---|---|---|---|
3 | 2024 | 1 | 1 | 1001 | 1002 | 0 | 0.028771 | 0 |
3 | 2024 | 1 | 1 | 1007 | 1008 | 1 | 0.647801 | 1 |
3 | 2024 | 1 | 1 | 1013 | 1014 | 1 | 0.652244 | 1 |
And the other one is called 'betting_lines_24':
winner_id | winner_name | loser_name | loser_id | B365W | B365L | tourney_year | tourney_month | tourney_day |
---|---|---|---|---|---|---|---|---|
1013 | Dimitrov G. | Murray A. | 1014 | 1.33 | 3.40 | 2024 | 1 | 1 |
1001 | Safiullin R. | Shelton B. | 1002 | 2.30 | 1.62 | 2024 | 1 | 1 |
1007 | Rune H. | Purcell M. | 1008 | 1.25 | 4.00 | 2024 | 1 | 1 |
And now I need to merge them, so that the resulting table will be like 'result_df', but adding the name of each player (based on the IDs) and also the betting lines for each player. The resulting dataframe should be something like this:
tourney_name | tourney_year | tourney_month | tourney_day | player1_id | player1_name | player2_name | player2_id | B365W | B365L | predicted_probability | predicted_outcome | actual_outcome |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 2024 | 1 | 1 | 1002 | Shelton B. | Safiullin R. | 1001 | 1.62 | 2.30 | 0.028771 | 0 | 0 |
3 | 2024 | 1 | 1 | 1007 | Rune H. | Purcell M. | 1008 | 1.25 | 4.00 | 0.647801 | 1 | 1 |
3 | 2024 | 1 | 1 | 1013 | Dimitrov G. | Murray A. | 1014 | 1.33 | 3.40 | 0.652244 | 1 | 1 |
The IDs in one 'result_df' may be shuffled on the 'betting_lines_24', not always the player1_id equals to the winner_id, it may happen that player1_id is in the loser_id column in the 'betting_lines_24' dataframe. Also the matches in 'result_df' are not in the same order as in 'betting_lines_24'.
How could i do it?
Solution
Code
col1 = ['player1_id', 'player1_name', 'player2_name', 'player2_id', 'B365W', 'B365L', 'tourney_year', 'tourney_month', 'tourney_day']
col2 = ['player2_id', 'player2_name', 'player1_name', 'player1_id', 'B365L', 'B365W', 'tourney_year', 'tourney_month', 'tourney_day']
tmp = pd.concat([betting_lines_24.set_axis(col1, axis=1), betting_lines_24.set_axis(col2, axis=1)])
col3 = ['tourney_name', 'tourney_year', 'tourney_month', 'tourney_day', 'player1_id', 'player1_name',
'player2_name', 'player2_id', 'B365W', 'B365L', 'predicted_probability', 'predicted_outcome',
'actual_outcome']
out = result_df.merge(tmp, how='left')[col3]
out
tourney_name tourney_year tourney_month tourney_day player1_id \
0 3 2024 1 1 1001
1 3 2024 1 1 1007
2 3 2024 1 1 1013
player1_name player2_name player2_id B365W B365L predicted_probability \
0 Safiullin R. Shelton B. 1002 2.30 1.62 0.028771
1 Rune H. Purcell M. 1008 1.25 4.00 0.647801
2 Dimitrov G. Murray A. 1014 1.33 3.40 0.652244
predicted_outcome actual_outcome
0 0 0
1 1 1
2 1 1
plz read notification before ask question : minimal-reproducible-example
Answered By - Panda Kim
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.