Issue
I am reading a large table with multiple columns using the parse command. I would like to use the first and second columns as a nested key pair, with the remaining columns stored as a list value. I have written a code snippet that does what I want, and I was wondering whether this operation can be performed more efficiently.
import pandas as pd

# The data frame comes from an Excel sheet using df.parse
df = pd.DataFrame({
    "Company": ["TechCorp", "Innovate Inc", "Green Solutions", "Future Dynamics"],
    "Product": ["TC100", "IN200", "GS300", "FD400"],
    "Production Cost": [10000, 15000, 12000, 18000],
    "Development Time": [6, 9, 8, 12],
    "Launch Year": [2023, 2024, 2023, 2025]
})

nested_dict = {}
for index, row in df.iterrows():
    fleet = row['Company']
    engine = row['Product']
    values = row[['Production Cost', 'Development Time', 'Launch Year']].tolist()
    # Create the inner dict the first time a company is seen
    if fleet not in nested_dict:
        nested_dict[fleet] = {}
    nested_dict[fleet][engine] = values

print(nested_dict)
My goal is to get the following structure.
{'TechCorp': {'TC100': [10000, 6, 2023]}, 'Innovate Inc': {'IN200': [15000, 9, 2024]}, 'Green Solutions': {'GS300': [12000, 8, 2023]}, 'Future Dynamics': {'FD400': [18000, 12, 2025]}}
Solution
You could slightly optimise your code by using a defaultdict and unpacking each row of df.values:
from collections import defaultdict

nested_dict = defaultdict(dict)
for fleet, engine, *values in df.values:
    nested_dict[fleet][engine] = values
I added another product from TechCorp to your data to test the code:
df.loc[4] = ['TechCorp','TC200', 20000, 12, 2025]
Output for my sample data:
{
'TechCorp': {'TC100': [10000, 6, 2023], 'TC200': [20000, 12, 2025]},
'Innovate Inc': {'IN200': [15000, 9, 2024]},
'Green Solutions': {'GS300': [12000, 8, 2023]},
'Future Dynamics': {'FD400': [18000, 12, 2025]}
}
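For reference, here is a self-contained sketch of the same idea that can be run on its own. The function name to_nested_dict is hypothetical, and it swaps df.values for df.itertuples(index=False, name=None), which also yields one plain tuple per row; this is a variation on the approach above, not a claim that it is faster. It converts the defaultdict back to a regular dict before returning, so a missing key lookup later raises KeyError instead of silently creating an entry.

```python
from collections import defaultdict

import pandas as pd


def to_nested_dict(df):
    """Map first column -> second column -> list of the remaining columns.

    Hypothetical helper for illustration; assumes df has at least two columns.
    """
    nested = defaultdict(dict)
    # name=None makes itertuples yield plain tuples; index=False drops the index
    for outer, inner, *values in df.itertuples(index=False, name=None):
        nested[outer][inner] = values
    # Return a regular dict so missing keys are not created on lookup
    return dict(nested)


# Sample data from the question, plus the extra TechCorp row
df = pd.DataFrame({
    "Company": ["TechCorp", "Innovate Inc", "Green Solutions", "Future Dynamics", "TechCorp"],
    "Product": ["TC100", "IN200", "GS300", "FD400", "TC200"],
    "Production Cost": [10000, 15000, 12000, 18000, 20000],
    "Development Time": [6, 9, 8, 12, 12],
    "Launch Year": [2023, 2024, 2023, 2025, 2025],
})

result = to_nested_dict(df)
print(result)
```

Note that if the same (Company, Product) pair appears more than once, the last row wins, which matches the behaviour of the loop in the question.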
Answered By - Nick