Issue
I am using Jupyter notebooks in Visual Studio Code.
After a number of steps analyzing a large dataset, I created a short summary dataframe listing unique data. In VS Code's data viewer I can view this dataframe and export it to CSV.
Questions:
- Is there a way to auto-generate, from VS Code, the Python code required to statically recreate the resulting dataframe (df2)?
- Alternatively, is there a native way in Python or pandas to generate this Python code?
Example (simplified):
I start from df1 and derive df2 from it.
import pandas as pd
df1 = pd.DataFrame({'name': ['alice', 'bob', 'frank', 'carole', 'yuchen', 'navid'],
                    'role': ['dev', 'dev', 'dev', 'qa', 'ux', 'ux'],
                    'team': ['avengers', 'avengers', 'marvel', 'marvel', 'avengers', 'gotham'],
                    'country': ['china', 'china', 'US', 'US', 'china', 'US']})
df2 = df1[df1.country == 'US']
display(df2)
I can look at df2's content in the VS Code data viewer, and I can export the data as CSV.
Can I also auto-generate the code required to recreate such a data structure (and its data)? For instance, this could be done by clicking somewhere in VS Code (in the data viewer?), or by calling a Python method on the dataframe.
The goal is to obtain something like the following, which contains both the data and the code needed to recreate df2:
df2 = pd.DataFrame({'name': ['frank', 'carole', 'navid'],
                    'role': ['dev', 'qa', 'ux'],
                    'team': ['marvel', 'marvel', 'gotham'],
                    'country': ['US', 'US', 'US']})
Again: this is not about pretty-printing a dataframe but about obtaining the Python code required to create both the structure and the data.
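A minimal sketch of such a "Python method", assuming a hypothetical helper named `df_to_code` (it is not part of pandas or VS Code). It uses `DataFrame.to_dict(orient='list')` to get each column as a list and `repr` to turn the values into source text, laid out one column per line like the target above. It assumes column values whose repr is valid Python (plain strings and integers, for example) and it does not emit the index.

```python
import pandas as pd


def df_to_code(df: pd.DataFrame, name: str = "df") -> str:
    """Return Python source that rebuilds df's columns and values.

    Hypothetical helper for illustration, not a pandas API. The index is
    not emitted, so the rebuilt frame gets a fresh 0..n-1 index.
    """
    # Map each column label to the list of its values
    columns = df.to_dict(orient="list")
    # One "'column': [values]" entry per column, written with repr()
    entries = [f"{col!r}: {values!r}" for col, values in columns.items()]
    # Align continuation lines under the opening brace of the dict
    indent = " " * len(f"{name} = pd.DataFrame({{")
    body = (",\n" + indent).join(entries)
    return f"{name} = pd.DataFrame({{{body}}})"


df1 = pd.DataFrame({'name': ['alice', 'bob', 'frank', 'carole', 'yuchen', 'navid'],
                    'role': ['dev', 'dev', 'dev', 'qa', 'ux', 'ux'],
                    'team': ['avengers', 'avengers', 'marvel', 'marvel', 'avengers', 'gotham'],
                    'country': ['china', 'china', 'US', 'US', 'china', 'US']})
df2 = df1[df1.country == 'US']

code = df_to_code(df2, "df2")
print(code)
```

Printing `code` should give the four-line constructor shown above; pasting it into a cell recreates the data, with a 0-based index instead of df2's original row labels.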
Thanks!
Solution
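I'm guessing you want something like the following. `DataFrame.to_dict(orient='list')` returns the data as a dict of column lists, which is exactly the form the `DataFrame` constructor accepts. Here's the code: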
import pandas as pd
df1 = pd.DataFrame({'name': ['alice', 'bob', 'frank', 'carole', 'yuchen', 'navid'],
                    'role': ['dev', 'dev', 'dev', 'qa', 'ux', 'ux'],
                    'team': ['avengers', 'avengers', 'marvel', 'marvel', 'avengers', 'gotham'],
                    'country': ['china', 'china', 'US', 'US', 'china', 'US']})
df2 = df1[df1.country == 'US']
# to_dict(orient='list') maps each column name to a list of its values; for plain
# values like these strings, its repr is valid Python that the constructor accepts
code = f"df2 = pd.DataFrame({df2.to_dict(orient='list')})"
print(code)
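One thing the generated line does not carry over is the index: df2 keeps df1's row labels (2, 3 and 5), while the rebuilt frame is numbered 0, 1, 2. The sketch below also emits the row labels through the constructor's `index=` argument and checks the round trip by running the generated source with `exec` and comparing it with `pandas.testing.assert_frame_equal`. Like the code above, it assumes values whose repr is valid Python; NaN or timestamps, for example, would need extra handling.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['alice', 'bob', 'frank', 'carole', 'yuchen', 'navid'],
                    'role': ['dev', 'dev', 'dev', 'qa', 'ux', 'ux'],
                    'team': ['avengers', 'avengers', 'marvel', 'marvel', 'avengers', 'gotham'],
                    'country': ['china', 'china', 'US', 'US', 'china', 'US']})
df2 = df1[df1.country == 'US']

# Emit the column data plus the original row labels (2, 3, 5 here)
code = (f"df2 = pd.DataFrame({df2.to_dict(orient='list')}, "
        f"index={df2.index.tolist()})")
print(code)

# Run the generated source in a scratch namespace that only provides `pd`
namespace = {"pd": pd}
exec(code, namespace)

# Raises AssertionError if values, dtypes, column order or index differ
pd.testing.assert_frame_equal(namespace["df2"], df2)
```

Keeping the index matters if later code relies on the row labels (for example `df2.loc[3]`); if it does not, the shorter version without `index=` is enough.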
Answered By - JialeDu