Issue
I am kind of new to Python but not new to coding. I thought the following code was going to be easy, but I couldn't produce the result that I needed.
I have the data from https://ourworldindata.org/grapher/gdp-per-capita-maddison?tab=chart&country=JPN~USA~GBR~DEU~FRA which has the columns ['Entity', 'Code', 'Year', 'GDP per capita', '417485-annotations']
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
WorldData = pd.read_csv('gdp-per-capita-maddison.csv')
#Stripping the white spaces
WorldData.columns = WorldData.columns.str.strip()
columns=WorldData.columns.tolist()
# Dropping Code of the countries and also Annotations column
WorldData=WorldData.drop([columns[1],columns[-1]], axis=1)
grouped=WorldData.groupby(['Entity'],dropna=True)
print(grouped)
print(grouped.head())
First, I can't print grouped: print(grouped) only shows a DataFrameGroupBy object, not the groups. And if I change it to print(grouped.head()), I see the whole dataset, which is not grouped (see Figure 1).
I can create a pivot table on "Entity" in Google Sheets, so I think there is nothing wrong with the data.
Edit: To clarify, my goal is to create an interactive line plot of GDP per capita over the years for 3 selected countries. Therefore, I need (I think I need) to create something like
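For reference, here is a small sketch of what the grouped object actually holds. It uses a hypothetical toy frame with made-up numbers in place of the real CSV. Printing a DataFrameGroupBy only shows its type, and GroupBy.head(n) returns the first n rows of every group, kept in the original row order and index, which is why with many countries it looks like the ungrouped dataset. size() and get_group() are ways to inspect the groups directly.

```python
import pandas as pd

# Hypothetical toy data with the same layout as the cleaned CSV
# (Entity, Year, GDP per capita); the numbers are made up for illustration.
df = pd.DataFrame({
    "Entity": ["France", "France", "Japan", "Japan", "France"],
    "Year": [2000, 2001, 2000, 2001, 2002],
    "GDP per capita": [1.0, 2.0, 3.0, 4.0, 5.0],
})

grouped = df.groupby("Entity")

# Printing the GroupBy object only shows its repr, something like
# <pandas.core.groupby.generic.DataFrameGroupBy object at 0x...>
print(grouped)

# Number of rows in each group
sizes = grouped.size()
print(sizes)

# head(n) returns the first n rows of EACH group, in the original row
# order and with the original index, so it does not look "grouped"
first_rows = grouped.head(1)
print(first_rows)

# Pull out a single group as an ordinary DataFrame
france = grouped.get_group("France")
print(france)
```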
Solution
As you can see from the output you showed, the .groupby() method returns an object of type DataFrameGroupBy, which is not in itself a DataFrame. However, iterating over it yields tuples of (key, df), where key in your case is the relevant "Entity" value and df is the DataFrame of entries with that value, as you expect. So, the typical way to use groupby() here would be something like this:
import pandas as pd
import matplotlib.pyplot as plt

WorldData = pd.read_csv('gdp-per-capita-maddison.csv')
# stripping the white spaces from the column names
WorldData.columns = WorldData.columns.str.strip()
columns = WorldData.columns.tolist()
# dropping country code and annotations columns
WorldData = WorldData.drop([columns[1], columns[-1]], axis=1)
# group by the column name itself (not a one-element list) so that, in
# pandas 2.x, each key is a plain string rather than a 1-tuple
grouped = WorldData.groupby('Entity', dropna=True)
for country, dataframe in grouped:
    print("Plotting data for country", country)
    plt.plot(dataframe["Year"], dataframe["GDP per capita"], label=country)
    # (or however you want to handle the plotting)
plt.legend()
plt.show()
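Since the stated goal is only three countries on one chart, the loop can be narrowed by filtering with Series.isin before grouping. This is a minimal sketch, assuming a hypothetical toy frame in place of the real CSV and an arbitrary choice of countries in selected; it draws on one explicit Axes so all lines land in the same figure.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data standing in for the cleaned WorldData frame
# (made-up numbers, same column names as the CSV).
WorldData = pd.DataFrame({
    "Entity": ["France", "France", "Japan", "Japan",
               "United States", "United States", "Germany", "Germany"],
    "Year": [2000, 2001] * 4,
    "GDP per capita": [10.0, 11.0, 20.0, 21.0, 30.0, 31.0, 40.0, 41.0],
})

# The three countries to compare (an assumption; use any Entity values)
selected = ["France", "Japan", "United States"]

# Keep only the rows for the selected countries before grouping
subset = WorldData[WorldData["Entity"].isin(selected)]

# Draw every country on the same Axes so they share one chart
fig, ax = plt.subplots()
for country, dataframe in subset.groupby("Entity"):
    ax.plot(dataframe["Year"], dataframe["GDP per capita"], label=country)

ax.set_xlabel("Year")
ax.set_ylabel("GDP per capita")
ax.legend()
```

In a script, call plt.show() (or fig.savefig(...)) afterwards to display or save the chart.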
Alternatively, DataFrameGroupBy objects actually have a .plot() method of their own, so for this use case you can omit the iteration and simply call:
grouped.plot(x="Year", y="GDP per capita", ...)
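Another route (a swapped-in technique, not the one described above) is to reshape the long table into wide format with DataFrame.pivot, so each country becomes a column and a single DataFrame.plot call draws one labelled line per country. This may be worth considering because, as far as I can tell, grouped.plot(...) runs DataFrame.plot once per group, and without a shared ax= argument each call opens its own figure. The data below is hypothetical toy data using the CSV's column names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data in the same long format as the cleaned CSV
WorldData = pd.DataFrame({
    "Entity": ["France", "France", "Japan", "Japan",
               "United States", "United States"],
    "Year": [2000, 2001, 2000, 2001, 2000, 2001],
    "GDP per capita": [10.0, 11.0, 20.0, 21.0, 30.0, 31.0],
})

# Reshape to wide format: one row per Year, one column per country.
# pivot() raises a ValueError if a (Year, Entity) pair appears twice.
wide = WorldData.pivot(index="Year", columns="Entity", values="GDP per capita")

selected = ["France", "Japan", "United States"]  # assumed choice of countries

# DataFrame.plot draws one line per column on a single Axes and
# uses the column names (the countries) as legend labels.
ax = wide[selected].plot(xlabel="Year", ylabel="GDP per capita")
```

With the real data, countries cover different year ranges, so wide will contain NaN wherever a country has no value for a year, and the line is simply left with a gap there.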
Answered By - L0tad