Issue
I have the following abridged dataframe:
import pandas as pd

df1 = pd.DataFrame({'end': [2007, 2013, 2014, 2013, 2014],
                    'id.thomas': ['136', '136', '136', '172', '172'],
                    'years_exp': ['14', '20', '21', '14', '15']},
                   index=[2, 3, 4, 5, 6])
    end  id.thomas  years_exp
2  2007        136         14
3  2013        136         20
4  2014        136         21
5  2013        172         14
6  2014        172         15
where end represents years. I would like to expand the end and years_exp columns to account for the missing years:
     end  id.thomas  years_exp
2   2007        136         14
3   2008        136         15
4   2009        136         16
5   2010        136         17
6   2011        136         18
7   2012        136         19
8   2013        136         20
9   2014        136         21
10  2013        172         14
11  2014        172         15
I have been working on this for about 20 hours, trying to 'engineer' a fix. Does anyone know of a simple Python/Pandas tool/method for accomplishing this task?
Solution
This takes the first end and years_exp values for a given id.thomas, and then enumerates them forward to the final year.
>>> final_year = 2014
>>> pd.DataFrame([(year, id_, n)
...               for id_, end, years_exp in df1.groupby('id.thomas').first().itertuples()
...               # years_exp is stored as a string in df1; enumerate() needs an integer start
...               for n, year in enumerate(range(end, final_year + 1), int(years_exp))],
...              columns=['end', 'id.thomas', 'years_exp'])
    end  id.thomas  years_exp
0  2007        136         14
1  2008        136         15
2  2009        136         16
3  2010        136         17
4  2011        136         18
5  2012        136         19
6  2013        136         20
7  2014        136         21
8  2013        172         14
9  2014        172         15
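The enumeration step in the comprehension above can be looked at on its own, without pandas. The sketch below is only an illustration: the helper name expand_years and the hand-written first_rows list are hypothetical, standing in for what df1.groupby('id.thomas').first().itertuples() yields for the sample data.

```python
def expand_years(first_rows, final_year):
    """Expand (id, first_end, first_years_exp) triples into one row per year.

    Hypothetical helper for illustration; it mirrors the nested loops of the
    list comprehension above.
    """
    rows = []
    for id_, end, years_exp in first_rows:
        # enumerate(iterable, start) counts upward from `start`, so n follows
        # years_exp while `year` walks from the first year to final_year.
        # int() is needed because years_exp is a string in the sample data.
        for n, year in enumerate(range(end, final_year + 1), int(years_exp)):
            rows.append((year, id_, n))
    return rows


# Hand-written stand-in for the first row of each id.thomas group in df1.
first_rows = [('136', 2007, '14'), ('172', 2013, '14')]
rows = expand_years(first_rows, final_year=2014)
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame with columns=['end', 'id.thomas', 'years_exp'] should give the same frame as above. Note that this approach assumes every id runs through final_year and that years_exp increases by exactly one per year.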
Answered By - Alexander