Issue
Using tabula, I read a PDF file and got this table:
import pandas as pd

data = {'country_code': [123, 124, 125, 127, 128, 'city_', 'code', 211, 221, 223, 224, 'store_', 'NA', 'code', 321, 3231, 3213, 32123]}
what_i_have = pd.DataFrame(data)
what_i_have
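There are some NaN values (that's not a problem), and some titles have moved down into the data rows (that is a problem).
I'd like to get a table where each code is paired with its type (country, city, store). P.S. The names of the types are not important. I just want to tell the codes apart.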
Solution
If we need pandas to fix the issue, the idea is still to split the words and the numbers into two columns:
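df = what_i_have.copy()
# Blank out the 'NA' placeholder so it adds nothing when the title pieces are joined
df.country_code = df.country_code.replace('NA','')
# True for every row that is not a number, i.e. a piece of a title
t = pd.to_numeric(df.country_code,errors = 'coerce').isna()
s = df[t].reset_index()
# Join each run of consecutive title rows into one title, keyed by the run's first index
s = s.groupby(s['index'].diff().ne(1).cumsum()).agg({'index':'first','country_code':'sum'}).set_index('index')
# Carry each title down over the codes below it; rows before the first title get 'country_code'
df['New'] = s['country_code'].reindex(df.index).ffill().fillna('country_code')
df = df[~t]
df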
    country_code           New
0            123  country_code
1            124  country_code
2            125  country_code
3            127  country_code
4            128  country_code
7            211     city_code
8            221     city_code
9            223     city_code
10           224     city_code
14           321    store_code
15          3231    store_code
16          3213    store_code
17         32123    store_code
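For reuse, the same idea can be wrapped in a function. The following is only a rough sketch and not part of the answer above: the function name `label_code_blocks`, the `default_label` parameter, and the output column names `code` and `kind` are made up for illustration. It assumes that missing values and the 'NA' placeholder can simply be dropped, and that every remaining non-numeric cell is a piece of a title.

```python
import pandas as pd


def label_code_blocks(values, default_label="country_code"):
    """Pair each numeric code with the title that precedes it.

    Hypothetical helper for illustration; the name and parameters are
    assumptions, not a pandas or tabula API.
    """
    col = pd.Series(list(values), dtype="object")

    # Assumption: real NaN cells and the literal 'NA' carry no information.
    col = col[col.notna() & (col.astype(str) != "NA")]

    # Anything that cannot be parsed as a number is treated as a title piece.
    is_title = pd.to_numeric(col, errors="coerce").isna()

    # A block starts at the first title row of each run of title rows.
    # Rows before the first title stay in block 0.
    prev_is_title = is_title.shift(1, fill_value=False).astype(bool)
    block = (is_title & ~prev_is_title).cumsum()

    # Concatenate the title pieces of each block; code rows contribute ''.
    pieces = col.where(is_title, "").astype(str)
    titles = pieces.groupby(block).agg("".join).replace("", default_label)

    keep = ~is_title
    return pd.DataFrame({
        "code": pd.to_numeric(col[keep]),
        "kind": block[keep].map(titles),
    })


values = [123, 124, 125, 127, 128, 'city_', 'code', 211, 221, 223, 224,
          'store_', 'NA', 'code', 321, 3231, 3213, 32123]
result = label_code_blocks(values)
print(result)
```

The original row index is kept, so the result can still be matched against the raw table. Unlike the snippet above, the block ids here come from the title mask itself rather than from gaps in the index, so the 'NA' row is dropped up front instead of being blanked out.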
Answered By - BENY