Saturday, November 5, 2022

[FIXED] How to get function output to add columns to my Dataframe

Tags: numpy, pandas, python-3.x

Issue

I have a function that produces an output like so when I pass it a name:

W2V('aamir')

array([ 0.12135 , -0.99132 ,  0.32347 ,  0.31334 ,  0.97446 , -0.67629 ,
        0.88606 , -0.11043 ,  0.79434 ,  1.4788  ,  0.53169 ,  0.95331 ,
       -1.1883  ,  0.82438 , -0.027177,  0.70081 ,  0.87467 , -0.095825,
       -0.5937  ,  1.4262  ,  0.2187  ,  1.1763  ,  1.6294  ,  0.91717 ,
       -0.086697,  0.16529 ,  0.19095 , -0.39362 , -0.40367 ,  0.83966 ,
       -0.25251 ,  0.46286 ,  0.82748 ,  0.93061 ,  1.136   ,  0.85616 ,
        0.34705 ,  0.65946 , -0.7143  ,  0.26379 ,  0.64717 ,  1.5633  ,
       -0.81238 , -0.44516 , -0.2979  ,  0.52601 , -0.41725 ,  0.086686,
        0.68263 , -0.15688 ], dtype=float32)

I have a data frame that has an index Name and a single column Y:

df1

    Y
Name    
aamir   0
aaron   0
... ...
zulema  1
zuzana  1

I wish to run my function on each value of Name and have it create columns like so:

    0   1   2   3   4   5   6   7   8   9   ... 40  41  42  43  44  45  46  47  48  49
Name                                                                                    
aamir   0.12135 -0.99132    0.32347 0.31334 0.97446 -0.67629    0.88606 -0.11043    0.794340    1.47880 ... 0.647170    1.56330 -0.81238    -0.445160   -0.29790    0.52601 -0.41725    0.086686    0.68263 -0.15688
aaron   -1.01850    0.80951 0.40550 0.09801 0.50634 0.22301 -1.06250    -0.17397    -0.061715   0.55292 ... -0.144960   0.82696 -0.51106    -0.072066   0.43069 0.32686 -0.00886    -0.850310   -1.31530    0.71631
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
zulema  0.56547 0.30961 0.48725 1.41000 -0.76790    0.39908 0.86915 0.68361 -0.019467   0.55199 ... 0.062091    0.62614 0.44548 -0.193820   -0.80556    -0.73575    -0.30031    -1.278900   0.24759 -0.55541
zuzana  -1.49480    -0.15111    -0.21853    0.77911 0.44446 0.95019 0.40513 0.26643 0.075182    -1.34340    ... 1.102800    0.51495 1.06230 -1.587600   -0.44667    1.04600 -0.38978    0.741240    0.39457 0.22857

What I have done is really messy, but it works:

names = df1.index.to_list()

Lst = []
for name in names:
    Lst.append(W2V(name).tolist())
wv_df = pd.DataFrame(index=names, data=Lst)
wv_df.index.name = "Name"
wv_df.sort_index(inplace=True)

df1 = df1.merge(wv_df, how='inner', left_index=True, right_index=True)

I am hoping there is a way to use .apply() or similar but I have not found how to do this. I am looking for an efficient way.

Update:

I modified my function like so:

if isinstance(w, pd.core.series.Series):
    w = w.to_string()

Although this appears to work at first, the data is wrong. If I pass aamir to my function you can see the result. Yet when I do it with apply the numbers are totally different:

df1

    Name    Y
0   aamir   0
1   aaron   0
... ... ...
7942    zulema  1
7943    zuzana  1

df3 = df1.reset_index().drop('Y', axis=1).apply(W2V, axis=1, result_type='expand')

    0   1   2   3   4   5   6   7   8   9   ... 40  41  42  43  44  45  46  47  48  49
0   0.075014    0.824769    0.580976    0.493415    0.409894    0.142214    0.202602    -0.599501   -0.213184   -0.142188   ... 0.627784    0.136511    -0.162938   0.095707    -0.257638   0.396822    0.208624    -0.454204   0.153140    0.803400
1   0.073664    0.868665    0.574581    0.538951    0.394502    0.134773    0.233070    -0.639365   -0.194892   -0.110557   ... 0.722513    0.147112    -0.239356   -0.046832   -0.237434   0.321494    0.206583    -0.454038   0.251605    0.918388
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7942    -0.002117   0.894570    0.834724    0.602266    0.327858    -0.003092   0.197389    -0.675813   -0.311369   -0.174356   ... 0.690172    -0.085517   -0.000235   -0.214937   -0.290900   0.361734    0.290184    -0.497177   0.285071    0.711388
7943    -0.047621   0.850352    0.729225    0.515870    0.439999    0.060711    0.226026    -0.604846   -0.344891   -0.128396   ... 0.557035    -0.048322   -0.070075   -0.265775   -0.330709   0.281492    0.304157    -0.552191   0.281502    0.750304
7944 rows × 50 columns

You can see that the first row is aamir, and the first value (column 0) my function returns is 0.12135 (shown at the top of my post). Yet with apply, that value appears to be 0.075014.

EDIT:

It appears that apply passes in the string "Name aamir" (the column label plus the value) rather than just "aamir". How can I get it to send only the name itself, "aamir"?
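A minimal sketch of where the extra text comes from, assuming df1 has Name as a regular column as shown above (the two-row frame below is made up for illustration). With axis=1, apply hands each row to the function as a Series, and to_string() renders the label together with the value. Calling apply on the Name column itself passes plain strings instead.

```python
import pandas as pd

# Hypothetical two-row version of df1, with Name as a regular column.
df1 = pd.DataFrame({"Name": ["aamir", "aaron"], "Y": [0, 0]})

# With axis=1, each row reaches the function as a Series, not a string.
row = df1.drop("Y", axis=1).iloc[0]
as_text = row.to_string()
print(repr(as_text))       # the label "Name" and the value "aamir" together
print(repr(row["Name"]))   # just the name

# Series.apply on the column passes each value on its own (a plain str).
types = df1["Name"].apply(lambda value: type(value).__name__)
print(types.tolist())
```

So inside a row-wise apply the name can be read with row["Name"], or the function can be applied to the Name column (or the index) directly.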

Solution

You can try mapping W2V over the index, stacking the resulting arrays with np.vstack, wrapping them in a DataFrame constructor, and then joining the result back:

df.join(pd.DataFrame(np.vstack(df.index.map(W2V)), index=df.index))

Output (using the example W2V and df defined below):

   Y  0  1  2  3  4  5  6  7  8  9
A  0  4  0  2  1  0  0  0  0  3  3
B  1  4  0  0  4  4  3  4  3  4  3
C  2  1  5  5  5  3  3  1  3  5  0
D  3  3  5  1  3  4  2  3  1  0  1
E  4  4  0  2  4  4  0  3  3  4  2
F  5  4  3  5  1  0  2  3  2  5  2
G  6  4  5  2  0  0  2  4  3  4  3
H  7  0  2  5  2  3  4  3  5  3  1
I  8  2  2  0  1  4  2  4  1  0  4
J  9  0  2  3  5  0  3  0  2  4  0

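Using @Vitalizzare's function:

import numpy as np
import pandas as pd

def W2V(name: str) -> np.ndarray:
    # Stand-in for the real W2V: 10 random integers in [0, 5], seeded from the name.
    # hash() of a str changes between Python processes unless PYTHONHASHSEED is set,
    # so the exact numbers differ from run to run.
    low, high, size = 0, 5, 10
    rng = np.random.default_rng(abs(hash(name)))
    return rng.integers(low, high, size, endpoint=True)

# Example frame: Y holds 0..9 and the index holds the letters A..J.
df = pd.DataFrame({'Y': np.arange(10)}, index=[*'ABCDEFGHIJ'])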
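For a frame shaped like the one in the question (Name as the index, a single Y column), the same idea might be sketched as below. This is a variant, not the answer's exact code: it uses a list comprehension in place of Index.map. The W2V here is a hypothetical stand-in that returns a short float32 vector seeded from the name, and the four names are just the ones visible in the question.

```python
import zlib

import numpy as np
import pandas as pd


def W2V(name: str) -> np.ndarray:
    # Hypothetical stand-in for the real embedding function: a fixed-length
    # float32 vector seeded from the name, so repeated calls agree.
    rng = np.random.default_rng(zlib.crc32(name.encode("utf-8")))
    return rng.random(5, dtype=np.float32)


# Small frame shaped like df1 in the question: Name is the index.
df1 = pd.DataFrame(
    {"Y": [0, 0, 1, 1]},
    index=pd.Index(["aamir", "aaron", "zulema", "zuzana"], name="Name"),
)

# One call per name, stacked into a (n_names, vector_length) array.
vectors = np.vstack([W2V(name) for name in df1.index])

# Reusing df1's index keeps every row aligned with its name for the join.
wv_df = pd.DataFrame(vectors, index=df1.index)
result = df1.join(wv_df)
print(result)
```

Because wv_df is built on df1's own index, no sorting or merge keys are needed, and each row of result holds exactly what W2V returns for that name.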

Answered By - Scott Boston

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
