Thursday, March 17, 2022

[FIXED] Plot DataFrame before and after outlier removal


Issue

This follows up on a solved question, Remove outlier with Python (thanks again to @mozway and jezrael!).

I would like to plot the outlier removal. What I want: a scatter plot with 7 subplots, one per data series, with the time (the first series) on the x-axis and each of the other series on its own y-axis. The removed values should be highlighted. How can I do this?

I thought about plotting the data before and after the removal and combining both in a single plot.

I have two approaches to plot:

Ni60 = data[['60Ni']]
Ni61 = data[['61Ni']]
Ni62 = data[['62Ni']]
Cu63 = data[['63Cu']]
Ni64 = data[['64Ni']]
Cu65 = data[['65Cu']]
Zn66 = data[['66Zn']]
Time = data[['Time']]

fig, ax = plt.subplots(2, 2, sharex = True, figsize = (13, 8))
plt.rcParams['figure.dpi'] = 100 
fig.suptitle(basename)

ax[0, 0].scatter(Time, Ni60)
ax[0, 1].scatter(Time, Ni61)
ax[1, 0].scatter(Time, Cu63)
ax[1, 1].scatter(Time, Cu65)

for axis in ax.flat: axis.set_xlim(0, 32)
ax[0, 0].set_ylim(0, 0.02)
ax[1, 0].set_ylim(0, 0.02)
ax[0, 1].set_ylim(0, 0.002)
ax[1, 1].set_ylim(0, 0.02)


ax[0, 0].set_xlabel('Time (s)')
ax[0, 1].set_xlabel('Time (s)')
ax[1, 0].set_xlabel('Time (s)')
ax[1, 1].set_xlabel('Time (s)')
ax[0, 0].set_ylabel('Spannung (V)')
ax[0, 1].set_ylabel('Spannung (V)')
ax[1, 0].set_ylabel('Spannung (V)')
ax[1, 1].set_ylabel('Spannung (V)')

ax[0, 0].set_title('$^{60}$Ni', color = 'b')
ax[0, 1].set_title('$^{61}$Ni', color = 'b')
ax[1, 0].set_title('$^{63}$Cu', color = 'b')
ax[1, 1].set_title('$^{65}$Cu', color = 'b')

fig.savefig(outfile + "blank.png", dpi=150)

and

f = sns.relplot(data=data.melt(id_vars='Time', value_name='Spannung (V)'), x='Time', y='Spannung (V)', col='variable', col_wrap=2, kind='line', marker='o')
f.fig.savefig("out.png")

But each of these shows the data either before or after the outlier removal, not both. I would like to plot the data in blue and the outliers in red.

The outliers are removed with:

import numpy as np
from scipy import stats

cols = list(df.drop(columns='Time').columns)
# or
# cols = ['60Ni', '61Ni', '62Ni', '63Cu', '64Ni', '65Cu', '66Zn']

# Keep values whose |z-score| is below 2; everything else becomes NaN
df[cols] = df[cols].where(np.abs(stats.zscore(df[cols])) < 2)
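To see what this filter does, here is a minimal sketch on made-up numbers. The helper name `mask_outliers`, the `threshold` parameter and the sample values are assumptions for illustration, not part of the original question. It applies the same `where`/`zscore` rule to a copy of the data and then lists the rows that were masked.

```python
import numpy as np
import pandas as pd
from scipy import stats


def mask_outliers(df, cols, threshold=2.0):
    """Return a copy of df where values with |z-score| >= threshold are NaN."""
    z = np.abs(stats.zscore(df[cols]))  # column-wise z-scores
    out = df.copy()
    out[cols] = df[cols].where(z < threshold)
    return out


# Made-up sample: nine ordinary readings and one spike in '60Ni'
demo = pd.DataFrame({
    'Time': np.arange(10.0),
    '60Ni': [0.010, 0.011, 0.009, 0.010, 0.050,
             0.010, 0.011, 0.009, 0.010, 0.010],
})
cleaned = mask_outliers(demo, ['60Ni'])

# Rows that had a value before masking and are NaN afterwards
removed = cleaned['60Ni'].isna() & demo['60Ni'].notna()
print(demo.loc[removed, ['Time', '60Ni']])
```

Working on a copy keeps the unfiltered values available, which is what a before/after plot needs.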

Solution

It is not the most elegant approach, but if you first draw the data from before the outlier exclusion in red and then draw the data from after the exclusion in blue on top, the blue points cover everything except the outliers, which stay red.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

cols = ['60Ni', '61Ni', '62Ni', '63Cu', '64Ni', '65Cu', '66Zn']

# Copy of df in which values with |z-score| >= 2 are replaced by NaN
dfo = pd.DataFrame({'Time': df['Time']})
dfo[cols] = df[cols].mask(np.abs(stats.zscore(df[cols])) >= 2)

# Note: a 2x2 grid has four axes, so zip() stops after the first four columns
fig, axes = plt.subplots(2, 2, sharex=True, figsize=(12, 8), dpi=100)
fig.suptitle('Outlier graph')

for c, ax in zip(cols, axes.flatten()):
    ax.scatter(df['Time'], df[c], color='r')    # all points, outliers included
    ax.scatter(dfo['Time'], dfo[c], color='b')  # kept points, drawn on top
    ax.set(xlim=(0, 32), ylim=(0, 0.01))
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('Spannung (V)')
    # Mass number as a superscript, e.g. '$^{60}$Ni'
    ax.set_title('$^{{{}}}${}'.format(c[:2], c[2:]), color='b')

#fig.savefig(outfile + "blank.png", dpi=150)
plt.show()
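The 2x2 grid above has room for only four of the seven columns, and the red/blue effect depends on the draw order. As an alternative sketch, not the answerer's method, the outlier mask can be computed once and used to plot kept and removed points separately on a grid large enough for every column. The function name `plot_outliers` and its `threshold` and `ncols` parameters are hypothetical; the z-score rule is the one from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats


def plot_outliers(df, cols, threshold=2.0, ncols=2):
    """Scatter each column against Time: kept points blue, outliers red."""
    # Boolean frame, True where |z-score| >= threshold
    is_outlier = pd.DataFrame(
        np.abs(stats.zscore(df[cols])) >= threshold,
        index=df.index, columns=cols,
    )
    nrows = -(-len(cols) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, sharex=True,
                             figsize=(12, 3 * nrows), squeeze=False)
    panels = axes.ravel()
    for c, ax in zip(cols, panels):
        bad = is_outlier[c]
        ax.scatter(df.loc[~bad, 'Time'], df.loc[~bad, c], color='b', label='kept')
        ax.scatter(df.loc[bad, 'Time'], df.loc[bad, c], color='r', label='outlier')
        ax.set_xlabel('Time (s)')
        ax.set_ylabel('Spannung (V)')
        ax.set_title('$^{{{}}}${}'.format(c[:2], c[2:]), color='b')
    for ax in panels[len(cols):]:
        ax.set_visible(False)  # hide panels that have no column
    return fig, is_outlier
```

Calling `fig, is_outlier = plot_outliers(df, cols)` with the seven columns gives a 4x2 grid with the last panel hidden. Because the x-axis is shared, tick labels appear only on the bottom row, so the panel above the hidden one has none unless they are re-enabled.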

To read in data files and save one graph per file, the loop can be structured like this (details elided):

for c in cols:
    data = pd.read_csv(c+'.csv', ...)
    for ax in axes.flatten():
        ...
    fig.savefig(c+'_blank.png',dpi=150)
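One possible way to flesh out that loop is sketched below. It assumes each CSV has a header row with a 'Time' column and one column per measured series; the function name `save_outlier_plot`, the '_outliers.png' suffix and the stacked one-panel-per-column layout are illustrative choices, not taken from the answer.

```python
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats


def save_outlier_plot(csv_path, threshold=2.0):
    """Read one CSV with a 'Time' column and save '<name>_outliers.png' beside it."""
    csv_path = Path(csv_path)
    df = pd.read_csv(csv_path)
    cols = [c for c in df.columns if c != 'Time']
    # True where |z-score| >= threshold, shape (rows, len(cols))
    bad = np.abs(stats.zscore(df[cols].to_numpy(), axis=0)) >= threshold

    fig, axes = plt.subplots(len(cols), 1, sharex=True,
                             figsize=(8, 2.5 * len(cols)), squeeze=False)
    fig.suptitle(csv_path.stem)
    for i, (c, ax) in enumerate(zip(cols, axes.ravel())):
        colors = np.where(bad[:, i], 'r', 'b').tolist()  # red = outlier
        ax.scatter(df['Time'], df[c], color=colors)
        ax.set_ylabel(c)
    axes[-1, 0].set_xlabel('Time (s)')

    out_path = csv_path.with_name(csv_path.stem + '_outliers.png')
    fig.savefig(out_path, dpi=150)
    plt.close(fig)  # free the figure before the next file
    return out_path
```

A directory of files could then be processed with `for path in sorted(Path('.').glob('*.csv')): save_outlier_plot(path)`. Closing each figure after saving keeps memory use flat when there are many files.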

Answered By - r-beginners

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
