Issue
In general, we use df.drop('column_name', axis=1) to remove a column from a DataFrame.
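Here is a minimal example of that call on a toy DataFrame. The column names are illustrative only. Note that drop returns a new DataFrame and leaves the original untouched unless inplace=True is passed.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0], 'B': [1, 1], 'C': [2, 2]})

# axis=1 tells drop() to look up the label among the columns, not the index.
without_a = df.drop('A', axis=1)

# Equivalent, more explicit spelling using the `columns` keyword.
also_without_a = df.drop(columns=['A'])

print(list(without_a.columns))  # ['B', 'C']
```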
I want to add this column removal as a step in a Pipeline such as:
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler(with_mean=False)),
])
How can I do it?
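One possible way to keep the df.drop call itself as a pipeline step, offered as a sketch rather than as the accepted answer's method, is to wrap it in scikit-learn's FunctionTransformer. The helper name drop_columns is hypothetical. The sketch assumes the pipeline is fed a pandas DataFrame: FunctionTransformer does not validate its input by default, so the DataFrame (and its .drop method) reaches the wrapped function unchanged.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def drop_columns(X, columns):
    """Hypothetical helper: return X without `columns` (X must be a DataFrame)."""
    return X.drop(columns=columns)


numerical_transformer = Pipeline(steps=[
    # First step: remove column 'A' before imputing and scaling.
    ('dropper', FunctionTransformer(drop_columns, kw_args={'columns': ['A']})),
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler(with_mean=False)),
])

df = pd.DataFrame({'A': [0] * 10, 'B': [1] * 10, 'C': [2] * 10})
result = numerical_transformer.fit_transform(df)
print(result.shape)  # (10, 2)
```

The trade-off is that this step depends on the input being a DataFrame with a column named 'A', whereas a column selector can be defined independently of the pipeline it wraps.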
Solution
You can wrap your Pipeline in a ColumnTransformer, which lets you choose which columns are passed through the pipeline, as follows:
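import re

import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

col_to_exclude = 'A'
df = pd.DataFrame({'A': [0] * 10, 'B': [1] * 10, 'C': [2] * 10})

numerical_transformer = make_pipeline(
    SimpleImputer(strategy='mean'),
    StandardScaler(with_mean=False),
)

# Select every column except the one named exactly `col_to_exclude`
# (negative lookahead; the `$` avoids also excluding names such as 'AB').
selector = make_column_selector(pattern=f'^(?!{re.escape(col_to_exclude)}$)')

# Columns the selector does not match are dropped (remainder='drop' by default).
transform = make_column_transformer((numerical_transformer, selector))
transform.fit_transform(df)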
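NOTE: I am using a regex pattern here to exclude column A. Any column the selector does not match is dropped from the output, because the remainder parameter of ColumnTransformer defaults to 'drop'.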
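A selector built with make_column_selector can be called directly on a DataFrame to check which column names a pattern keeps. The sketch below uses illustrative column names. It shows that a lookahead without `$`, such as '^(?!A)', excludes every column whose name starts with A, while '^(?!A$)' excludes only the column named exactly A. re.escape is used as a precaution in case a column name contains regex metacharacters.

```python
import re

import pandas as pd
from sklearn.compose import make_column_selector

df = pd.DataFrame({'A': [0], 'AB': [1], 'B': [2]})

# Excludes any name that starts with 'A' (both 'A' and 'AB').
prefix_only = make_column_selector(pattern='^(?!A)')
# Excludes only the name that is exactly 'A'.
exact_match = make_column_selector(pattern=f'^(?!{re.escape("A")}$)')

print(prefix_only(df))  # ['B']
print(exact_match(df))  # ['AB', 'B']
```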
Answered By - Antoine Dubuis