Issue
I trained a scikit-learn sklearn.ensemble.RandomForestRegressor and would like to rename the model's input features. I tried:
model.feature_names_in_ = new_feature_names
but this doesn't work, as I get:
AttributeError: can't set attribute
So is it just not possible?
EDIT: After upgrading scikit-learn, this works:
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
import pandas as pd
X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
regr = RandomForestRegressor(max_depth=2, random_state=0)
X = pd.DataFrame(X)
X.columns = ["a", "b", "c", "d"]
regr.fit(X, y)
regr.feature_names_in_
array(['a', 'b', 'c', 'd'], dtype=object)
regr.feature_names_in_ = ["aaa", "bbb", "ccc", "ddd"]
regr.feature_names_in_
['aaa', 'bbb', 'ccc', 'ddd']
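A small sketch of what this means in practice, assuming scikit-learn 1.0 or later (where feature_names_in_ is a plain attribute set during fit). The trees split on column positions, not names, so overwriting the names leaves predictions unchanged. However, later inputs must then use the new column names, because scikit-learn compares DataFrame columns against feature_names_in_ at predict time. The mapping dict is hypothetical. Storing an object-dtype NumPy array keeps the attribute the same type that scikit-learn itself uses.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
X = pd.DataFrame(X, columns=["a", "b", "c", "d"])
regr = RandomForestRegressor(max_depth=2, random_state=0).fit(X, y)
before = regr.predict(X)

# Hypothetical old-name -> new-name mapping, for illustration only
mapping = {"a": "aaa", "b": "bbb", "c": "ccc", "d": "ddd"}

# Overwrite the stored names, keeping scikit-learn's own type (object ndarray)
regr.feature_names_in_ = np.asarray([mapping[c] for c in X.columns], dtype=object)

# Inputs must now carry the new names; passing the old ones makes scikit-learn
# warn (older versions) or raise a ValueError (newer versions)
X_new = X.rename(columns=mapping)
after = regr.predict(X_new)

# The fitted trees are untouched, so the predictions are identical
print(np.allclose(before, after))  # True
```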
Solution
Update: As per the comments, setting the feature_names_in_ attribute directly works. The problem was caused by an older version of sklearn.
My original answer:
You could wrap the estimator in a custom meta-estimator that renames the columns before passing them on, like this:
from sklearn.base import BaseEstimator, MetaEstimatorMixin, clone
# Meta-estimator that renames the input features before passing them on
class InputFeaturesRenamer(MetaEstimatorMixin, BaseEstimator):
    def __init__(self, estimator, renamed_features_dict):
        self.estimator = estimator
        self.renamed_features_dict = renamed_features_dict
    def _rename(self, X):
        # Only DataFrames carry column names; other inputs pass through unchanged
        X_renamed = X.copy()
        if hasattr(X, 'columns'):
            X_renamed.columns = [
                self.renamed_features_dict[original_name]
                for original_name in X.columns
            ]
        return X_renamed
    def fit(self, X, y=None):
        if hasattr(X, 'columns'):
            self.feature_names_in_ = X.columns.to_numpy()
        X_renamed = self._rename(X)
        # Fit a fresh clone of the wrapped estimator on the renamed data
        self.estimator_ = clone(self.estimator).fit(X_renamed, y=y)
        return self
    def predict(self, X):
        return self.estimator_.predict(self._rename(X))
    def predict_proba(self, X):
        # Only works if the wrapped estimator is a classifier
        return self.estimator_.predict_proba(self._rename(X))
Test case:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
data = pd.DataFrame({'feat0': [0, 0, 0, 1], 'y': [2.1, 2.2, 2.2, 10]})
rf = RandomForestRegressor()
rf_renamed = InputFeaturesRenamer(rf, {'feat0': 'renamed_feat0'})
rf_renamed.fit(data[['feat0']], data.y)
print('Original names:', rf_renamed.feature_names_in_,
      'Renamed to:', rf_renamed.estimator_.feature_names_in_)
Output:
Original names: ['feat0'] Renamed to: ['renamed_feat0']
Renaming here only applies when the input is a DataFrame. It feels like a complicated solution for a simple task, so I'd also be interested to see whether there's a better way.
Answered By - user3128