Issue
If I create a Pipeline in sklearn where the first step is a transformation (Imputer) and the second step fits a RandomForestClassifier with the keyword argument warm_start set to True, how do I successively call the RandomForestClassifier? Does warm_start do anything when embedded in a Pipeline?
http://scikit-learn.org/0.18/auto_examples/missing_values.html
Solution
Yes, it does, but handling the pipeline becomes slightly more complex.
warm_start is only useful if you increase n_estimators in the RandomForestClassifier between calls to fit().
See the warning raised in the scikit-learn source:
warn("Warm-start fitting without increasing n_estimators does not fit new trees.")
So you will need to increase the n_estimators of the RandomForestClassifier inside the pipeline.
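To see this behaviour in isolation, here is a minimal sketch on a standalone RandomForestClassifier, outside any pipeline. The synthetic data, the random_state values and the tree counts (10, then 20) are assumptions chosen only for illustration.

```python
import warnings

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, made up for this illustration only
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = (X[:, 0] > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
clf.fit(X, y)
first_trees = list(clf.estimators_)  # keep references to the initial trees

# Refit without increasing n_estimators: no new trees are fitted
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf.fit(X, y)
warned = any("Warm-start" in str(w.message) for w in caught)
n_after_refit = len(clf.estimators_)

# Increase n_estimators, then refit: only the additional trees are trained
clf.set_params(n_estimators=20)
clf.fit(X, y)
n_after_growth = len(clf.estimators_)

# The trees from the first fit are reused, not rebuilt
kept = all(a is b for a, b in zip(first_trees, clf.estimators_))

print(warned, n_after_refit, n_after_growth, kept)
```

The second fit() leaves the forest untouched and emits the warning quoted above. The third fit() adds trees on top of the existing ones.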
To do that, first access the RandomForestClassifier estimator from the pipeline and set n_estimators as required. However, when you call fit() on the pipeline, the imputer step is still executed, so it is refit on every call.
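For example, consider the pipeline below:
# Imputer is the scikit-learn 0.18 class; later releases use sklearn.impute.SimpleImputer.
# n_estimators is set explicitly because its default differs across versions (10 in 0.18, 100 from 0.22).
pipe = Pipeline([('imputer', Imputer()),
                 ('clf', RandomForestClassifier(n_estimators=10, warm_start=True))])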
Now, according to your question, you will need to do the following to use warm_start:
# Fit the data initially
pipe.fit(X, y)
# Change n_estimators (use either one of the following two lines)
pipe.set_params(clf__n_estimators=30)
# or, equivalently:
pipe.named_steps['clf'].n_estimators = 30
# Fit the same data again or new data
pipe.fit(X_new, y_new)
In the first call to pipe.fit(), the imputer is fitted on the given data (X, y). In the second call to fit(), one of two things happens, depending on the data:
- If you pass the same data again, the imputer is still refitted, which is unnecessary.
- If the data is different, the imputer is fitted on the new data and forgets what it learnt previously. Missing values in the new data are then imputed differently from those in the previous data. In my opinion, this is not what you want.
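The second case can be checked with a small sketch. It assumes a recent scikit-learn, so it uses SimpleImputer in place of the Imputer class from the answer, and it runs on synthetic data. The shifted X_new, the mean strategy and the random_state values are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer  # assumption: stand-in for the 0.18 Imputer
from sklearn.pipeline import Pipeline

# Synthetic data, made up for this illustration only
rng = np.random.RandomState(0)
X = rng.rand(60, 3)
X[::7, 0] = np.nan  # introduce missing values in column 0
y = (X[:, 1] > 0.5).astype(int)

# New data drawn from a shifted distribution
X_new = rng.rand(60, 3) + 5.0
X_new[::5, 0] = np.nan
y_new = (X_new[:, 1] > 5.5).astype(int)

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("clf", RandomForestClassifier(n_estimators=10, warm_start=True,
                                   random_state=0)),
])

pipe.fit(X, y)
stats_first = pipe.named_steps["imputer"].statistics_.copy()
trees_first = len(pipe.named_steps["clf"].estimators_)

# Grow the forest, then fit on the new data
pipe.set_params(clf__n_estimators=30)
pipe.fit(X_new, y_new)
stats_second = pipe.named_steps["imputer"].statistics_.copy()
trees_second = len(pipe.named_steps["clf"].estimators_)

print("imputer means, first fit: ", stats_first)
print("imputer means, second fit:", stats_second)
print("trees:", trees_first, "->", trees_second)
```

The forest grows from 10 to 30 trees, so warm_start does take effect inside the pipeline. The imputer's statistics_, however, are recomputed from X_new alone, so the earlier trees and the later trees were trained on data imputed with different fill values.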
Answered By - Vivek Kumar