Issue
I'm following Nicholas Renotte's tutorial on building a Python ML app with Streamlit. In his utils.py file, there is a fit method in which a SimpleImputer model is fitted to the data in the train.csv file:
def fit(self, X, y=None):
    self.ageImputer = SimpleImputer()
    self.ageImputer.fit(X[['Age']])
    return self
However, the .fit method here appears to take only one input.
In the scikit-learn docs, .fit takes in two datasets, X and y:
classif.fit(X, y)
The same is visible in another article I read:
my_linear_regressor.fit(X_train, y_train)
I see that in Nicholas' fit method, y=None. I also see that 'Age' is just a single column in the training data.
- Is the .fit method flexible to take in either 1 or 2 datasets?
- Is X a 2D array?
- What is the meaning of the [['Age']] syntax?
Solution
For your first bulleted question and the title question, the answer is "yes, depending on the estimator". To start, from https://scikit-learn.org/stable/developers/develop.html#fitting:
The fit() method takes the training data as arguments, which can be one array in the case of unsupervised learning, or two arrays in the case of supervised learning.
That is, expect to pass in y if the estimator you're using needs the target variable(s).
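As a rough illustration of the one-array versus two-array distinction, here is a minimal sketch. The toy DataFrame and its "Fare" target column are hypothetical stand-ins for train.csv, and LinearRegression is only an arbitrary example of a supervised estimator, not something taken from the tutorial. It also shows that X[['Age']] (a list of column names inside the indexing brackets) selects a 2D, one-column DataFrame, whereas X['Age'] gives a 1D Series.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

# Hypothetical toy data standing in for train.csv (values are made up).
df = pd.DataFrame({
    "Age": [22.0, np.nan, 30.0, 38.0],
    "Fare": [7.0, 8.0, 9.0, 10.0],
})

# Double brackets select a list of columns, so the result stays 2D.
age_2d = df[["Age"]]   # DataFrame with shape (4, 1)
age_1d = df["Age"]     # Series with shape (4,)

# Unsupervised transformer: fit only needs X.
age_imputer = SimpleImputer()  # default strategy is "mean"
age_imputer.fit(age_2d)
age_filled = age_imputer.transform(age_2d)  # NaN replaced by the column mean

# Supervised estimator: fit needs both X and y.
regressor = LinearRegression()
regressor.fit(age_filled, df["Fare"])
```

SimpleImputer expects 2D input, which is presumably why the tutorial passes X[['Age']] rather than X['Age'].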
Now you'll generally see y given as an optional parameter (with e.g. the default value of None) in sklearn unsupervised transformers; this is for compatibility with Pipelines. From https://scikit-learn.org/stable/developers/develop.html#pipeline-compatibility:
All fit and fit_transform functions must take arguments X, y, even if y is not used.
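To see why that signature matters, here is a small sketch of a custom transformer placed in a Pipeline. The class name AgeImputingTransformer, the toy data, and the choice of LogisticRegression as the final step are all assumptions made for illustration; only the body of fit mirrors the snippet in the question. When the pipeline is fitted, it hands both X and y to each step, so a fit that did not accept y would fail even though the imputer itself ignores it.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class AgeImputingTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer modeled on the fit method in the question."""

    def fit(self, X, y=None):
        # y is accepted for Pipeline compatibility but never used.
        self.ageImputer = SimpleImputer()
        self.ageImputer.fit(X[["Age"]])
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is left unchanged.
        X = X.copy()
        X["Age"] = self.ageImputer.transform(X[["Age"]]).ravel()
        return X


# Made-up data: one feature column with missing values and a binary target.
X = pd.DataFrame({"Age": [22.0, np.nan, 30.0, 38.0, 54.0, np.nan]})
y = np.array([0, 0, 0, 1, 1, 1])

pipe = Pipeline([
    ("impute", AgeImputingTransformer()),
    ("clf", LogisticRegression()),
])

# Pipeline.fit passes (X, y) to every step, including the transformer.
pipe.fit(X, y)
predictions = pipe.predict(X)

# Used on its own, the same transformer can be fitted with X alone.
standalone = AgeImputingTransformer().fit(X)
```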
Answered By - Ben Reiniger