Issue
I have the code below to check for overfitting using R^2. I would like to use the same code to check for overfitting using RMSE instead of R^2, but the default scorer for .score is R^2. How can I do that?
cv = KFold(n_splits=5, random_state=0, shuffle=True)
X_transform2 = poly.fit_transform(X_normalized)  # transform once, outside the loop
train_scores, test_scores = [], []
for train, test in cv.split(X_normalized):
    OL = lin_regressor.fit(X_transform2[train], y_for_normalized.iloc[train])
    tr_21 = OL.score(X_transform2[train], y_for_normalized.iloc[train])
    ts_21 = OL.score(X_transform2[test], y_for_normalized.iloc[test])
    print("Train score:", tr_21)  # from documentation .score returns r^2
    print("Test score:", ts_21)   # from documentation .score returns r^2
    train_scores.append(tr_21)
    test_scores.append(ts_21)
print("The Mean for Train scores is:", np.mean(train_scores))
print("The Mean for Test scores is:", np.mean(test_scores))
Solution
AFAIK, you can't get other kinds of score from the .score() method. You'll need to either make a prediction and score it yourself, e.g. with (the square root of) sklearn.metrics.mean_squared_error(), or use one of the cross-validation helpers like sklearn.model_selection.cross_val_score(), which you can ask for different metrics or pass your own scoring function.
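For instance, cross_val_score accepts scoring="neg_root_mean_squared_error" (available in scikit-learn 0.22+), which handles the whole loop for you. Here is a minimal sketch; the synthetic X and y below are stand-ins for your X_normalized and y_for_normalized, and the degree-2 PolynomialFeatures is an assumption about your poly object:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for X_normalized / y_for_normalized.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

cv = KFold(n_splits=5, random_state=0, shuffle=True)
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# Scorers follow a "greater is better" convention, so the RMSE is
# returned negated; flip the sign to get the usual RMSE.
neg_rmse = cross_val_score(model, X, y, cv=cv,
                           scoring="neg_root_mean_squared_error")
rmse = -neg_rmse
print("Mean test RMSE:", rmse.mean())
```

Using a Pipeline here also keeps the polynomial expansion inside the CV loop, so each fold's transform is fit on training data only.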
For example, in your CV loop you could do something like this:
from sklearn.metrics import mean_squared_error

# Score the held-out fold yourself instead of calling OL.score():
X_test = X_transform2[test]
y_test = y_for_normalized.iloc[test]
y_pred_ts = OL.predict(X_test)
ts_21_rmse = np.sqrt(mean_squared_error(y_test, y_pred_ts))
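Putting that together, your whole loop adapted to RMSE looks something like the sketch below. The synthetic arrays stand in for your X_normalized and y_for_normalized (plain NumPy arrays here, so no .iloc), and the degree-2 PolynomialFeatures is an assumption:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-ins for X_normalized / y_for_normalized.
rng = np.random.default_rng(0)
X_normalized = rng.normal(size=(100, 3))
y_for_normalized = (X_normalized @ np.array([1.0, -2.0, 0.5])
                    + rng.normal(scale=0.1, size=100))

poly = PolynomialFeatures(degree=2)
lin_regressor = LinearRegression()
X_transform2 = poly.fit_transform(X_normalized)  # transform once

cv = KFold(n_splits=5, random_state=0, shuffle=True)
train_rmse, test_rmse = [], []
for train, test in cv.split(X_transform2):
    OL = lin_regressor.fit(X_transform2[train], y_for_normalized[train])
    # RMSE = square root of the mean squared error on each split.
    tr = np.sqrt(mean_squared_error(y_for_normalized[train],
                                    OL.predict(X_transform2[train])))
    ts = np.sqrt(mean_squared_error(y_for_normalized[test],
                                    OL.predict(X_transform2[test])))
    train_rmse.append(tr)
    test_rmse.append(ts)

print("Mean train RMSE:", np.mean(train_rmse))
print("Mean test RMSE:", np.mean(test_rmse))
```

A test RMSE much larger than the train RMSE is the overfitting signal you were reading off R^2 before, just in the units of y instead of a unitless score.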
You will find the sklearn User Guide to be an excellent source of information on this and most other basic ML topics.
Answered By - kwinkunks