Issue
guys. I am yet a beginner trying to learn ML so do forgive me for such a simple question. I had a dataset from UCI ML Repository. So, started applying all kinds of unsupervised algorithm in which i also applied K Means Cluster algorithm. When I printed out the accuracy score it was negative, not just once but many times. As far as I know scores aren't negative. So could you please help me as to why it's negative.
Any help is appreciated.
import pandas as pd
import numpy as np
a = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data', names = ["a", "b", "c", "d","e","f","g","h","i"])
b = a
c = b.filter(a.columns[[8]], axis=1)
a.drop(a.columns[[8]], axis=1, inplace=True)
from sklearn.preprocessing import LabelEncoder
le1 = LabelEncoder()
le1.fit(a.a)
a.a = le1.transform(a.a)
from sklearn.preprocessing import OneHotEncoder
x = np.array(a)
y = np.array(c)
ohe = OneHotEncoder(categorical_features=[0])
ohe.fit(x)
x = ohe.transform(x).toarray()
from sklearn.model_selection import train_test_split
xtr, xts, ytr, yts = train_test_split(x,y,test_size=0.2)
from sklearn import cluster
kmean = cluster.KMeans(n_clusters=2, init='k-means++', max_iter=100, n_init=10)
kmean.fit(xtr,ytr)
print(kmean.score(xts,yts))
Thank you!!
Solution
Clustering is not classification.
Note that the 'y' argument of fit is ignored. Kmeans will always predict 0,1,...,k-1. So it will never make a correct label on this data set, because it doesn't even know what a label is supposed to look like. It really doesn't work to transfer what you did in classification to clustering. You need to relearn this from scratch. Different workflow, different evaluation.
Answered By - Has QUIT--Anony-Mousse
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.