Issue
Is there any way to classify text data into different groups/categories based on the words in the text, using Python and scikit-learn?
For example:
text = [["request approval for access", "request approval to enter premises", "Laptop not working"], ["completed bw table loading"]]
So can I get categories like:
category_label = [[0,0,2], [1]]
categories = [["approval request", "approval request", "Laptop working"], ["bw table"]]
where
0 = approval request
2 = laptop working
1 = bw table
In other words, there is no labelled training data and there are no target labels.
Solution
This is readily possible in Scikit-Learn as well as in NLTK.
The categories that you list:
0 = approval request
2 = laptop working
1 = bw table
are not ones that a clustering algorithm would naturally choose, and it's worthwhile to caution you against the possible mistake of clouding your statistical learning algorithm with heuristics. I suggest that you first try some clustering and classification, and then consider semi-supervised learning methods, whereby you can label your clusters and propagate those labels.
Answered By - AN6U5