We developed a document-classification web service: a user uploads a document, and the service classifies it with a naive Bayes classifier.
We used the 20 Newsgroups dataset (20news_group) for classification.
- The dataset was combined into a single file.
- Stop words were removed.
- Lemmatization was performed on the dataset.
- The probability of each word given each category was calculated and stored in the database.
- Documents were classified from these stored probabilities using naive Bayes (Bayes' theorem with the assumption that words are conditionally independent given the category).
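The steps above can be sketched roughly as follows. This is a minimal, hypothetical illustration that uses only the Python standard library, not the service's actual code. Several details are assumptions: the stop-word list is a tiny placeholder, lemmatization is skipped (a real pipeline could use a lemmatizer such as NLTK's `WordNetLemmatizer`), add-one (Laplace) smoothing and log probabilities are used, and the SQLite tables `prior` and `likelihood` stand in for whatever database schema the service really uses.

```python
import math
import re
import sqlite3
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on", "for"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens and drop stop words.

    Lemmatization is omitted in this sketch; a real pipeline would map
    each token to its lemma here.
    """
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def train(conn, labeled_docs):
    """Store log P(category) and Laplace-smoothed log P(word | category)."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of documents
    vocab = set()
    for text, category in labeled_docs:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)

    # Hypothetical schema for the probability tables.
    conn.execute("CREATE TABLE prior (category TEXT PRIMARY KEY, logp REAL)")
    conn.execute(
        "CREATE TABLE likelihood (category TEXT, word TEXT, logp REAL, "
        "PRIMARY KEY (category, word))"
    )
    total_docs = sum(doc_counts.values())
    for category, counts in word_counts.items():
        conn.execute(
            "INSERT INTO prior VALUES (?, ?)",
            (category, math.log(doc_counts[category] / total_docs)),
        )
        # Add-one smoothing: (count + 1) / (total words in category + |V|).
        denom = sum(counts.values()) + len(vocab)
        conn.executemany(
            "INSERT INTO likelihood VALUES (?, ?, ?)",
            [(category, w, math.log((counts[w] + 1) / denom)) for w in vocab],
        )
    conn.commit()


def classify(conn, text):
    """Return the category maximizing log P(c) + sum of log P(w | c)."""
    tokens = tokenize(text)
    priors = conn.execute("SELECT category, logp FROM prior").fetchall()
    best, best_score = None, -math.inf
    for category, score in priors:
        for w in tokens:
            row = conn.execute(
                "SELECT logp FROM likelihood WHERE category = ? AND word = ?",
                (category, w),
            ).fetchone()
            if row is not None:  # words never seen in training are skipped
                score += row[0]
        if score > best_score:
            best, best_score = category, score
    return best


# Toy usage with made-up documents labeled with 20 Newsgroups category names.
docs = [
    ("the rocket launch reached orbit", "sci.space"),
    ("nasa plans a new space station", "sci.space"),
    ("the goalie blocked the shot in overtime", "rec.sport.hockey"),
    ("hockey fans cheered the winning goal", "rec.sport.hockey"),
]
conn = sqlite3.connect(":memory:")
train(conn, docs)
print(classify(conn, "a space rocket launch"))  # sci.space
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long documents, and smoothing keeps a word that never appeared in one category from driving that category's score to zero.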