Georgina Nkolika Obunadike, Emeka Ogbuju, Mukhtar Abubakar


Text classification is the task of grouping text documents into predefined categories. It has been applied in areas such as classification of scientific articles, spam filtering, and document genre classification, and it is a popular data mining task because of its accuracy and ease of application. The Internet is a common medium for transmitting messages: billions of messages circulate daily through platforms such as e-mail, Facebook, and Twitter. Some of these messages are sent with harmful motives, so it has become imperative to design models that use data mining algorithms to filter unwanted messages out of circulation. In light of this, this paper applied three data mining techniques, namely Support Vector Machine (SVM), Naïve Bayes, and K-Nearest Neighbour (KNN), to develop models that can filter messages from Facebook and e-mail and so counter the circulation of online hate speech on these platforms. It also compared the performance of these models on the collected data to identify the best-performing text classifier. The Naïve Bayes algorithm performed better than the other two, with an accuracy of 61.5% and an ROC value of 0.66.
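To make the best-performing technique concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing, written in plain Python. It is not the authors' implementation: the abstract does not state the tooling, features, or dataset used, so the tokenizer, the function names `train` and `predict`, the labels, and the six toy messages are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()

def train(messages):
    """Count classes and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in messages:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log posterior under Naive Bayes."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        # Laplace (add-one) smoothing denominator for this class.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data, invented for illustration only.
TRAIN = [
    ("i hate those people they are vermin", "hate"),
    ("those people should be driven out", "hate"),
    ("i hate them all", "hate"),
    ("see you at the meeting tomorrow", "ok"),
    ("thanks for the lovely dinner", "ok"),
    ("the people at the meeting were lovely", "ok"),
]

model = train(TRAIN)
print(predict(model, "i hate those people"))
print(predict(model, "thanks for the meeting"))
```

A comparison like the one described in the abstract would train the SVM and KNN models on the same labeled messages and score all three on held-out data, for example by accuracy and area under the ROC curve.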
