Classification of Natural Disaster Reports from Social Media using K-Means SMOTE and Multinomial Naïve Bayes
Abstract
Disasters can occur anytime and anywhere. Floods and forest fires are two types of disaster that occur in Indonesia, and South Kalimantan Province experiences both frequently. The flood and forest fire datasets used in previous research are imbalanced, and imbalanced data can make classification harder. At the data level, imbalance can be addressed by oversampling. One oversampling technique is SMOTE (Synthetic Minority Over-sampling Technique), and K-Means SMOTE is a modification of it. Multinomial Naïve Bayes is a Naïve Bayes model that is often used in text classification and performs well on text. In this research, K-Means SMOTE combined with Multinomial Naïve Bayes yielded an F1 score of 66.04% on the flood disaster data and 66.31% on the forest fire disaster data.
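The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the implementation used in this research: the toy vocabulary, the count vectors, the class labels, and all function names are hypothetical, and the oversampling step is plain SMOTE interpolation without the K-Means clustering stage that K-Means SMOTE adds before interpolating. The sketch only shows how synthetic minority samples, a Multinomial Naïve Bayes classifier with Laplace smoothing, and the F1 score fit together.

```python
import math
import random

def smote_oversample(minority, n_new, k=2, seed=0):
    """Plain SMOTE step: interpolate between a minority sample and one of
    its k nearest minority neighbours. K-Means SMOTE would first cluster
    the data and apply this step only inside selected clusters."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (v for v in minority if v is not base),
            key=lambda v: math.dist(base, v),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def train_mnb(X, y, alpha=1.0):
    """Multinomial Naive Bayes: log priors and Laplace-smoothed log likelihoods."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * n_features
        log_prior = math.log(len(rows) / len(X))
        log_lik = [math.log((t + alpha) / denom) for t in totals]
        model[c] = (log_prior, log_lik)
    return model

def predict_mnb(model, x):
    """Return the class with the highest posterior log score."""
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(xi * li for xi, li in zip(x, log_lik))
    return max(model, key=score)

def f1_score(y_true, y_pred, positive):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical term counts over the toy vocabulary
# ["flood", "water", "traffic", "food"].
X_major = [[0, 0, 2, 1], [0, 0, 1, 2], [0, 1, 2, 0],
           [0, 0, 3, 1], [1, 0, 1, 2], [0, 0, 2, 2]]   # class "other"
X_minor = [[2, 1, 0, 0], [1, 2, 0, 0], [3, 1, 1, 0]]   # class "report"

# Oversample the minority class until both classes have the same size.
synthetic = smote_oversample(X_minor, n_new=len(X_major) - len(X_minor))
X_train = X_major + X_minor + synthetic
y_train = ["other"] * len(X_major) + ["report"] * (len(X_minor) + len(synthetic))

model = train_mnb(X_train, y_train)
X_test = [[2, 2, 0, 0], [0, 0, 3, 2]]
y_test = ["report", "other"]
y_pred = [predict_mnb(model, x) for x in X_test]
print(y_pred, f1_score(y_test, y_pred, positive="report"))
```

In practice, ready-made implementations are typically used instead of hand-written ones, for example the `KMeansSMOTE` class in imbalanced-learn and `MultinomialNB` in scikit-learn. Whether this research used those libraries is not stated here.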