Evaluation of the Effectiveness of SMOTE and Random Under Sampling in Emotion Classification of Tweets

  • I Komang Dharmendra, ITB STIKOM Bali
  • I Made Agus Wirahadi Putra, ITB STIKOM Bali
  • Yohanes Priyo Atmojo, ITB STIKOM Bali

Abstract

This study evaluates two sampling techniques, SMOTE (Synthetic Minority Over-sampling Technique) and Random Under Sampling (RUS), for handling class imbalance in emotion classification of tweets. Both are tested with five classification models: Maximum Entropy, SVM, Random Forest, Neural Network, and Naive Bayes. The results show that SMOTE consistently yields larger improvements in accuracy, precision, recall, and F1-score than RUS, especially for the Random Forest and Neural Network models. Maximum Entropy and SVM are the best-performing models in both sampling scenarios, while Naive Bayes, although time-efficient, scores lower on the evaluation metrics. Overall, SMOTE is the more effective of the two techniques for handling class imbalance.
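To make the difference between the two sampling techniques concrete, the following is a minimal, standard-library-only Python sketch of the general idea behind each. It is not the authors' implementation. The function names `random_under_sample` and `smote`, the parameters `k` and `seed`, the toy feature vectors, and the labels `joy` and `fear` are all assumptions made for illustration, and the sketch presumes the tweets have already been turned into numeric feature vectors by some unspecified method.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect feature vectors per class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_under_sample(X, y, seed=0):
    """RUS: randomly drop samples until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def smote(X, y, k=5, seed=0):
    """SMOTE-style oversampling: grow each class to the largest class size by
    interpolating between a sample and one of its k nearest same-class
    neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if len(rows) < 2:
            continue  # interpolation needs at least two samples of the class
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            others = [r for r in rows if r is not base]
            others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
            neighbour = rng.choice(others[:k])
            gap = rng.random()  # position along the segment base -> neighbour
            X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: six majority samples, two minority samples.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2], [0.0, 0.4],
     [1.0, 1.0], [1.2, 0.9]]
y = ["joy"] * 6 + ["fear"] * 2

X_rus, y_rus = random_under_sample(X, y)
X_smote, y_smote = smote(X, y)
print(Counter(y_rus))
print(Counter(y_smote))
```

In this sketch RUS balances the classes by discarding majority-class samples, whereas the SMOTE-style function keeps every original sample and adds synthetic minority points. That difference in retained training data is one plausible reason an oversampling approach could help data-hungry models more, though the sketch itself does not demonstrate this. A maintained implementation of both techniques is available in the imbalanced-learn library.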

Published
2024-12-23
How to Cite
DHARMENDRA, I Komang; PUTRA, I Made Agus Wirahadi; ATMOJO, Yohanes Priyo. Evaluation of the Effectiveness of SMOTE and Random Under Sampling in Emotion Classification of Tweets. INFORMATICS FOR EDUCATORS AND PROFESSIONAL : Journal of Informatics, [S.l.], v. 9, n. 2, p. 182 - 193, dec. 2024. ISSN 2548-3412. Available at: <https://ejournal-binainsani.ac.id/index.php/ITBI/article/view/3183>. Date accessed: 15 jan. 2025. doi: https://doi.org/10.51211/itbi.v9i2.3183.