A Hybrid Clustering–Classification Framework for SMEs Success Level Prediction

Andika Dermawan Saputra
Wiyli Yustanti

Abstract

Micro, Small, and Medium Enterprises (SMEs) are vital to economic growth, yet the determinants of their success are complex and call for advanced predictive modeling. This study proposes a hybrid clustering-classification framework that first groups SMEs by success level and then predicts that level from 22 multidimensional indicators, including financial literacy, FinTech adoption, and entrepreneurial resilience. K-Means clustering was first applied to the survey data; a three-cluster solution gave the highest Silhouette Score (0.5238) and was taken as optimal. The three clusters, interpreted as success personas, were labeled Beginner and Conventional, Stable Digital Adopter, and Digital Innovator SMEs, and served as pseudo-labels for the classification stage. Classification algorithms were tested with and without the Synthetic Minority Oversampling Technique (SMOTE). While ensemble methods (Random Forest, LightGBM) and the Support Vector Machine (SVM) performed well, the K-Nearest Neighbors (KNN) algorithm consistently outperformed the others, achieving the highest F1-Score (0.9324) with SMOTE applied. The findings support the effectiveness of the hybrid clustering-classification approach for mapping and predicting SME success levels. The resulting model offers policymakers a data-driven tool for guiding targeted interventions and digital training programs that foster sustainable SME development.
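
The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch using scikit-learn. It is not the authors' implementation. The data are synthetic blobs standing in for the 22-indicator survey, and the candidate range for k, the KNN neighbor count, the 80/20 split, and the macro-averaged F1 are all assumptions. The helper `smote_like_oversample` is a hypothetical, simplified stand-in for SMOTE (interpolating between same-class neighbors); a real study would more likely use a maintained implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import f1_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.preprocessing import StandardScaler


def pick_k_by_silhouette(X, candidates=range(2, 7), seed=0):
    """Fit K-Means for each candidate k; return (k, score, labels) of the best."""
    best = None
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best


def smote_like_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE-style oversampling (hypothetical helper): create points
    on the segment between a sample and one of its k nearest same-class
    neighbours until every class matches the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        X_cls = X[y == cls]
        if n_new == 0 or len(X_cls) < 2:
            continue
        n_neighbors = min(k, len(X_cls) - 1)
        nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X_cls)
        # Drop the first column, which is normally the query point itself.
        neighbours = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        base = rng.integers(0, len(X_cls), size=n_new)
        pick = neighbours[base, rng.integers(0, n_neighbors, size=n_new)]
        gap = rng.random((n_new, 1))
        X_parts.append(X_cls[base] + gap * (X_cls[pick] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic stand-in for the survey: 22 features, three imbalanced groups.
X_raw, _ = make_blobs(n_samples=[300, 120, 60], n_features=22,
                      cluster_std=2.0, random_state=42)
X = StandardScaler().fit_transform(X_raw)

# Stage 1: cluster, choose k by silhouette, keep cluster ids as pseudo-labels.
k, score, pseudo_labels = pick_k_by_silhouette(X)

# Stage 2: train a classifier on the pseudo-labels, oversampling only the
# training split so the test split stays untouched.
X_train, X_test, y_train, y_test = train_test_split(
    X, pseudo_labels, test_size=0.2, stratify=pseudo_labels, random_state=42)
X_bal, y_bal = smote_like_oversample(X_train, y_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_bal, y_bal)
macro_f1 = f1_score(y_test, knn.predict(X_test), average="macro")

print(f"chosen k={k}, silhouette={score:.4f}, macro F1={macro_f1:.4f}")
```

Because the classifier is trained on labels produced by the clustering step, its F1-Score measures how well the cluster boundaries can be reproduced on held-out records, not agreement with an external ground truth. The numbers printed for the synthetic data are therefore not comparable to the 0.5238 and 0.9324 reported above.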
