Early Heart Disease Prediction Using Data Mining Techniques

Authors

  • Dugguh Sylvester Aondonenge, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria
  • Ajayi Ore-Ofe, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria
  • Kamorudeen Hassan Taiwo, Department of Family Medicine, Ahmadu Bello University Teaching Hospital, Zaria, Nigeria
  • Abubakar Umar, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria
  • Isa Abdulrazaq Imam, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria
  • Dako Daniel Emmanuel, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria
  • Ibrahim Ibrahim, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria

DOI:

https://doi.org/10.26740/vubeta.v2i2.36735

Keywords:

Data Mining, Heart Disease, Machine Learning Algorithms, Model Performance, Predictive Model

Abstract

This study develops a predictive model for early heart disease detection using data mining techniques to support timely and accurate diagnosis. Heart disease prediction is complex because it requires analyzing several risk factors, such as age, cholesterol, and blood pressure. The model integrates multiple machine learning algorithms, including Random Forest, Support Vector Machine, and a hybrid ensemble approach, with the aim of achieving higher prediction accuracy and reliability. The methodology follows five phases: data collection, data pre-processing, feature extraction, model construction, and model evaluation. Data were gathered from publicly available health repositories, pre-processed to remove missing values and irrelevant information, and subjected to feature extraction techniques to identify influential predictors. The data were split in an 80:20 ratio for model training and testing to assess performance across the classification algorithms. The hybrid model achieved an accuracy of 97.56%, a precision of 98.04%, and a recall of 97.09%, surpassing the individual algorithms tested. These findings indicate that the hybrid approach can effectively support early intervention for heart disease, particularly in healthcare settings with limited diagnostic resources. The study demonstrates that advanced data mining techniques offer a viable way to improve patient outcomes through early detection of heart disease.
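The evaluation steps named in the abstract (an 80:20 split, a hybrid of several classifiers, and accuracy, precision, and recall) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the hybrid combines its base models, so simple majority (hard) voting is assumed here, the helper names `split_80_20`, `majority_vote`, and `evaluate` are hypothetical, and the labels are toy values made up for illustration.

```python
import random
from collections import Counter


def split_80_20(rows, seed=0):
    """Shuffle rows reproducibly and return (train, test) in an 80:20 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def majority_vote(*prediction_lists):
    """Hard voting (an assumption): per sample, keep the most common label."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*prediction_lists)]


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy ground truth and base-model outputs (illustrative only, not study data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 1, 1, 0]
model_b = [1, 0, 0, 1, 0, 0, 0, 0]
model_c = [1, 1, 1, 1, 0, 0, 0, 0]

train, test = split_80_20(range(100))
hybrid = majority_vote(model_a, model_b, model_c)
metrics = evaluate(y_true, hybrid)
print(len(train), len(test), metrics)
```

In this toy case each base model makes at least two errors, while the vote makes one, which is the intuition behind combining classifiers. Precision here is TP / (TP + FP) and recall is TP / (TP + FN), with heart disease as the positive class.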

Author Biographies

Dugguh Sylvester Aondonenge, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria

Dugguh Sylvester Aondonenge is a data analyst at the Federal Inland Revenue Service. In 2017, he obtained a bachelor's degree in Computer Science from Federal University Kashere, Gombe State. He advanced his studies in 2024, when he obtained a Master's degree in Information Technology (MIT) from Ahmadu Bello University, Zaria. His areas of interest are data analysis and machine learning. He can be contacted via email at dugguhsylvester@gmail.com.

Ajayi Ore-Ofe, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria

Ajayi Ore-Ofe is a lecturer at the Department of Computer Engineering, Ahmadu Bello University, Zaria, Nigeria. He received his MSc and Ph.D. in Control Engineering from the Department of Computer Engineering, Ahmadu Bello University, Zaria, Nigeria, in 2017 and 2022, respectively. His research is mainly in control engineering. He can be contacted via email at ajayi.oreofe17@gmail.com.

Kamorudeen Hassan Taiwo, Department of Family Medicine, Ahmadu Bello University Teaching Hospital, Zaria, Nigeria

Kamorudeen Hassan Taiwo is a Senior Registrar in the Department of Family Medicine at Ahmadu Bello University Teaching Hospital, Zaria, Nigeria. He holds a Bachelor of Medicine, Bachelor of Surgery (M.B.B.S) and a Master’s degree in Disaster Risk Management and Development Studies (MDRMDS) from the Department of Medicine and the Department of Geography, respectively, at Ahmadu Bello University, Zaria, Nigeria. Dr. Taiwo is an Associate Fellow of the National Postgraduate Medical College of Nigeria (NPMCN) and a Fellow of the Institute of Disaster Management and Safety Science, Nigeria (FDMSS). His primary area of research is clinical medicine, with a focus on Artificial Intelligence (AI) applications in healthcare. He can be reached via email at drtaiwokamar@gmail.com.

Abubakar Umar, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria

Abubakar Umar is a lecturer in the Department of Computer Engineering at Ahmadu Bello University, Zaria, Nigeria. He earned his BEng degree from the Department of Electrical Engineering, Ahmadu Bello University, Zaria, Nigeria, in 2011, and his MSc and Ph.D. degrees from the Department of Computer Engineering, Ahmadu Bello University, Zaria, Nigeria, in 2017 and 2024, respectively. His primary research focus is control engineering, where he explores the development and optimization of control systems for different applications. He can be contacted via email at abuumar@abu.edu.ng or abubakaru061010@gmail.com.

Isa Abdulrazaq Imam, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria

Dako Daniel Emmanuel, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria

Ibrahim Ibrahim, Department of Computer Engineering, Faculty of Engineering, Ahmadu Bello University, Zaria, Nigeria

Published

2025-06-01

How to Cite

[1]
D. Sylvester Aondonenge, “Early Heart Disease Prediction Using Data Mining Techniques”, Vokasi Unesa Bull. Eng. Technol. Appl. Sci., vol. 2, no. 2, pp. 211–226, Jun. 2025.
