THE EFFECT OF OVERSAMPLING TECHNIQUES ON MACHINE LEARNING ALGORITHMS IN BODY MASS INDEX (BMI) CLASSIFICATION

Isnayni Feby Hawari; Mohamad Khoirun Najib; Sri Nurdiati; Yosef Felix Ygga Marpaung; Nindi Kusumawati; Meyliana Nurfadila; Kathleen Rabika Sijabat; Banissa Fathimatuzzahra Hernawan

Published: Apr 30, 2024

Isnayni Feby Hawari

IPB University

Mohamad Khoirun Najib

IPB University

https://orcid.org/0000-0002-4372-4661

Sri Nurdiati

IPB University

https://orcid.org/0000-0001-9571-7060

Yosef Felix Ygga Marpaung

IPB University

Nindi Kusumawati

IPB University

Meyliana Nurfadila

IPB University

Kathleen Rabika Sijabat

IPB University

Banissa Fathimatuzzahra Hernawan

IPB University

Abstract

Body mass index (BMI) is the basis for classifying body weight and can indicate serious health conditions such as obesity. Many studies have been published on BMI classification using machine learning algorithms. Several techniques are used to improve model accuracy, one of which is oversampling, a technique for handling imbalanced data. The goal of this research is to compare the performance of KNN, random forest, and SVM with and without oversampling. The dataset used in this research is real BMI classification data comprising gender, height, weight, and BMI index. The methods of this research are data pre-processing, data exploration, model training and testing, model evaluation, hyperparameter tuning, and feature importance identification. The results of data exploration show that weight is the variable most strongly correlated with the BMI index, with a correlation of 0.8, and that there is no multicollinearity. Model evaluation using the confusion matrix and F1-score shows that the best model is the SVM model without oversampling after hyperparameter tuning, with an F1-score of more than 0.95. Feature importance identification using the permutation feature importance (PFI) method on the best model shows that weight is the most impactful variable in BMI classification.
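To make two of the steps named in the abstract concrete, the following is a minimal Python sketch of oversampling and of an F1-score computed from prediction counts. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which oversampling variant was used, so plain random oversampling (duplicating minority-class rows) stands in here, and the F1-score is averaged across classes without weighting (macro averaging), which is also an assumption. The function names `random_oversample` and `macro_f1`, the toy rows, and the class labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (random oversampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy rows (gender, height_cm, weight_kg) with made-up class labels.
rows = [("M", 174, 96), ("F", 160, 50), ("M", 180, 70), ("F", 155, 48), ("M", 168, 110)]
labels = [4, 2, 2, 2, 5]
balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)  # every class now has the same count
```

In a comparison like the one described, oversampling would normally be applied to the training split only, so that duplicated rows cannot leak into the test set and inflate the F1-score.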

Issue

Vol. 8 No. 1 (2024): April, JRAM

Section

Algebra

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
