THE EFFECT OF OVERSAMPLING TECHNIQUES ON MACHINE LEARNING ALGORITHM IN BODY MASS INDEX (BMI) CLASSIFICATION

Isnayni Feby Hawari
Mohamad Khoirun Najib
Sri Nurdiati
Yosef Felix Ygga Marpaung
Nindi Kusumawati
Meyliana Nurfadila
Kathleen Rabika Sijabat
Banissa Fathimatuzzahra Hernawan

Abstract

Body mass index (BMI) is the basis for classifying a person’s weight status and can indicate serious health conditions such as obesity. Many studies have been published on BMI classification using machine learning algorithms. Several techniques are used to increase model accuracy, one of which is oversampling, a technique for handling imbalanced data. The goal of this research is to compare the performance of k-nearest neighbors (KNN), random forest, and support vector machine (SVM) models with and without oversampling. The dataset used in this research is real BMI classification data comprising gender, height, weight, and BMI index. The research consists of data pre-processing, data exploration, model training and testing, model evaluation, hyperparameter tuning, and identification of feature importance. The data exploration results show that weight is the variable most strongly correlated with BMI index, with a correlation of 0.8, and that there is no multicollinearity. Model evaluation using a confusion matrix and the F1-score shows that the best model is the SVM model without oversampling after hyperparameter tuning, with an F1-score of more than 0.95. Feature importance identification using the permutation feature importance (PFI) method on the best model shows that weight is the most impactful variable in BMI classification.
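The two ideas at the core of the comparison, balancing classes by oversampling and scoring predictions with the F1-score, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which oversampling variant or which F1 averaging was used, so random oversampling (duplicating minority-class rows) and macro-averaged F1 are assumptions here. The function names `random_oversample` and `macro_f1` and the toy rows are hypothetical and are not taken from the study's dataset.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy rows: (gender, height in cm, weight in kg).
rows = [("M", 170, 60), ("F", 160, 55), ("M", 180, 75),
        ("F", 165, 58), ("M", 175, 110), ("F", 150, 95)]
labels = ["normal", "normal", "normal", "normal", "obese", "obese"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)  # both classes now have 4 rows

# Hypothetical predictions, used only to exercise the metric.
example_f1 = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

In a full experiment, oversampling would normally be applied to the training split only, so that duplicated rows cannot leak into the test set. Library equivalents exist, for example `RandomOverSampler` in imbalanced-learn and `f1_score` and `permutation_importance` in scikit-learn. Whether the study used these libraries is not stated in the abstract.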

Article Details

Section
Algebra