THE EFFECT OF OVERSAMPLING TECHNIQUES ON MACHINE LEARNING ALGORITHM IN BODY MASS INDEX (BMI) CLASSIFICATION
Abstract
Body mass index (BMI) is the basis for classifying people's weight and can indicate serious conditions such as obesity. Many studies have been published on BMI classification using machine learning algorithms. Several techniques are used to increase model accuracy; one of them is oversampling, which handles imbalanced data. The goal of this research is to compare model performance with and without oversampling for k-nearest neighbors (KNN), random forest, and support vector machine (SVM). The dataset used in this research is real BMI classification data comprising gender, height, weight, and BMI index. The method consists of data pre-processing, data exploration, model training and testing, model evaluation, hyperparameter tuning, and feature importance identification. The data exploration results show that weight is the variable with the strongest correlation with the BMI index (0.8) and that there is no multicollinearity. Model evaluation using the confusion matrix and F1-score shows that the best model is the SVM model without oversampling after hyperparameter tuning, with an F1-score of more than 0.95. Feature importance identification using permutation feature importance (PFI) on the best model shows that weight is the most impactful variable in BMI classification.
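As an illustration of the oversampling step and the F1-based evaluation mentioned in the abstract, here is a minimal sketch using only the Python standard library. The abstract does not say which oversampling technique or which F1 averaging was used, so plain random oversampling and macro-averaged F1 are assumptions. The function names `random_oversample` and `macro_f1` and the toy rows are hypothetical and not taken from the study.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy rows (gender, height in cm, weight in kg); illustrative only,
# not the dataset used in the study.
rows = [
    ("Male", 174, 96),
    ("Female", 160, 50),
    ("Male", 180, 75),
    ("Female", 165, 58),
    ("Male", 170, 80),
]
labels = ["obese", "normal", "normal", "normal", "overweight"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels))           # class counts before oversampling
print(Counter(balanced_labels))  # class counts after oversampling
```

In a comparison like the one described, oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the test data used to compute the F1-score.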
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.