Exploring Supervised Learning Methods for Predicting Cuisines from Their Ingredients
DOI: https://doi.org/10.26740/vubeta.v2i1.34153

Keywords: Classification, Cuisine Prediction, Supervised Learning, Methods Comparison, Support Vector Machine

Abstract
This study explores multi-class classification for predicting cuisines from ingredient lists, using a Kaggle dataset derived from the Yummly recipe database. The goal was to identify the most effective machine-learning techniques for classifying recipes into cuisine regions based on their ingredients. Six supervised learning methods were examined: Backpropagation Neural Network, Support Vector Machine (SVM), Naive Bayes, Decision Tree, Random Forest, and AdaBoost. The preprocessing pipeline involved tokenizing ingredients into numerical features, ensuring compatibility with machine-learning algorithms and facilitating model training and evaluation. Among the models tested, SVM and Random Forest performed best, achieving accuracies of 76.7% and 73.2%, respectively, relatively close to the top competition leaderboard accuracy of 83%. Our custom implementations of the Backpropagation Neural Network and Decision Tree demonstrated competitive performance, though hardware limitations during experimentation prevented full optimization of these models. The findings emphasize the critical role of parameter tuning, dataset size, and feature preprocessing in determining classification accuracy. Additionally, the study highlights how combining well-selected algorithms with careful data preprocessing can yield meaningful improvements in prediction quality. All code and materials used in this research are publicly available, enabling further exploration by other researchers and practitioners.
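The pipeline the abstract describes (tokenize each recipe's ingredient list into numerical features, then fit a classifier such as an SVM) can be sketched as follows. This is a minimal, illustrative sketch using scikit-learn, not the authors' actual implementation; the toy recipes and cuisine labels below are invented stand-ins for the Kaggle/Yummly data, and the TF-IDF weighting is one common choice of numerical feature, not necessarily the one used in the paper.

```python
# Hypothetical sketch: ingredients -> numerical features -> SVM classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each recipe is its ingredient list joined into one string (one "document").
recipes = [
    "soy sauce ginger garlic rice",
    "tortilla beans cilantro lime",
    "olive oil basil tomato mozzarella",
    "soy sauce rice vinegar sesame oil",
    "corn tortilla jalapeno avocado",
    "tomato garlic parmesan pasta",
]
cuisines = ["chinese", "mexican", "italian", "chinese", "mexican", "italian"]

# Tokenize ingredients into TF-IDF features, then fit a linear SVM on top.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(recipes, cuisines)

# Predict the cuisine of an unseen ingredient list.
print(model.predict(["garlic ginger soy sauce noodles"])[0])
```

A linear kernel is a natural starting point here because the tokenized ingredient vectors are high-dimensional and sparse; Random Forest or the other five methods compared in the study could be swapped in by replacing the final pipeline step.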
License
Copyright (c) 2024 Yonathan Ferry Hendrawan, Omkar Chekuri

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.