Exploring Supervised Learning Methods for Predicting Cuisines from Their Ingredients
DOI: https://doi.org/10.26740/vubeta.v2i1.34153

Keywords: Classification, Cuisine Prediction, Supervised Learning, Methods Comparison, Support Vector Machine

Abstract
This study explores multi-class classification for predicting cuisines from ingredient lists, using a Kaggle dataset derived from the Yummly recipe database. The goal was to identify the most effective machine-learning techniques for classifying recipes into cuisine regions based on their ingredients. Six supervised learning methods were examined: Backpropagation Neural Network, Support Vector Machine (SVM), Naive Bayes, Decision Tree, Random Forest, and AdaBoost. The preprocessing pipeline involved tokenizing ingredients into numerical features, ensuring compatibility with machine-learning algorithms and facilitating model training and evaluation. Among the models tested, SVM and Random Forest performed best, achieving accuracies of 76.7% and 73.2%, respectively, relatively close to the top competition leaderboard accuracy of 83%. Our custom implementations of the Backpropagation Neural Network and Decision Tree demonstrated competitive performance, though hardware limitations during experimentation prevented full optimization of these models. The findings emphasize the critical role of parameter tuning, dataset size, and feature preprocessing in determining classification accuracy. Additionally, the study highlights how combining well-selected algorithms with careful data preprocessing can yield meaningful improvements in prediction quality. All code and materials used in this research are publicly available, enabling further exploration by other researchers and practitioners.
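The pipeline the abstract describes (tokenize each recipe's ingredient list into numerical features, then fit a classifier such as an SVM) can be sketched as follows. This is a minimal, illustrative sketch using scikit-learn, not the authors' actual implementation; the toy recipes and cuisine labels below are invented stand-ins for the Kaggle/Yummly data, and the TF-IDF weighting is one common choice of numerical feature, not necessarily the one used in the paper.

```python
# Hypothetical sketch: ingredients -> numerical features -> SVM classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each recipe is its ingredient list joined into one string (one "document").
recipes = [
    "soy sauce ginger garlic rice",
    "tortilla beans cilantro lime",
    "olive oil basil tomato mozzarella",
    "soy sauce rice vinegar sesame oil",
    "corn tortilla jalapeno avocado",
    "tomato garlic parmesan pasta",
]
cuisines = ["chinese", "mexican", "italian", "chinese", "mexican", "italian"]

# Tokenize ingredients into TF-IDF features, then fit a linear SVM on top.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(recipes, cuisines)

# Predict the cuisine of an unseen ingredient list.
print(model.predict(["garlic ginger soy sauce noodles"])[0])
```

A linear kernel is a natural starting point here because the tokenized ingredient vectors are high-dimensional and sparse; Random Forest or the other five methods compared in the study could be swapped in by replacing the final pipeline step.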
License
Copyright (c) 2024 Yonathan Ferry Hendrawan, Omkar Chekuri

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.