Leveraging State-of-the-art Deep Learning Advancements for Emotion Detection: A Comprehensive Review and Insights
DOI: https://doi.org/10.26740/vubeta.v3i2.46974
Keywords: Affective Computing, Emotion Recognition, Human Face, Facial Expression, Data Sets, Deep Learning
Abstract
Emotion recognition is a fundamental aspect of affective computing, focusing on identifying and interpreting human emotional states. Among the various modalities, facial emotion recognition (FER) has gained significant attention due to its non-intrusive nature and broad applicability across domains such as e-learning, healthcare, marketing, e-commerce, and psychology. A wide range of approaches has been employed to address the challenges inherent in facial emotion classification. However, there remains a lack of a holistic, structured framework that critically evaluates both the advantages and shortcomings of deep networks while also covering attention-based and Transformer-driven models. To address this gap, this paper presents a systematic review of peer-reviewed FER studies based on deep learning models published between 2022 and 2025. It examines advanced deep learning architectures for facial emotion detection, with emphasis on the predominant model families: Transformer-based architectures, hybrid CNN–Transformer models, spatiotemporal learning approaches, and novel attention mechanisms. The review analyzes the model architectures, learning strategies, datasets, evaluation protocols, and performance metrics reported in state-of-the-art FER research. It identifies common issues, including computational complexity, real-world robustness, cross-dataset generalization, and data imbalance, and analyzes current research challenges, limitations, and their practical significance. Furthermore, it discusses open opportunities and unresolved issues in human facial emotion recognition and outlines future research directions. The objective of this study is to provide actionable insights for researchers and practitioners, guiding future work toward more robust, accurate, and interpretable FER systems.
License
Copyright (c) 2026 Krishna Kant

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.