Uncovering Hidden Issues in Audit Findings Through LDA-Based Topic Modeling

Yoyok Prastyo
Wiyli Yustanti
Yuni Yamasari

Abstract

Academic audit reports play an important role in assessing and monitoring the quality of higher education. However, most of these reports are written as unstructured narrative descriptions, which makes them difficult to analyze systematically and consistently, especially by hand. This makes it hard for auditors and decision makers to identify patterns in findings and quality issues efficiently. This study applies and evaluates Latent Dirichlet Allocation (LDA) for extracting keywords and abstracting the main topics from academic audit report texts. The dataset was obtained from the Quality Management System (SIMUTU) of Surabaya State University and comprises hundreds of audit finding descriptions from various faculties over the past three years. The methodology consists of text preprocessing (tokenization, stopword removal, and stemming) followed by topic modeling with LDA. Topic quality was evaluated quantitatively using a coherence score and qualitatively through word clouds and pyLDAvis visualizations. The results show that the LDA model produced meaningful, representative topics relevant to academic quality, such as document management, lecturer involvement, and the implementation of learning evaluations. Manual validation by internal quality experts indicated that the generated topics help in understanding trends in audit findings more quickly and objectively. LDA is therefore an effective approach to extracting important information from unstructured audit reports, and it has strong potential for integration into data-driven quality dashboards that support evidence-based decision making.
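The pipeline described in the abstract (tokenization, stopword removal, stemming, LDA, coherence scoring) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the sample findings, the stopword list, and the suffix rules are hypothetical; the stemmer is a crude stand-in for a language-appropriate one; LDA is fitted with a small collapsed Gibbs sampler; and UMass coherence is used only because the abstract does not say which coherence measure was applied.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stopword list and suffix rules, for illustration only.
STOPWORDS = {"the", "is", "are", "not", "of", "and", "for", "in", "has", "have", "been", "to", "a", "by"}
SUFFIXES = ("ing", "ed", "s")

def preprocess(text):
    """Tokenize, drop stopwords, and strip one common suffix per token."""
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOPWORDS:
            continue
        for suf in SUFFIXES:
            if tok.endswith(suf) and len(tok) - len(suf) >= 3:
                tok = tok[: -len(suf)]
                break
        stems.append(tok)
    return stems

def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

def top_words(topic_word, n=5):
    """Return up to n highest-count words for each topic."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]

def umass_coherence(words, docs):
    """UMass coherence of a ranked word list over tokenized documents."""
    doc_sets = [set(doc) for doc in docs]
    score = 0.0
    for m in range(1, len(words)):
        for l in range(m):
            both = sum(1 for s in doc_sets if words[m] in s and words[l] in s)
            single = sum(1 for s in doc_sets if words[l] in s)
            score += math.log((both + 1) / single)
    return score

# Hypothetical audit findings, invented for this sketch.
findings = [
    "Course documents are not archived in the document repository",
    "Archived documents missing signatures and document numbers",
    "Lecturers have not submitted learning evaluation results",
    "Learning evaluations are not reviewed by lecturers",
]
docs = [preprocess(f) for f in findings]
topics = top_words(fit_lda(docs, num_topics=2))
for k, words in enumerate(topics):
    print(k, words, round(umass_coherence(words, docs), 3))
```

On a real corpus, a maintained library implementation of LDA and coherence scoring would normally replace the hand-written sampler, and the number of topics would be chosen by comparing coherence scores across several candidate values.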
