Enhancing Clickbait Headline Identification Performance Without Preprocessing Through Feature Reduction and Sentiment Analysis
Keywords:
Clickbait Detection, Feature Reduction, Sentiment Analysis, Embedding Technique, Machine Learning
Abstract
This study addresses the challenge of identifying clickbait headlines without relying on conventional text preprocessing, which can be resource-intensive and may degrade contextual integrity. To enhance detection performance, we examine three feature extraction methods: TF-IDF, Word2Vec, and Headline2Vec, an embedding technique designed for short texts such as headlines. These features are refined with feature selection algorithms, namely Pearson Correlation Coefficient (PCC), Neighborhood Component Analysis (NCA), and Relief, to reduce dimensionality while retaining the most relevant signal. Sentiment polarity is also integrated as a complementary feature. A comparative evaluation is conducted using several machine learning classifiers, namely Support Vector Classifier (SVC), Random Forest, LightGBM, and XGBoost, across all combinations of feature extraction and selection methods. Results show that the best configuration, Headline2Vec with Relief and SVC, achieves the highest accuracy at 94.40%, outperforming the other combinations. This demonstrates the effectiveness of combining semantic vectorization and feature selection for clickbait detection in the absence of traditional preprocessing. The findings support the development of streamlined and scalable classification models that maintain high accuracy while reducing preprocessing overhead, making the proposed method particularly suitable for real-time and large-scale content moderation and news verification systems.
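To make the feature-reduction idea concrete, the sketch below ranks raw, unpreprocessed headline tokens by the absolute Pearson correlation between each token's count and the clickbait label, keeps the top k, and appends a sentiment polarity score as one extra feature. It is only an illustration under stated assumptions, not the authors' implementation: the headlines, the tiny polarity lexicon, the whitespace tokenizer, and the value of k are all hypothetical, and plain token counts stand in for the TF-IDF, Word2Vec, and Headline2Vec features named in the abstract.

```python
import math
from collections import Counter

# Toy, made-up headlines (1 = clickbait, 0 = not clickbait); not data from the study.
HEADLINES = [
    ("You Won't Believe What Happened Next", 1),
    ("10 Amazing Tricks You Must Try", 1),
    ("This Amazing Dog Will Make You Cry", 1),
    ("Parliament Passes Annual Budget Bill", 0),
    ("Central Bank Holds Interest Rates", 0),
    ("Court Rejects Appeal In Budget Case", 0),
]

# Hypothetical mini sentiment lexicon, for illustration only.
POLARITY = {"Amazing": 1.0, "Cry": -1.0, "Rejects": -1.0}


def tokenize(text):
    # No lowercasing, stemming, or stop-word removal: a raw whitespace split.
    return text.split()


def build_matrix(texts):
    # Token-count matrix over the raw vocabulary (a stand-in for TF-IDF).
    vocab = sorted({tok for t in texts for tok in tokenize(t)})
    rows = []
    for t in texts:
        counts = Counter(tokenize(t))
        rows.append([float(counts[v]) for v in vocab])
    return vocab, rows


def pearson(xs, ys):
    # Pearson correlation coefficient; returns 0.0 for a constant input.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_k(vocab, rows, labels, k):
    # Score every feature by |r| against the label and keep the k strongest.
    scores = []
    for j, term in enumerate(vocab):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), term, j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return scores[:k]


def sentiment(text):
    # Sum of lexicon polarities over the raw tokens.
    return sum(POLARITY.get(tok, 0.0) for tok in tokenize(text))


texts = [t for t, _ in HEADLINES]
labels = [float(y) for _, y in HEADLINES]
vocab, rows = build_matrix(texts)
top = select_top_k(vocab, rows, labels, k=3)
kept = [j for _, _, j in top]

# Reduced feature vectors: selected token counts plus one sentiment feature.
features = [[row[j] for j in kept] + [sentiment(t)] for row, t in zip(rows, texts)]

for score, term, _ in top:
    print(f"{term}: |r| = {score:.3f}")
```

The resulting `features` rows could then be passed to any of the classifiers listed in the abstract. Ranking by absolute correlation keeps tokens that signal either class, which is why a token seen only in non-clickbait headlines can be selected alongside clickbait-typical ones.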
License
Copyright (c) 2025 Moch Deny Pratama, Anisa Nur Azizah, Misbachul Falach Asy'ari, Dimas Novian Aditia Syahputra, M Adamu Islam Mashuri, Binti Kholifah, Rifqi Abdillah, Adinda Putri Pratiwi, Dina Zatusiva Haq

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


