Comparative Evaluation of Deep Learning and Transformer-Based Models for Depression Detection Using Clinical Interview Transcripts

Abstract:

This paper presents a systematic comparative evaluation of classical deep learning architectures—Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), and Bidirectional Gated Recurrent Unit (BiGRU)—against a transformer-based model (DistilBERT + Logistic Regression) for automated depression detection from clinical interview transcripts. Experiments are conducted on the Distress Analysis Interview Corpus–Wizard of Oz (DAIC-WOZ) dataset under controlled, standardized conditions to ensure a fair comparison. Results show that DistilBERT + Logistic Regression outperforms all deep learning baselines (F1 = 0.9506), while the classical models remain competitive and computationally cheaper. Statistical significance testing (McNemar's test) confirms that the transformer's advantage is not attributable to chance (p < 0.05). The paper discusses the trade-off between predictive accuracy and computational efficiency, ethical considerations, and implications for deploying AI-based decision support in digital mental health information systems.
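The significance claim above rests on McNemar's test, which compares two classifiers on the discordant pairs of a paired evaluation set. The sketch below is a minimal, self-contained illustration of the exact (binomial) form of the test; the helper name `mcnemar_exact` and the toy predictions are illustrative assumptions, not the paper's actual evaluation code or data.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact McNemar test comparing two classifiers on paired predictions.

    Returns (b, c, p) where b = samples model A gets right and model B
    gets wrong, c = the reverse, and p is the two-sided exact p-value
    computed from Binomial(b + c, 0.5) over the discordant pairs.
    """
    b = sum(1 for y, a, m in zip(y_true, pred_a, pred_b) if a == y and m != y)
    c = sum(1 for y, a, m in zip(y_true, pred_a, pred_b) if a != y and m == y)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: models are indistinguishable
    # Two-sided exact p-value: probability of a split at least this extreme
    # under the null hypothesis that discordant pairs favor either model
    # with probability 0.5.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Toy example (hypothetical labels and predictions): model A is correct on
# 9 discordant cases, model B on 1, giving a significant difference.
y_true = [1] * 10
pred_a = [1] * 9 + [0]
pred_b = [0] * 9 + [1]
b, c, p = mcnemar_exact(y_true, pred_a, pred_b)
```

The exact binomial form is preferred over the chi-squared approximation when the number of discordant pairs is small, which is common for modest clinical evaluation sets such as DAIC-WOZ.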