Online Expressions, Offline Struggles: Using Social Media to Identify Depression-Related Symptoms

With their growing popularity, social media platforms have become valuable tools for researchers and health professionals, offering new opportunities to identify linguistic patterns associated with mental health. In this study, we analyze depression-related symptoms using user-generated posts on social media and the Beck Depression Inventory (BDI). Using posts from individuals who have self-reported a depression diagnosis, we train and evaluate sentence classification models to assess their ability to detect BDI symptoms. Specifically, we conduct binary classification experiments to identify the presence of depression-related symptoms and additional experiments to categorize sentences into specific BDI symptom types. We also perform a comprehensive symptom-level analysis to examine how depressive symptoms are expressed linguistically, linking social media data with a clinically validated framework. In addition, we analyze symptom distributions between users with and without depression and across platforms, providing insight into how symptoms manifest in diverse online contexts. Furthermore, we incorporate a data augmentation strategy that leverages Large Language Models to generate clinically grounded synthetic examples and evaluate their effectiveness against human-generated data. Our findings indicate that users with depression exhibit a significantly higher prevalence of certain BDI symptoms (particularly Suicidal Thoughts, Crying, Self-Dislike, and Changes in Sleeping Pattern), while control users predominantly express milder categories such as Sadness or Pessimism. Synthetic data improves the detection of underrepresented symptoms and enhances model robustness, although human-generated data better captures subtle linguistic nuances. Specialized models outperform general ones, but specific symptom categories remain challenging, underscoring the need for more interpretable and clinically grounded detection frameworks.

keywords: Depression, Social media, Large Language Models