The 3rd IEEE BigData Challenge Cup: Explainable Suicide Risk Detection on Social Media

Jun Li, Xiangmeng Wang, Haoyang Li, Yifei Yan, Hong Va Leong, and Qing Li

The Hong Kong Polytechnic University, City University of Hong Kong

hialex.li@connect.polyu.edu.hk, xiangmengpoly.wang@polyu.edu.hk, haoyang-comp.li@polyu.edu.hk, yfyan8-c@my.cityu.edu.hk, cshleong@comp.polyu.edu.hk, qing-prof.li@polyu.edu.hk

Abstract

Suicide is one of the leading causes of death around the world, and more people are now sharing their feelings and struggles on social media. This gives researchers a new way to detect suicide risk early. While large language models have shown good results in finding suicidal content online, it is still hard to use these models in real clinical settings. One key reason is that these models do not align with how clinical risk assessment works in practice. Clinicians do not rely only on what a person says; they look for specific signs of risk, such as past self-harm or access to dangerous means, and signs of protection, such as family support or coping skills. Most current models produce only a risk prediction without showing the evidence behind their decisions, which makes it hard for clinicians to trust or apply them. Building on two previous shared tasks at IEEE BigData 2024 and 2025, the 2026 competition introduces a dual-objective formulation that requires participants to predict user-level suicide risk and provide structured clinical evidence drawn from users’ post histories. We will release a dataset with labels for risk levels, risk and protective factors, and key text expressions to support model training and evaluation.

Index Terms—Suicide Risk Detection, Explainable AI, Social Media, Data Mining, Big Data

I. Overview

This proposal presents the 2026 IEEE BigData Cup Challenge on suicide risk assessment. Building on our 2024 and 2025 editions, this year’s challenge extends prior suicide risk detection tasks by introducing clinical interpretability as a core objective. While the previous two editions focused on post-level and user-level risk detection, respectively, the 2026 challenge emphasizes not only prediction accuracy but also transparent clinical reasoning behind model outputs.

Recent advances in large language models (LLMs) have shown promise for mental health applications, but their use in real-world suicide risk assessment remains limited by insufficient clinical grounding [1], [2]. In practice, clinicians do not rely solely on surface-level language; they base their judgments on structured psychological evidence, including risk factors such as prior self-harm and access to suicidal means, as well as protective factors such as social support and coping strategies [3]. However, most existing LLM-based approaches focus on end-to-end risk prediction, offering limited interpretability and weak alignment with clinical reasoning [4]. Without a structured evidence layer, model outputs remain difficult to verify, communicate, and integrate into professional workflows [5], [6].

To address this gap, the 2026 challenge formulates suicide risk assessment as a dual-objective task: participants must predict user-level suicide risk while also providing structured clinical evidence, including risk factors, protective factors, and supporting expressions from users’ post histories. This design better reflects clinical practice, where transparent reasoning is essential [7], [8], and promotes human-AI collaboration through systems that are accurate, interpretable, clinically grounded, and actionable [9].

The competition is organized by the team of Prof. Qing Li at The Hong Kong Polytechnic University, with the aim of advancing research in the health and wellness domain. Authors of selected challenge reports will be invited to extend their work for publication in the conference proceedings (after review by the Organizing Committee) and presentation at the conference. Invited teams will be selected based on final rank, the innovativeness of their approach, and the quality of the submitted report.

a) Task Description:

The task in this challenge comprises two subtasks evaluated jointly:

Subtask 1: Suicide Risk Classification. Given the historical posting text of a Reddit user, participants must classify the user into one of four suicide risk levels: indicator, ideation, behavior, or attempt. This follows the same four-level taxonomy used in our 2024 and 2025 challenges, grounded in established clinical frameworks for suicide risk stratification.

Subtask 2: Clinical Evidence Identification. For each predicted risk level, participants must identify and return the supporting evidence underlying the prediction. Evidence takes two complementary forms: (a) psychological factors, including both risk factors (e.g., suicide means, prior self-harm or suicidal thought/attempt, traumatic experience, physical health characteristics) and protective factors (e.g., social support, coping strategies, meaning of life) that are present in the user’s posting history; and (b) suicide-related expressions, defined as verbatim or closely paraphrased spans from the user’s posts that directly or indirectly signal suicidal ideation or behavior.
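To make the dual output concrete, the following is a minimal sketch of how a single user's prediction could be represented. It is an illustration only: the class name `UserPrediction`, all field names, the JSON serialization, and the placeholder values are assumptions made here, not the official submission format, which is expected to be specified in the guidelines provided to registered teams.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

# The four risk levels named in the task description.
RISK_LEVELS = ("indicator", "ideation", "behavior", "attempt")


@dataclass
class UserPrediction:
    """One user's combined output. All field names here are hypothetical."""
    user_id: str
    risk_level: str                                              # Subtask 1
    risk_factors: List[str] = field(default_factory=list)        # Subtask 2(a)
    protective_factors: List[str] = field(default_factory=list)  # Subtask 2(a)
    expressions: List[str] = field(default_factory=list)         # Subtask 2(b)

    def __post_init__(self) -> None:
        # Reject labels outside the four-level taxonomy early.
        if self.risk_level not in RISK_LEVELS:
            raise ValueError(f"unknown risk level: {self.risk_level!r}")


def to_json(prediction: UserPrediction) -> str:
    """Serialize one prediction as a JSON object (an assumed format)."""
    return json.dumps(asdict(prediction), ensure_ascii=False)


# Placeholder values for illustration; not taken from the dataset.
example = UserPrediction(
    user_id="user_0001",
    risk_level="ideation",
    risk_factors=["prior self-harm"],
    protective_factors=["social support"],
    expressions=["<span copied from the user's posts>"],
)
print(to_json(example))
```

Keeping the risk label and its evidence in one record means that every prediction carries the material a reviewer would need to audit it.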

This combined output is designed to support mental health professionals in making informed, auditable assessments, rather than relying on opaque model predictions alone.

The dataset used in this challenge is derived from the Protective Factor-Aware (PFA) dataset [3], which includes 237 Reddit users and 2,515 posts from r/SuicideWatch collected between June 2010 and September 2022.¹ As in prior challenges, the dataset will be provided in a protected format that balances accessibility and privacy. Each user entry was annotated by trained researchers. Inter-annotator agreement was high, with Fleiss’ Kappa of 0.84 for risk level classification and 0.79 for factor annotation, supporting the reliability of the annotation scheme. A portion of the labeled data will be released as the training set. The remaining users will form the test set, for which participants submit both predictions and evidence. Detailed annotation guidelines and factor category specifications will be provided to all registered teams.

¹Dataset details are available at https://github.com/AlexLee01/Embracing-Resilience-DFIL/.

b) Evaluation:

Submissions will be evaluated on both subtasks:

Subtask 1 is evaluated using the macro F1-score over the four risk level categories, ensuring that model performance is assessed equally across all risk levels regardless of class frequency. This is particularly important in suicide risk assessment, where accurate detection of high-risk cases is clinically critical.

Subtask 2 is evaluated using macro F1 for psychological factor identification and token-level partial-match F1 for extracted suicide-related expressions, with the two scores averaged into a single Subtask 2 score.

The final ranking is based on a composite score S = 0.6×S₁ + 0.4×S₂, where S₁ and S₂ denote the scores for Subtask 1 and Subtask 2 respectively, both normalized to [0, 1] prior to aggregation. Preliminary results will be published on a public leaderboard throughout the challenge period. Only teams that submit both source code and a qualifying challenge report before the submission deadline will be eligible for final evaluation and paper invitation.
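The scoring described above can be sketched in a few lines of Python. This is a hedged illustration, not the official scorer: the function names are hypothetical, and several details the text leaves open are filled with assumptions, namely whitespace tokenization with lowercasing, a bag-of-tokens overlap as the notion of "partial match", and a per-class F1 of 0 when a class has no gold or predicted instances.

```python
from collections import Counter
from typing import List, Sequence, Set


def macro_f1(gold: List[Set[str]], pred: List[Set[str]],
             labels: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over users.

    Each user contributes a set of labels: a singleton for the risk
    level (Subtask 1), or several labels for psychological factors.
    """
    scores = []
    for label in labels:
        tp = sum(label in g and label in p for g, p in zip(gold, pred))
        fp = sum(label not in g and label in p for g, p in zip(gold, pred))
        fn = sum(label in g and label not in p for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # Assumption: a label with no gold or predicted instance scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def token_f1(gold_span: str, pred_span: str) -> float:
    """Token-overlap F1 between one gold and one predicted expression."""
    gold_tokens = gold_span.lower().split()
    pred_tokens = pred_span.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def composite_score(s1: float, factor_f1: float, expression_f1: float) -> float:
    """S = 0.6 * S1 + 0.4 * S2, with S2 the mean of the two evidence scores."""
    s2 = (factor_f1 + expression_f1) / 2
    return 0.6 * s1 + 0.4 * s2


# Toy labels for illustration only; not drawn from the dataset.
RISK_LEVELS = ["indicator", "ideation", "behavior", "attempt"]
gold_levels = [{"indicator"}, {"ideation"}, {"behavior"}, {"attempt"}]
pred_levels = [{"indicator"}, {"ideation"}, {"behavior"}, {"behavior"}]
s1 = macro_f1(gold_levels, pred_levels, RISK_LEVELS)
print(round(s1, 3))
```

Because the macro average weights every class equally, missing the single "attempt" user in this toy example lowers S₁ as much as missing an entire majority class would, which is the behavior the evaluation design aims for.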

Table I
Summary of Previous IEEE BigData Cup Challenges on Suicide Risk Detection

Year	Challenge Theme	Registered Teams	Final / Qualified Teams	Brief Outcome
2024	Suicide Ideation Detection on Social Media	21	13 finalists	9 teams were invited to submit papers and present their work.
2025	User-level Suicide Risk Detection from Longitudinal Histories	36	16 finalists	8 teams were invited to submit extended papers and deliver final presentations in Macau.

II. Short Bio of the Organizers

Qing Li (qing-prof.li@polyu.edu.hk) is currently a Chair Professor (Data Science) and the Head of the Department of Computing at the Hong Kong Polytechnic University, Hong Kong.
Hong Va Leong (cshleong@comp.polyu.edu.hk) is a Senior Lecturer in the Department of Computing at the Hong Kong Polytechnic University, Hong Kong.
Xiangmeng Wang (xiangmengpoly.wang@polyu.edu.hk) is an Assistant Professor (Research) at the Hong Kong Polytechnic University, Hong Kong.
Haoyang Li (haoyang-comp.li@polyu.edu.hk) is currently an Assistant Professor (Research) at the Hong Kong Polytechnic University, Hong Kong.
Jun Li (hialex.li@connect.polyu.edu.hk) is currently a Ph.D. candidate in the Department of Computing at the Hong Kong Polytechnic University, Hong Kong.
Yifei Yan (yfyan8-c@my.cityu.edu.hk) is currently a Ph.D. candidate in the Department of Social and Behavioural Sciences at the City University of Hong Kong, Hong Kong.

III. Tentative Program Committee

Huan Liu, Arizona State University
Jing Li, The Hong Kong Polytechnic University
Manas Gaur, University of South Carolina
Ramit Sawhney, Netaji Subhas Institute of Technology
Hamideh Ghanadian, University of Ottawa
Isar Nejadgholi, National Research Council Canada
Han-Chin Shing, University of Maryland
Nancy Xiaonan Yu, City University of Hong Kong
Guandong Xu, The Education University of Hong Kong
Yu Li, The Chinese University of Hong Kong

References

[1] E. Croxford, Y. Gao, N. Pellegrino, K. Wong, G. Wills, E. First, F. Liao, C. Goswami, B. Patterson, and M. Afshar, “Current and future state of evaluation of large language models for medical summarization tasks,” npj Health Systems, vol. 2, no. 1, p. 6, 2025.
[2] H. Wu, M. Wang, J. Wu, F. Francis, Y.-H. Chang, A. Shavick, H. Dong, M. T. Poon, N. Fitzpatrick, A. P. Levine et al., “A survey on clinical natural language processing in the United Kingdom from 2007 to 2022,” npj Digital Medicine, vol. 5, no. 1, p. 186, 2022.
[3] J. Li, X. Wang, H. Li, Y. Yan, H. V. Leong, L. Feng, N. X. Yu, and Q. Li, “Protective factor-aware dynamic influence learning for suicide risk prediction on social media,” arXiv preprint arXiv:2507.10008, 2025.
[4] K. Singhal, S. Azizi, T. Tu, S. S. Mahdavi, J. Wei, H. W. Chung, N. Scales, A. Tanwani, H. Cole-Lewis, S. Pfohl et al., “Large language models encode clinical knowledge,” Nature, vol. 620, no. 7972, pp. 172–180, 2023.
[5] E. Kerz, S. Zanwar, Y. Qiao, and D. Wiechmann, “Toward explainable AI (XAI) for mental health detection based on language behavior,” Frontiers in Psychiatry, vol. 14, p. 1219479, 2023.
[6] C. Robertson, A. Woods, K. Bergstrand, J. Findley, C. Balser, and M. J. Slepian, “Diverse patients’ attitudes towards artificial intelligence (AI) in diagnosis,” PLOS Digital Health, vol. 2, no. 5, p. e0000237, 2023.
[7] Y. Wu, G. Wan, J. Li, S. Zhao, L. Ma, T. Ye, I. Pop, Y. Zhang, and J. Chen, “Wisemind: Recontextualizing AI with a knowledge-guided, theory-informed multi-agent framework for instrumental and humanistic benefits,” arXiv preprint arXiv:2502.20689, 2025.
[8] E. J. Topol, “High-performance medicine: the convergence of human and artificial intelligence,” Nature Medicine, vol. 25, no. 1, pp. 44–56, 2019.
[9] N. K. Aggarwal, “The cultural formulation interview in case formulations: A state-of-the-science review,” Behavior Therapy, vol. 55, no. 6, pp. 1130–1143, 2024.