Can AI assess suicide risk the way a clinician does — and show its evidence? Now in its 3rd edition, this challenge asks models not just to predict, but to justify.
The challenge A dual-objective task scored jointly. Subtask 1 — classify a Reddit user into one of four clinically grounded risk levels (indicator, ideation, behavior, attempt). Subtask 2 — return the supporting clinical evidence: psychological risk factors (e.g., access to means, prior self-harm, trauma) and protective factors (e.g., social support, coping, meaning of life), plus the verbatim post spans that signal risk.
The data Built on the Protective Factor-Aware (PFA) dataset — 237 Reddit users and 2,515 posts from r/SuicideWatch (2010–2022), expert-annotated with high agreement (Fleiss’ κ = 0.84 for risk, 0.79 for factors) and released in a privacy-protected format.
How you win Macro-F1 on each subtask, combined as S = 0.6·S1 + 0.4·S2 on a hidden test set with a live public leaderboard. Source code and a qualifying report are required for finalist eligibility.
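The scoring rule above can be sketched in a few lines of Python. This is an illustration only, not the organizers' scorer: the names `macro_f1` and `combined_score` and the toy labels are assumptions, and the way Subtask 2's factor and span predictions are reduced to a macro-F1 is not described here, so S2 is passed in as a plain number.

```python
RISK_LEVELS = ["indicator", "ideation", "behavior", "attempt"]


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted instances contributes 0 here (assumed convention).
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def combined_score(s1, s2, w1=0.6, w2=0.4):
    """S = 0.6*S1 + 0.4*S2, as stated in the challenge description."""
    return w1 * s1 + w2 * s2


# Toy example: four users, one "attempt" case misclassified as "behavior".
y_true = ["indicator", "ideation", "behavior", "attempt"]
y_pred = ["indicator", "ideation", "behavior", "behavior"]
s1 = macro_f1(y_true, y_pred, RISK_LEVELS)  # about 0.667
s2 = 0.5                                    # hypothetical Subtask 2 macro-F1
print(round(s1, 3), round(combined_score(s1, s2), 3))
```

Because macro-F1 averages over classes rather than users, missing the single "attempt" case costs a full quarter of the Subtask 1 score in this toy example.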
Track record Grew from 21 teams (2024) to 36 teams (2025), with finalists invited to present in Macau.
Organizers Prof. Qing Li’s team at The Hong Kong Polytechnic University, with City University of Hong Kong.
02 Pixel-Precise Segmentation of Solar Filaments
Solar filaments trigger the storms that knock out power grids, GPS, and satellites — yet no operational system has tracked them since around 2016. Help reopen the window.
The challenge Produce pixel-precise segmentations of solar filaments in full-disk H-Alpha images — capturing fine structures (barbs), separating faint material from ground-based noise, and detecting each filament as one coherent object. Any method is welcome, from classical image processing to deep neural networks.
The data MAGFiLO — the largest gold-standard filament dataset ever released (Nature Scientific Data, 2024): 10,244 manually annotated filaments across 1,593 GONG observations, with polygon masks, spines, bounding boxes, and chirality labels. Built over 1.5 years by ~40 annotators across three institutions.
How you win IoU, precision, recall, AP@IoU, hit/miss rate, and the organizers’ Multi-scale IoU (MIoU) for fine structures. Hosted on Kaggle with custom-metric scoring; source code required.
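For orientation, the pixel-level versions of IoU, precision, and recall can be computed from two binary masks as below. This is a minimal sketch under assumptions: masks are nested lists of 0/1, the function name `mask_metrics` is hypothetical, and the organizers' Multi-scale IoU and AP@IoU definitions are not reproduced.

```python
def mask_metrics(pred, truth):
    """Pixel-level IoU, precision, and recall for two equally sized binary masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # filament pixel correctly predicted
            elif p and not t:
                fp += 1          # predicted filament where there is none
            elif t and not p:
                fn += 1          # missed filament pixel
    union = tp + fp + fn
    return {
        "iou": tp / union if union else 1.0,           # two empty masks agree perfectly
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy 2x4 masks: one spurious pixel and one missed pixel.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
pred = [[0, 1, 1, 1],
        [0, 0, 1, 0]]
metrics = mask_metrics(pred, truth)
print(metrics)  # iou 0.6, precision 0.75, recall 0.75
```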
Platform & prizes Kaggle. Cash prizes and/or IEEE Big Data 2026 registration for top teams, with invited talks at the proposed SABiD workshop. NSF-supported.
Organizers Azim Ahmadzadeh (Univ. of Missouri–St. Louis, lead), Dustin Kempton (Georgia State), Qin Li (NJIT), and Alexei Pevtsov (NSF National Solar Observatory).
03 FinReason Cup — Agentic Financial Reasoning, Hedging & Audit
Strong models produce fluent financial explanations. FinReason asks the harder question: can their reasoning be executed, audited, and reproduced?
The challenge Three complementary tasks:
- Task 1 — Verifiable Chain Reasoning: solve symbolic, multi-step finance problems whose final answer and every intermediate step are automatically checked against an executable trace.
- Task 2 — Market-Neutral Hedging: pick an asset pair and manage a dollar-neutral position over time from prices, news, and filings — rewarding relative reasoning over directional bets.
- Task 3 — Financial Audit Verification: verify reported values against the XBRL calculation network and US-GAAP taxonomy of real SEC filings, where there is no room for an approximate answer.
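To illustrate why Task 3 leaves no room for approximation, here is a minimal sketch of a calculation-consistency check: a reported total must equal the weighted sum of its components exactly. The function name, the input layout, and the numbers are hypothetical; a real system would read the weights (+1 or -1) from the filing's calculation linkbase rather than hard-coding them.

```python
from decimal import Decimal


def check_calculation(parent_value, children):
    """Compare a reported total with the weighted sum of its child facts.

    children is a list of (weight, value) pairs, where weight is +1 or -1
    and values are strings, so no binary floating-point rounding creeps in.
    """
    total = sum((Decimal(w) * Decimal(v) for w, v in children), Decimal(0))
    return total == Decimal(parent_value), total


# Hypothetical facts: two items added, one subtracted.
components = [(1, "1000.00"), (1, "700.00"), (-1, "200.00")]
ok, total = check_calculation("1500.00", components)
print(ok, total)          # True 1500.00
bad, _ = check_calculation("1500.01", components)
print(bad)                # False: off by one cent still fails
```

Using `Decimal` on string inputs keeps the comparison exact, which matches the all-or-nothing nature of the audit task.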
The data Built on two organizer benchmarks — FinChain (symbolic, executable reasoning traces) and HERCULEAN (agentic financial workflows). Hidden tests use held-out seeds, market windows, and newly collected filings to prevent memorization.
How you win Task 1 — answer accuracy plus step-level ChainEval; Task 2 — Sharpe Ratio (with cumulative return and maximum drawdown); Task 3 — accuracy and structural / extraction / calculation error rates. Awards for best per task, best reproducible open-source system, and best student team.
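The Task 2 metrics named above have standard textbook forms, sketched here in plain Python. The annualization factor, the zero risk-free rate, and the function names are assumptions; the challenge's own scorer may use different conventions.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualized Sharpe ratio from per-period simple returns (sample std. dev.)."""
    excess = [r - risk_free_per_period for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        return 0.0  # assumed convention for a constant return series
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


def cumulative_return(returns):
    """Total compounded return over the whole window."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(returns):
    """Largest peak-to-trough loss of the equity curve, as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Toy series: +10%, -50%, +20%.
rets = [0.10, -0.50, 0.20]
print(round(cumulative_return(rets), 4))  # -0.34
print(round(max_drawdown(rets), 4))       # 0.5
```

A dollar-neutral strategy tends to have small directional exposure, which is why a risk-adjusted measure such as the Sharpe ratio is paired with drawdown rather than relying on raw return alone.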
Platform Kaggle, with an organizer-maintained Docker evaluation server for the agentic tasks.
Organizers The Fin AI with MBZUAI, McGill, Stevens, Yale, and the University of Manchester — including Preslav Nakov and IEEE Fellow Steve Liu.
04 UCF UrbanTwin Sim2Real LiDAR Challenge
Roadside-LiDAR perception is bottlenecked by the cost of labeled data. Can synthetic LiDAR be made realistic enough to train detectors that work on the real thing?
The challenge Train a 3D object detector on synthetic LiDAR only — no real labels — then run it on held-out real frames. Generate your synthetic data any way you like: physics simulation (CARLA), diffusion / flow models, neural radiance fields, or hybrids. Submit 50 synthetic frames, your detections on 50 real frames, and a signed honor declaration.
Two parallel tracks
- LUMPI (Hannover, Germany): multi-perspective roadside LiDAR, 8 classes including vulnerable road users.
- V2X-Real (Los Angeles, USA): infrastructure-centric V2X LiDAR, 3 classes, very different sensor geometry.
Winning both tracks proves cross-distribution robustness.
How you win A combined score of 0.6·detection + 0.4·realism — detection via KITTI 3D mAP, realism via Chamfer Distance, MMD, Earth Mover’s Distance, and Fréchet Point-cloud Distance. The exact scorer ships, so you can reproduce server scores locally.
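As a rough picture of the realism side of the score, the sketch below computes a symmetric Chamfer distance between two small point clouds and applies the 0.6/0.4 weighting. It rests on assumptions: Chamfer distance has several variants (squared or unsquared, summed or averaged), the released scorer may use a different one, and the realism term is taken to be already mapped to a higher-is-better value in [0, 1]. The function names are hypothetical.

```python
import math


def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions.

    Brute force, O(len(a) * len(b)); fine for a sketch, too slow for full LiDAR frames.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)


def sim2real_score(detection, realism, w_det=0.6, w_real=0.4):
    """Combined score 0.6*detection + 0.4*realism, both assumed to lie in [0, 1]."""
    return w_det * detection + w_real * realism


# Two tiny point clouds (x, y, z); one point is displaced by 1 m in z.
synthetic = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
real = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(synthetic, real)
print(cd)                                    # 1.0
print(round(sim2real_score(0.5, 0.75), 3))   # 0.6
```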
Platform & prizes Codabench (already provisioned). A USD 2,000 pool, with per-track awards of $500 / $300 / $200; top-3 teams release code and present a challenge report.
Head start A reference synthesis pipeline (UrbanTwin), ready-made synthetic datasets (Harvard Dataverse), and the open-source LiGuard toolkit are all provided.
Organizers Muhammad Shahbaz and Shaurya Agarwal, Urbanity Lab, University of Central Florida.
05 CarbonGlobe — Global-Scale Land Ecosystem Forecasting
Emulate four decades of the planet’s carbon cycle in the first open competition for global, multi-decadal land-ecosystem forecasting.
The challenge Given initial ecosystem states and long-term environmental forcings, forecast seven annual carbon variables (vegetation height, aboveground biomass, soil carbon, leaf area index, gross primary productivity, net primary productivity, and heterotrophic respiration) over multi-decadal horizons. Build fast ML emulators of a physics-based ecosystem model.
The data CarbonGlobe (NeurIPS 2025) — 40 years of global data at 0.5° resolution, 136 input variables, and calibrated Ecosystem Demography (ED) model outputs across 54,152 land sites and 15 forest-age conditions: 812,280 forecasting sequences. ED underpins NASA’s Carbon Monitoring System and the Global Carbon Budget.
How you win RMSE and MAE plus problem-driven metrics — cumulative error and year-to-year delta error — on a hidden test set of future ecosystem states. Source code required for reproducibility.
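The exact formulas for the two problem-driven metrics are not given here, so the sketch below uses one plausible reading, stated as an assumption: cumulative error as the absolute gap between the summed forecast and the summed truth over the horizon, and year-to-year delta error as the MAE of annual changes. All function names and numbers are illustrative.

```python
import math


def rmse(pred, truth):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def mae(pred, truth):
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def cumulative_error(pred, truth):
    """Assumed definition: absolute difference of the horizon totals."""
    return abs(sum(pred) - sum(truth))


def delta_error(pred, truth):
    """Assumed definition: MAE between predicted and true year-over-year changes."""
    pred_deltas = [b - a for a, b in zip(pred, pred[1:])]
    true_deltas = [b - a for a, b in zip(truth, truth[1:])]
    return mae(pred_deltas, true_deltas)


# Toy four-year trajectory of one variable at one site.
truth = [10.0, 12.0, 15.0, 19.0]
pred = [11.0, 12.0, 14.0, 20.0]
print(round(rmse(pred, truth), 4))        # 0.866
print(mae(pred, truth))                   # 0.75
print(cumulative_error(pred, truth))      # 1.0
print(round(delta_error(pred, truth), 4)) # 1.3333
```

The toy values show why such metrics complement RMSE: per-year errors that partly cancel give a small cumulative error, while the delta error exposes that the predicted growth pattern is off.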
Platform Kaggle, with baselines and starter notebooks on GitHub.
Organizers Yiqun Xie, Zhihao Wang, Lei Ma, and George Hurtt (University of Maryland), Xiaowei Jia (Rutgers), and Yanhua Li (Worcester Polytechnic Institute).
06 TrafficFlowBench — Traffic-State, OD-Demand & Congestion Analytics
Reconstruct, predict, and explain a metropolis’s traffic — and prove the answer is physically real, not just accurate.
Two divisions
- Open Division — no transportation background needed: reconstruct and predict speed and flow at held-out detectors and times, scored on plain RMSE/MAE. Enter in an afternoon with the starter kit; large student & newcomer prizes.
- Expert Division — the full physical benchmark: fundamental-diagram recovery, OD-demand estimation, congestion diagnostics, physical-consistency scoring, and scalability.
Three tracks
- TrafficStateBench — estimation (spatial imputation), prediction (forecast), and fundamental-diagram recovery.
- ODMEBench — time-dependent OD-demand estimation, plus the flagship demand-to-congestion attribution: trace which trips load which bottleneck.
- ShockwaveBench — congestion onset/duration, queue localization, and shockwave tracking, at three difficulty levels.
The data An open, reproducible California benchmark: an OSM2GMNS network with 150 PeMS-style loop detectors across five Los Angeles freeways (I-5, I-10, I-110, I-210, I-405) at 5-minute resolution for all of 2025 (~11.3 million observations), plus GPS traces, POI trip generation, and NGSIM trajectories. A second region covering Shenzhen (SUTPC) is a prospective addition.
How you win Per-track composite scores blending accuracy, physical consistency, demand attribution, and congestion diagnostics — grounded in the fundamental diagram, kinematic-wave theory, and the Rankine-Hugoniot condition. Reproducibility is a first-class, machine-checked scoring term. GNNs, spatiotemporal foundation models, PINNs, RL, and LLM agents are all welcome.
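To make the physics terms concrete, the sketch below pairs a fundamental diagram with the Rankine-Hugoniot condition, which gives the speed of a shockwave between two traffic states as w = (q2 - q1) / (k2 - k1). The Greenshields diagram and the parameter values are assumptions chosen for illustration; the benchmark does not state which diagram form it expects.

```python
def greenshields_flow(density, free_flow_speed, jam_density):
    """Flow q (veh/h) from density k (veh/km) under the Greenshields model:
    q = v_f * k * (1 - k / k_jam)."""
    return free_flow_speed * density * (1.0 - density / jam_density)


def shock_speed(k_up, q_up, k_down, q_down):
    """Rankine-Hugoniot condition: w = (q_down - q_up) / (k_down - k_up), in km/h.
    Negative values mean the congestion front travels upstream."""
    if k_down == k_up:
        raise ValueError("no density jump, so no shockwave")
    return (q_down - q_up) / (k_down - k_up)


V_FREE = 100.0   # km/h, illustrative
K_JAM = 120.0    # veh/km, illustrative

k_up, k_down = 30.0, 100.0                       # free-flowing upstream, queued downstream
q_up = greenshields_flow(k_up, V_FREE, K_JAM)    # 2250 veh/h
q_down = greenshields_flow(k_down, V_FREE, K_JAM)
w = shock_speed(k_up, q_up, k_down, q_down)
print(round(q_up, 1), round(q_down, 1), round(w, 2))  # 2250.0 1666.7 -8.33
```

A prediction can match detector speeds closely and still imply shock speeds that violate this relation, which is the kind of inconsistency a physical-consistency score is meant to catch.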
Platform & prizes Codabench / EvalAI (evergreen leaderboard), cross-posted to Kaggle. Gold $1,500 · Silver $1,000 · Bronze $500 · Student/Newcomer $500, per division and track — backed by a confirmed Gold sponsorship.
Key dates Registration opens July 15, 2026 · Final submission November 6, 2026 · Results workshop at IEEE Big Data 2026, Phoenix, December 14–17, 2026.
Organizers Shenzhen Urban Transport Planning Center (SUTPC) and the IEEE ITS Society TC on Travel Information & Traffic Management, with RERITE — co-chaired by Xuesong (Simon) Zhou and Xiaochun Zhang, with Cathy Wu (MIT) and Yudai Honma (Univ. of Tokyo).