IEEE Big Data Cup 2026

Seven Data Challenges at the Frontier of AI, Science & Society

Presented at the IEEE International Conference on Big Data 2026 · Phoenix, Arizona, USA · December 14–17, 2026

Why Compete

The IEEE Big Data Cup is the conference’s flagship competition program, where teams from around the world tackle real, high-stakes data problems on shared benchmarks and live leaderboards. The 2026 edition brings together seven challenges spanning mental-health AI, space weather, finance, autonomous driving, climate science, urban mobility, and healthcare operations. Every challenge ships a curated dataset, transparent evaluation, and a path to publish and present at IEEE Big Data 2026 in Phoenix. Whether you are entering your first competition or pushing the state of the art, there is a track for you.

What every team gets

Real, citable benchmark datasets — curated by leading academic and industry labs.
Public and private leaderboards — with transparent, reproducible scoring.
Cash prizes, certificates, and registration support — varying by challenge.
A challenge-report slot and the chance to present at the IEEE Big Data 2026 workshop.
Invitations to extend winning work into full papers and conference proceedings.

The Seven Challenges at a Glance

Challenge	Field	Headline dataset	Platform
01 Explainable Suicide Risk Detection	Health · Explainable AI	PFA — 237 Reddit users, 2,515 posts (expert-labeled)	Public leaderboard
02 Pixel-Precise Solar Filament Segmentation	Space weather · Computer vision	MAGFiLO — 10,244 filaments, 1,593 H-Alpha images	Kaggle
03 FinReason Cup — Financial Reasoning & Audit	Finance · LLM agents	FinChain + HERCULEAN benchmarks	Kaggle + Docker
04 UCF UrbanTwin Sim2Real LiDAR	Autonomous driving · Smart city	LUMPI + V2X-Real roadside LiDAR (2 tracks)	Codabench
05 CarbonGlobe Ecosystem Forecasting	AI for Science · Climate	CarbonGlobe — 40 yrs, 812,280 sequences	Kaggle
06 TrafficFlowBench	Transportation · Big data	California — 150 detectors, ~11.3M obs	Codabench / EvalAI
07 χ-Bench — Healthcare Workflows	Healthcare workflows · AI agents	χ-World — ~5,000 chart activities for 50 simulated patients and ~90 healthcare workers	χ-World

The Seven Challenges in Depth

01 Explainable Suicide Risk Detection on Social Media

Can AI assess suicide risk the way a clinician does — and show its evidence? Now in its 3rd edition, this challenge asks models not just to predict, but to justify.

The challenge A dual-objective task scored jointly. Subtask 1 — classify a Reddit user into one of four clinically grounded risk levels (indicator, ideation, behavior, attempt). Subtask 2 — return the supporting clinical evidence: psychological risk factors (e.g., access to means, prior self-harm, trauma) and protective factors (e.g., social support, coping, meaning of life), plus the verbatim post spans that signal risk.

The data Built on the Protective Factor-Aware (PFA) dataset — 237 Reddit users and 2,515 posts from r/SuicideWatch (2010–2022), expert-annotated with high agreement (Fleiss’ κ = 0.84 for risk, 0.79 for factors) and released in a privacy-protected format.

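How you win Macro-F1 is computed on each subtask (S1 for risk level, S2 for evidence), and the two are combined as S = 0.6·S1 + 0.4·S2 on a hidden test set; a live public leaderboard tracks progress during the competition. Source code and a qualifying report are required for finalist eligibility.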
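A minimal sketch of the scoring arithmetic, in plain Python with no dependencies. It computes macro-F1 for the single-label Subtask 1 (the unweighted mean of per-class F1, counting an undefined ratio as 0, which is an assumed convention) and then applies the 0.6/0.4 weighting. How Subtask 2's evidence outputs are turned into a macro-F1 score is defined by the organizers and is not reproduced here; `macro_f1` and `combined_score` are illustrative names, not the official scorer.

```python
RISK_LEVELS = ["indicator", "ideation", "behavior", "attempt"]


def macro_f1(y_true: list[str], y_pred: list[str], labels: list[str]) -> float:
    """Unweighted mean of per-class F1; an undefined ratio counts as 0 (assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append(f1)
    return sum(per_class) / len(per_class)


def combined_score(s1: float, s2: float) -> float:
    """Weighted combination S = 0.6 * S1 + 0.4 * S2."""
    return 0.6 * s1 + 0.4 * s2
```

Macro-averaging weights every risk level equally, so a model that ignores a rare class such as "attempt" is penalized even if overall accuracy looks high.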

Track record Grew from 21 teams (2024) to 36 teams (2025), with finalists invited to present in Macau.

Organizers Prof. Qing Li’s team at The Hong Kong Polytechnic University, with City University of Hong Kong.

02 Pixel-Precise Segmentation of Solar Filaments

Erupting solar filaments can launch the coronal mass ejections behind geomagnetic storms that disrupt power grids, GPS, and satellites, yet no operational system has tracked filaments since around 2016. Help reopen the window.

The challenge Produce pixel-precise segmentations of solar filaments in full-disk H-Alpha images — capturing fine structures (barbs), separating faint material from ground-based noise, and detecting each filament as one coherent object. Any method is welcome, from classical image processing to deep neural networks.

The data MAGFiLO, the largest gold-standard filament dataset released to date (Scientific Data, 2024): 10,244 manually annotated filaments across 1,593 GONG observations, with polygon masks, spines, bounding boxes, and chirality labels. Built over 1.5 years by ~40 annotators across three institutions.

How you win IoU, precision, recall, AP@IoU, hit/miss rate, and the organizers’ Multi-scale IoU (MIoU) for fine structures. Hosted on Kaggle with custom-metric scoring; source code required.
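For orientation, the standard pixel-level definitions of IoU, precision, and recall on a pair of binary masks can be sketched as follows. This is an illustration under assumptions: the organizers' Multi-scale IoU, AP@IoU, and hit/miss rate are not reproduced, the convention that two empty masks score a perfect match is chosen here, and the Kaggle custom metric remains the reference.

```python
def mask_scores(pred: list[list[int]], truth: list[list[int]]) -> tuple[float, float, float]:
    """Pixel-level (IoU, precision, recall) for two same-shape binary masks."""
    if len(pred) != len(truth) or any(len(a) != len(b) for a, b in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # filament pixel found
            elif p:
                fp += 1          # predicted filament where there is none
            elif t:
                fn += 1          # missed filament pixel
    if tp + fp + fn == 0:
        return 1.0, 1.0, 1.0     # both masks empty: treated as a perfect match (assumed)
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return iou, precision, recall
```

Pixel IoU rewards getting the bulk of a filament right but barely notices thin barbs, which is the gap a multi-scale metric is meant to close.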

Platform & prizes Kaggle. Cash prizes and/or IEEE Big Data 2026 registration for top teams, with invited talks at the proposed SABiD workshop. NSF-supported.

Organizers Azim Ahmadzadeh (Univ. of Missouri–St. Louis, lead), Dustin Kempton (Georgia State), Qin Li (NJIT), and Alexei Pevtsov (NSF National Solar Observatory).

03 FinReason Cup — Agentic Financial Reasoning, Hedging & Audit

Strong models produce fluent financial explanations. FinReason asks the harder question: can their reasoning be executed, audited, and reproduced?

The challenge Three complementary tasks:

Task 1 — Verifiable Chain Reasoning: solve symbolic, multi-step finance problems whose final answer and every intermediate step are automatically checked against an executable trace.
Task 2 — Market-Neutral Hedging: pick an asset pair and manage a dollar-neutral position over time from prices, news, and filings — rewarding relative reasoning over directional bets.
Task 3 — Financial Audit Verification: verify reported values against the XBRL calculation network and US-GAAP taxonomy of real SEC filings, where there is no room for an approximate answer.
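Task 3 relies on the fact that an XBRL calculation network relates a parent concept to weighted child concepts (weights are typically +1 or -1), so a reported total can be recomputed from its components. A rough sketch of that check follows; the concept names are illustrative US-GAAP elements, `check_calculations` is a hypothetical helper, and the fixed `tolerance` stands in for the decimals-based rounding rules a real validator would apply.

```python
# Hypothetical calculation network: parent concept -> [(child concept, weight)].
CalcNetwork = dict[str, list[tuple[str, float]]]


def check_calculations(facts: dict[str, float], network: CalcNetwork,
                       tolerance: float = 0.5) -> list[tuple[str, float, float]]:
    """Return (parent, reported, computed) for every relationship that does not sum."""
    errors = []
    for parent, children in network.items():
        # Skip relationships where the parent or any child was not reported.
        if parent not in facts or any(child not in facts for child, _ in children):
            continue
        computed = sum(weight * facts[child] for child, weight in children)
        if abs(facts[parent] - computed) > tolerance:
            errors.append((parent, facts[parent], computed))
    return errors
```

For example, with `GrossProfit` defined as `Revenues` (weight +1) plus `CostOfRevenue` (weight -1), a filing reporting 1,000 and 600 but a gross profit of 450 is flagged with a computed value of 400.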

The data Built on two organizer benchmarks — FinChain (symbolic, executable reasoning traces) and HERCULEAN (agentic financial workflows). Hidden tests use held-out seeds, market windows, and newly collected filings to prevent memorization.

How you win Task 1: answer accuracy plus step-level ChainEval. Task 2: Sharpe ratio (with cumulative return and maximum drawdown). Task 3: accuracy plus structural, extraction, and calculation error rates. Awards go to the best system on each task, the best reproducible open-source system, and the best student team.
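The Task 2 metrics can be sketched from a series of per-period portfolio returns. The organizers' conventions are not stated in this summary, so the sketch assumes daily returns, 252 periods per year for annualization, a constant annual risk-free rate spread evenly across periods, and the sample standard deviation; all function names are hypothetical.

```python
import math
import statistics


def sharpe_ratio(returns: list[float], periods_per_year: int = 252,
                 risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio of per-period returns (assumed conventions)."""
    excess = [r - risk_free / periods_per_year for r in returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return 0.0  # no variability: ratio undefined, reported as 0 here
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


def cumulative_return(returns: list[float]) -> float:
    """Compounded return over the whole window."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough loss of the equity curve, as a fraction of the peak."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst
```

Scoring on the Sharpe ratio rather than raw return is what favors the relative, dollar-neutral reasoning the task describes: a position that earns less but swings far less can still rank higher.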

Platform Kaggle, with an organizer-maintained Docker evaluation server for the agentic tasks.

Organizers The Fin AI with MBZUAI, McGill, Stevens, Yale, and the University of Manchester — including Preslav Nakov and IEEE Fellow Steve Liu.

04 UCF UrbanTwin Sim2Real LiDAR Challenge

Roadside-LiDAR perception is bottlenecked by the cost of labeled data. Can synthetic LiDAR be made realistic enough to train detectors that work on the real thing?

The challenge Train a 3D object detector on synthetic LiDAR only — no real labels — then run it on held-out real frames. Generate your synthetic data any way you like: physics simulation (CARLA), diffusion / flow models, neural radiance fields, or hybrids. Submit 50 synthetic frames, your detections on 50 real frames, and a signed honor declaration.

Two parallel tracks

LUMPI (Hannover, Germany): multi-perspective roadside LiDAR, 8 classes including vulnerable road users.
V2X-Real (Los Angeles, USA): infrastructure-centric V2X LiDAR, 3 classes, and a sensor geometry very different from LUMPI's. Strong results on both tracks demonstrate cross-distribution robustness.

How you win A combined score of 0.6·detection + 0.4·realism — detection via KITTI 3D mAP, realism via Chamfer Distance, MMD, Earth Mover’s Distance, and Fréchet Point-cloud Distance. The exact scorer ships, so you can reproduce server scores locally.
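Of the realism metrics, Chamfer Distance is the simplest to illustrate. The sketch below uses one common convention (mean squared nearest-neighbor distance in each direction, summed); other conventions use unsquared distances or average the two terms, so treat it as an assumption. The `combined_score` helper also assumes both inputs are already normalized to a 0 to 1, higher-is-better scale, since raw realism metrics are distances where lower is better; the shipped scorer defines the actual mapping.

```python
import math

Point = tuple[float, float, float]


def chamfer_distance(a: list[Point], b: list[Point]) -> float:
    """Symmetric Chamfer distance: mean squared NN distance a->b plus b->a (brute force)."""
    def one_way(src: list[Point], dst: list[Point]) -> float:
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def combined_score(detection_map: float, realism: float) -> float:
    """0.6 * detection + 0.4 * realism, assuming both are pre-normalized to [0, 1]."""
    return 0.6 * detection_map + 0.4 * realism
```

The brute-force nearest-neighbor search is O(n·m) and only suitable for toy clouds; real LiDAR frames with tens of thousands of points would need a spatial index.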

Platform & prizes Codabench (already provisioned). A USD 2,000 pool split across the two tracks, with per-track awards of $500 / $300 / $200; top-3 teams release code and present a challenge report.

Head start A reference synthesis pipeline (UrbanTwin), ready-made synthetic datasets (Harvard Dataverse), and the open-source LiGuard toolkit are all provided.

Organizers Muhammad Shahbaz and Shaurya Agarwal, Urbanity Lab, University of Central Florida.

05 CarbonGlobe — Global-Scale Land Ecosystem Forecasting

Emulate four decades of the planet’s carbon cycle — the first open competition for global, multi-decadal land-ecosystem forecasting.

The challenge Given initial ecosystem states and long-term environmental forcings, forecast seven annual carbon variables over multi-decadal horizons: vegetation height, aboveground biomass, soil carbon, leaf area index, gross primary productivity, net primary productivity, and heterotrophic respiration. The goal is to build fast ML emulators of a physics-based ecosystem model.

The data CarbonGlobe (NeurIPS 2025) — 40 years of global data at 0.5° resolution, 136 input variables, and calibrated Ecosystem Demography (ED) model outputs across 54,152 land sites and 15 forest-age conditions: 812,280 forecasting sequences. ED underpins NASA’s Carbon Monitoring System and the Global Carbon Budget.
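The sequence count follows from the site and condition counts, which multiply out exactly (read here as one forecasting sequence per land site and forest-age condition):

```latex
54{,}152 \ \text{land sites} \times 15 \ \text{forest-age conditions} = 812{,}280 \ \text{forecasting sequences}
```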

How you win RMSE and MAE plus problem-driven metrics — cumulative error and year-to-year delta error — on a hidden test set of future ecosystem states. Source code required for reproducibility.
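RMSE and MAE are standard; the two problem-driven metrics are not defined in this summary, so the sketch below uses plausible stand-ins for one variable's annual series at one site: cumulative error as the absolute gap between totals summed over the horizon, and delta error as the MAE of year-over-year changes. Both definitions are assumptions, not the official metrics.

```python
import math


def rmse(pred: list[float], true: list[float]) -> float:
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred: list[float], true: list[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def cumulative_error(pred: list[float], true: list[float]) -> float:
    """Assumed definition: absolute gap between totals summed over the horizon."""
    return abs(sum(pred) - sum(true))


def delta_error(pred: list[float], true: list[float]) -> float:
    """Assumed definition: MAE of year-over-year changes."""
    d_pred = [b - a for a, b in zip(pred, pred[1:])]
    d_true = [b - a for a, b in zip(true, true[1:])]
    return mae(d_pred, d_true)
```

The two extra metrics catch failure modes that per-year errors hide: a forecast that drifts steadily upward accumulates a large cumulative error, and a forecast with the right level but the wrong year-to-year dynamics scores poorly on delta error.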

Platform Kaggle, with baselines and starter notebooks on GitHub.

Organizers Yiqun Xie, Zhihao Wang, Lei Ma, and George Hurtt (University of Maryland), Xiaowei Jia (Rutgers), and Yanhua Li (Worcester Polytechnic Institute).

06 TrafficFlowBench — Traffic-State, OD-Demand & Congestion Analytics

Reconstruct, predict, and explain a metropolis’s traffic, and show that the answer is physically consistent, not just accurate.

Two divisions

Open Division — no transportation background needed: reconstruct and predict speed and flow at held-out detectors and times, scored on plain RMSE/MAE. The starter kit makes it possible to enter in an afternoon, and dedicated student & newcomer prizes are available.
Expert Division — the full physical benchmark: fundamental-diagram recovery, OD-demand estimation, congestion diagnostics, physical-consistency scoring, and scalability.

Three tracks

TrafficStateBench — estimation (spatial imputation), prediction (forecast), and fundamental-diagram recovery.
ODMEBench — time-dependent OD-demand estimation, plus the flagship demand-to-congestion attribution: trace which trips load which bottleneck.
ShockwaveBench — congestion onset/duration, queue localization, and shockwave tracking, at three difficulty levels.
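Fundamental-diagram recovery can be illustrated with the classic Greenshields model, v = v_f(1 - k/k_j), which is linear in density and can therefore be fitted by ordinary least squares on speed versus density. Greenshields is chosen here only for simplicity; the benchmark may use or reward other diagram shapes (for example triangular), and the function names and units (km/h, veh/km) are assumptions.

```python
def fit_greenshields(densities: list[float], speeds: list[float]) -> tuple[float, float]:
    """Least-squares fit of v = v_f * (1 - k / k_j); returns (v_f, k_j)."""
    n = len(densities)
    mean_k = sum(densities) / n
    mean_v = sum(speeds) / n
    cov = sum((k - mean_k) * (v - mean_v) for k, v in zip(densities, speeds))
    var = sum((k - mean_k) ** 2 for k in densities)
    slope = cov / var              # equals -v_f / k_j under Greenshields
    v_f = mean_v - slope * mean_k  # intercept: free-flow speed
    k_j = -v_f / slope             # density at which speed reaches zero (jam density)
    return v_f, k_j


def greenshields_flow(k: float, v_f: float, k_j: float) -> float:
    """Flow q = k * v = v_f * k * (1 - k / k_j); maximal at k = k_j / 2."""
    return v_f * k * (1 - k / k_j)
```

Because loop detectors report flow and speed, density for each interval would typically be derived as k = q / v before fitting.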

The data An open, reproducible California benchmark: an OSM2GMNS network with 150 PeMS-style loop detectors across five Los Angeles freeways (I-5, I-10, I-110, I-210, I-405) at 5-minute resolution for all of 2025 (~11.3 million observations), plus GPS traces, POI trip generation, and NGSIM trajectories. A second region, Shenzhen (SUTPC), may be added later.

How you win Per-track composite scores blending accuracy, physical consistency, demand attribution, and congestion diagnostics — grounded in the fundamental diagram, kinematic-wave theory, and the Rankine-Hugoniot condition. Reproducibility is a first-class, machine-checked scoring term. GNNs, spatiotemporal foundation models, PINNs, RL, and LLM agents are all welcome.
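The Rankine-Hugoniot condition gives the speed of a shockwave between an upstream state (k_u, q_u) and a downstream state (k_d, q_d) as w = (q_d - q_u) / (k_d - k_u). A small sketch, assuming a Greenshields flow-density relation with hypothetical parameters v_f = 100 km/h and k_j = 200 veh/km; a negative result means the shock travels upstream.

```python
def greenshields_flow(k: float, v_f: float = 100.0, k_j: float = 200.0) -> float:
    """Flow in veh/h for density k in veh/km (hypothetical Greenshields parameters)."""
    return v_f * k * (1.0 - k / k_j)


def shock_speed(k_up: float, k_down: float, flow=greenshields_flow) -> float:
    """Rankine-Hugoniot shock speed w = (q_down - q_up) / (k_down - k_up), in km/h."""
    if k_down == k_up:
        raise ValueError("a shock needs different densities on each side")
    return (flow(k_down) - flow(k_up)) / (k_down - k_up)
```

With light upstream traffic at 50 veh/km meeting a queue at 180 veh/km, `shock_speed(50.0, 180.0)` returns about -15.0, so the back of the queue grows upstream at roughly 15 km/h; equal flows on both sides (50 and 150 veh/km here) give a stationary shock.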

Platform & prizes Codabench / EvalAI (evergreen leaderboard), cross-posted to Kaggle. Gold $1,500 · Silver $1,000 · Bronze $500 · Student/Newcomer $500, per division and track — backed by a confirmed Gold sponsorship.

Key dates Registration opens July 15, 2026 · Final submission November 6, 2026 · Results workshop at IEEE Big Data 2026, Phoenix, December 14–17, 2026.

Organizers Shenzhen Urban Transport Planning Center (SUTPC) and the IEEE ITS Society TC on Travel Information & Traffic Management, with RERITE — co-chaired by Xuesong (Simon) Zhou and Xiaochun Zhang, with Cathy Wu (MIT) and Yudai Honma (Univ. of Tokyo).

07 χ-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

Can an AI agent navigate the administrative maze of U.S. healthcare — prior authorization, utilization management, care management — the way a trained healthcare worker does, and do it reliably enough to trust? This challenge asks agents not just to answer, but to act, justify, and complete.

The challenge Three parallel tracks, each a long-horizon agentic task inside χ-World — a high-fidelity simulator of 20 real-world healthcare apps operable via 151 REST APIs and 87 MCP tools.

Track 1 (PA) — verify coverage, gather clinical evidence, submit a prior authorization packet, and work the response through requests for information (RFIs), peer-to-peer review, and appeals until the case reaches a terminal status.
Track 2 (UM) — intake the request, check plan medical policy, escalate through nurse and physician reviewers, and issue a coverage determination.
Track 3 (CM) — review the patient chart, conduct multi-turn outreach, administer assessments, and author a NANDA-I / NOC / NIC care plan.

Every decision must be grounded in a 1,279-document Managed-Care Operations Handbook developed with clinicians at Johns Hopkins Medicine.

The data χ-World: a containerized simulator (~115K lines of Python) populated with ~5,000 chart activities for 50 simulated patients and ~90 healthcare workers, with a 29-status case state machine, FHIR-grade encounter linkage, and a held-out private test set of ~75 clinician-validated tasks never released during the competition. It contains no real patient data or PHI and requires no IRB clearance.

How you win A two-layer verifier scores each trial: a deterministic contract checks terminal status, routing assignments, structured payloads, and required artifacts, and a rubric-based LLM judge (pinned model, three independent votes) grades whether the clinical reasoning is grounded in cited policy sections. The primary metric is pass@1 (the fraction of tasks solved on a single attempt) in each of the three tracks, combined into an overall leaderboard score. Ties are broken by pass^3 (tasks solved in all 3 independent attempts), then by efficiency (cost and tool-call steps). Source code and a qualifying report are required for finalist eligibility; organizers re-verify finalists on the private test set.
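One reading of the scoring pipeline is sketched below. It assumes that a trial passes only when the deterministic contract holds and at least two of the three judge votes pass, that pass@1 is each task's success rate averaged over tasks, and that pass^3 counts tasks whose three attempts all pass; the leaderboard sort then applies the stated tie-breakers. All class and function names are hypothetical, and the organizers' verifier is authoritative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    contract_ok: bool                     # deterministic layer: status, routing, payloads, artifacts
    judge_votes: tuple[bool, bool, bool]  # three independent rubric-judge votes

    @property
    def passed(self) -> bool:
        # Assumption: contract must hold AND a majority of judge votes must pass.
        return self.contract_ok and sum(self.judge_votes) >= 2


def pass_at_1(results: dict[str, list[Trial]]) -> float:
    """Mean over tasks of the per-task fraction of passing trials."""
    rates = [sum(t.passed for t in trials) / len(trials) for trials in results.values()]
    return sum(rates) / len(rates)


def pass_hat_3(results: dict[str, list[Trial]]) -> float:
    """Fraction of tasks with at least three trials, all of the first three passing."""
    solved = [len(trials) >= 3 and all(t.passed for t in trials[:3])
              for trials in results.values()]
    return sum(solved) / len(solved)


def rank(entries: list[tuple[str, float, float, float, int]]) -> list[str]:
    """entries: (team, pass@1, pass^3, cost, tool_steps); higher rates first, then cheaper."""
    ordered = sorted(entries, key=lambda e: (-e[1], -e[2], e[3], e[4]))
    return [team for team, *_ in ordered]
```

The gap between the two metrics is the point of the Reliability Award: an agent that solves a task once in three tries lifts pass@1 but contributes nothing to pass^3.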

Cash prizes Gold (1st overall) $1,500 · Silver (2nd overall) $1,000 · Bronze (3rd overall) $500 · plus a dedicated Reliability Award for the top pass^3 score. Total prize pool: $3,000, sponsored by actAVA.ai. Finalists are additionally offered co-authorship on a post-competition community results report and invited to present at IEEE Big Data 2026 in Phoenix.

Baseline results The benchmark is far from solved: the best frontier agent resolves only 28.0% of tasks overall (pass@1), no configuration clears 20% under the strict pass^3 reliability metric, full-session multi-task performance collapses to 3.8%, and in the end-to-end provider–payer arena the best agents drop to 0%.

Organizers Weiran Yao (actAVA.ai, lead), Frank Wang (CTO, actAVA.ai), Haolin Chen (actAVA.ai). Clinical advisors: T. Y. Alvin Liu MD (Johns Hopkins Medicine), Hank Capps MD (Wellstar Health System).

Scientific advisors: Philip S. Yu (University of Illinois Chicago), Eric P. Xing (MBZUAI & Carnegie Mellon University), Kun Zhang (CMU & MBZUAI), Sanmi Koyejo (Stanford University), Caiming Xiong (Recursive Superintelligence), Biwei Huang (UC San Diego), Yue Zhao (University of Southern California), Carl Yang (Emory University), Qingsong Wen (independent), Hua Wei (Arizona State University), and Yanjie Fu (Arizona State University).

Ready to Compete?

Four steps to the leaderboard:

Choose the challenge that fits your skills — from an afternoon-friendly Open Division to a deep research benchmark.
Register on the listed platform (Kaggle, Codabench, or EvalAI) and download the starter kit, dataset, and baselines.
Build, submit, and climb the public leaderboard; final standings are decided on a hidden test set.
Submit your challenge report and present your work at IEEE Big Data 2026 in Phoenix this December.

Data releases for the 2026 cup begin June 1, 2026. Watch the official IEEE Big Data Cup page for each challenge’s registration link and final rules.

Summarized from the official IEEE Big Data Cup 2026 competition proposals. Datasets, prizes, platforms, and dates are as described by each organizing team and may be subject to final confirmation on the official challenge pages.