AI Emulation Challenge: Global-Scale Land Ecosystem Forecasting

Organizers: Yiqun Xie¹, Zhihao Wang¹, Lei Ma¹, George Hurtt¹, Xiaowei Jia², Yanhua Li³

¹University of Maryland, ²Rutgers University, ³Worcester Polytechnic Institute

{xie, zhwang1, lma6, gchurtt}@umd.edu, xj159@cs.rutgers.edu, yli15@wpi.edu

Dataset

We propose a data challenge based on CarbonGlobe, our large-scale benchmark dataset published at NeurIPS 2025 [1]. CarbonGlobe is a global-scale, multi-decade, ML-ready dataset for land ecosystem forecasting. It contains 40 years of global data at 0.5° spatial resolution, integrating 136 meteorological, CO2, soil, and environmental input variables with calibrated Ecosystem Demography (ED) model outputs. The ED model is a pioneering model grounded in ecological and carbon cycle theory, and it has been developed continually over the past decades to improve the modeling of terrestrial carbon dynamics [2]-[4]. ED has been used to support NASA's Carbon Monitoring System, has been included in the official Global Carbon Budget [5]-[7], and has been operationally adopted by the State of Maryland, US, for annual forest carbon inventory updates [8]-[10].

The prediction targets include seven ecosystem carbon variables: vegetation height, aboveground biomass, soil carbon, leaf area index, gross primary productivity, net primary productivity, and heterotrophic respiration.

The dataset includes 54,152 globally distributed land sites, each paired with 15 initial forest-age conditions, resulting in 54,152 × 15 = 812,280 forecasting sequences. The dataset has no privacy or human-subject concerns and will be publicly released through Kaggle, with code and baseline models released through GitHub.

Challenge Sector

AI for Science; AI emulation; Time-series forecasting.

Task and Evaluation Metrics

Participants will develop machine learning models that emulate long-term ecosystem dynamics. Given an initial ecosystem state and multi-source, long-term environmental forcing inputs (the time-series drivers needed to forecast the output targets), a model will forecast the corresponding annual carbon-related variables over multi-decadal horizons.
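
To make the input and output structure concrete, the following is a minimal sketch of a persistence baseline for a single forecasting sequence. It is illustrative only and is not one of the released baselines. The function name persistence_forecast, the assumption that the initial state holds the seven target variables, and the layout of the forcing as one row of drivers per forecast year are all hypothetical; the actual data layout is defined by the dataset documentation.

```python
from typing import List, Sequence

# The seven target variables named in the Dataset section.
TARGET_NAMES = [
    "vegetation_height",
    "aboveground_biomass",
    "soil_carbon",
    "leaf_area_index",
    "gross_primary_productivity",
    "net_primary_productivity",
    "heterotrophic_respiration",
]


def persistence_forecast(
    initial_state: Sequence[float],
    forcing: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Repeat the initial state for every forecast year.

    initial_state: one value per target variable (assumed layout).
    forcing: one row of driver values per forecast year (assumed layout).
    The forcing values are ignored here; only the number of years is used.
    """
    if len(initial_state) != len(TARGET_NAMES):
        raise ValueError("initial_state must have one value per target variable")
    return [list(initial_state) for _ in forcing]


# Hypothetical example: a 40-year horizon with 136 drivers per year.
example_forcing = [[0.0] * 136 for _ in range(40)]
example_forecast = persistence_forecast([1.0] * 7, example_forcing)
print(len(example_forecast), len(example_forecast[0]))
```

A learned emulator would replace the body of this function with a model that actually uses the forcing, while keeping the same shape contract: one row of seven annual values per forecast year.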

Submissions will be evaluated on hidden test data representing future ecosystem states, using standard forecasting metrics, including RMSE and MAE, and problem-driven ecosystem forecasting metrics, including cumulative error and year-to-year delta error.
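
As a rough illustration of how these four metrics differ, the sketch below computes them for one target variable at one site. RMSE and MAE follow their standard definitions. The definitions of cumulative_error (absolute difference between the predicted and true totals over the horizon) and delta_error (MAE between predicted and true year-to-year changes) are assumptions made for this example; the official definitions will be those in the released evaluation code.

```python
import math
from typing import List, Sequence


def _check(pred: Sequence[float], true: Sequence[float]) -> None:
    # Both series must cover the same, non-empty set of years.
    if len(pred) != len(true) or len(true) == 0:
        raise ValueError("pred and true must be non-empty and of equal length")


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    _check(pred, true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    _check(pred, true)
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def cumulative_error(pred: Sequence[float], true: Sequence[float]) -> float:
    # Assumed definition: error in the total accumulated over the horizon.
    _check(pred, true)
    return abs(sum(pred) - sum(true))


def _deltas(series: Sequence[float]) -> List[float]:
    # Year-to-year changes: value in year t+1 minus value in year t.
    return [b - a for a, b in zip(series, series[1:])]


def delta_error(pred: Sequence[float], true: Sequence[float]) -> float:
    # Assumed definition: MAE between predicted and true annual changes.
    _check(pred, true)
    return mae(_deltas(pred), _deltas(true))


# A forecast with a constant +0.5 bias over four years.
true_series = [1.0, 2.0, 3.0, 4.0]
pred_series = [1.5, 2.5, 3.5, 4.5]
print(rmse(pred_series, true_series))              # 0.5
print(mae(pred_series, true_series))               # 0.5
print(cumulative_error(pred_series, true_series))  # 2.0
print(delta_error(pred_series, true_series))       # 0.0
```

Under these assumed definitions, a constant bias leaves the year-to-year delta error at zero but grows the cumulative error with the length of the horizon, which is why the two problem-driven metrics complement RMSE and MAE.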

Source code is required for all submissions to support reproducibility and verification.

Competition Infrastructure

The competition will be hosted on Kaggle. Kaggle will provide the public leaderboard, private leaderboard, hidden test evaluation, submission management, and participant discussion forum. We will provide starter notebooks, data documentation, and reproducible evaluation metrics.

Prizes

If external sponsorship becomes available, cash prizes may be added following IEEE Big Data Cup policies. Otherwise, we plan to offer small cash prizes or non-cash recognition awards for the top-performing teams.

Organizers

  • Yiqun Xie, University of Maryland
  • Zhihao Wang, University of Maryland
  • Lei Ma, University of Maryland
  • George Hurtt, University of Maryland
  • Xiaowei Jia, Rutgers University
  • Yanhua Li, Worcester Polytechnic Institute

Expected Impact

The proposed challenge will provide the first open competition for machine learning emulation of land ecosystems at the global scale over multiple decades. It will bring together the IEEE Big Data community, Earth scientists, and machine learning researchers to develop scalable emulators for large-scale ecosystem forecasting, enabling new capabilities for answering scientific questions.

For example, participants' models could help identify regions with elevated risks of ecosystem transition at finer spatial scales, or explore long-term carbon sequestration potential under alternative environmental trajectories.

By providing a standardized dataset, hidden test benchmark, and problem-specific evaluation metrics, the challenge will accelerate reproducible research on AI for Earth Science and support broader adoption of machine learning in high-impact Earth science applications.

References

  [1] Z. Wang, L. Ma, G. Hurtt, X. Jia, Y. Li, R. Li, Z. Li, S. Xu, and Y. Xie, “CarbonGlobe: A global-scale, multi-decade dataset and benchmark for carbon forecasting in forest ecosystems,” in Advances in Neural Information Processing Systems, D. Belgrave, C. Zhang, H. Lin, R. Pascanu, P. Koniusz, M. Ghassemi, and N. Chen, Eds., vol. 38. Curran Associates, Inc., 2025.
  [2] G. C. Hurtt, P. R. Moorcroft, S. W. Pacala, and S. A. Levin, “Terrestrial models and global change: challenges for the future,” Global Change Biology, vol. 4, no. 5, pp. 581-590, 1998.
  [3] R. A. Fisher, C. D. Koven, W. R. Anderegg, B. O. Christoffersen, M. C. Dietze, C. E. Farrior, J. A. Holm, G. C. Hurtt, R. G. Knox, P. J. Lawrence et al., “Vegetation demographics in Earth system models: A review of progress and priorities,” Global Change Biology, vol. 24, no. 1, pp. 35-54, 2018.
  [4] L. Ma, G. Hurtt, L. Ott, R. Sahajpal, J. Fisk, R. Lamb, H. Tang, S. Flanagan, L. Chini, A. Chatterjee et al., “Global evaluation of the Ecosystem Demography model (ED v3.0),” Geoscientific Model Development, vol. 15, no. 5, pp. 1971-1994, 2022.
  [5] P. Friedlingstein, M. O'Sullivan, M. W. Jones, R. M. Andrew, D. C. Bakker, J. Hauck, P. Landschützer, C. Le Quéré, I. T. Luijkx, G. P. Peters et al., “Global carbon budget 2023,” Earth System Science Data, vol. 15, no. 12, pp. 5301-5369, 2023.
  [6] P. Friedlingstein, M. O'Sullivan, M. W. Jones, R. M. Andrew, J. Hauck, P. Landschützer, C. Le Quéré, H. Li, I. T. Luijkx, A. Olsen et al., “Global carbon budget 2024,” Earth System Science Data Discussions, vol. 2024, pp. 1-133, 2024.
  [7] P. Friedlingstein, M. O'Sullivan, M. W. Jones, R. M. Andrew, D. C. Bakker, J. Hauck, P. Landschützer, C. Le Quéré, H. Li, I. T. Luijkx et al., “Global carbon budget 2025,” Earth System Science Data Discussions, vol. 2025, pp. 1-139, 2025.
  [8] “Maryland Tree and Forest Carbon Flux: Data and Methodology Documentation,” Maryland Department of the Environment and Maryland Department of Natural Resources, Tech. Rep., 2023, accessed: 2024-12-19. [Online]. Available: https://mde.maryland.gov/programs/air/ClimateChange/Documents/VIMAL/MD_ForestCarbon_Flux_Methodology_01.06.23.pdf
  [9] “Reducing Greenhouse Gas Emissions in Maryland: A Progress Report,” Maryland Department of the Environment, Tech. Rep., 2022, accessed: 2024-12-19. [Online]. Available: https://mde.maryland.gov/programs/air/ClimateChange/Documents/GGRA%20PROGRESSS%20REPORT%202022.pdf
  [10] G. Hurtt, C. Silva, R. Lamb, L. Ma, and Q. Shen, “High-Resolution Annual Forest Carbon Monitoring Utilizing Remote Sensing,” Maryland Department of the Environment, Tech. Rep., 2021, accessed: 2024-12-19. [Online]. Available: https://mde.maryland.gov/programs/Air/ClimateChange/MCCC/MWG/Annual%20Forest%20Carbon%20Monitoring%20presentation%20by%20George%20Hurtt_UMD.pdf