Pixel-Precise Segmentation of Solar Filaments
Summary
This proposal describes a data competition to be hosted at IEEE BigData 2026 focused on pixel-precise segmentation of solar filaments using H-Alpha observations. Solar filaments are closely associated with space weather events such as Coronal Mass Ejections, solar flares, and Solar Energetic Particle storms, which can disrupt power grids, GPS systems, aviation, and space operations. Despite their importance, no currently active system provides continuous operational filament detection, leaving a critical gap in space weather monitoring.
This gap can now be addressed using a newly released dataset, Manually Annotated GONG Filaments from H-Alpha Observations (MAGFiLO), coupled with observations from the H-alpha instruments of the Global Oscillation Network Group (GONG)¹. MAGFiLO, released in 2024, is the largest gold-standard filament dataset and the missing piece of this puzzle. It contains 10,244 manually annotated filaments from 1,593 GONG observations, including detailed polygonal masks, spines, bounding boxes, and chirality labels. The dataset was produced through a rigorous multi-annotator pipeline and expert review across a collaboration of three institutions.
Participants will develop segmentation algorithms using any approach, from classical image processing to deep learning, with performance evaluated using IoU, precision, recall, AP@IoU θ, hit rate, miss rate, and a proposed Multi-scale Intersection over Union (MIoU) metric. The competition will be hosted on Kaggle, leveraging custom metric support and providing data, baselines, and evaluation tools.
To increase participation, we may offer research seed grants or support for top teams to present at IEEE BigData 2026, potentially at a proposed SABiD workshop, alongside targeted outreach to the solar physics and machine learning communities. This effort brings together expertise from multiple institutions, namely the National Science Foundation’s (NSF) National Solar Observatory, Georgia State University, the New Jersey Institute of Technology, and the University of Missouri–St. Louis (UMSL). The initiative is led by Dr. Azim Ahmadzadeh (UMSL), whose work focuses on machine learning for space weather applications and is supported by NSF funding.
¹ GONG: https://en.wikipedia.org/wiki/Global_Oscillations_Network_Group
Short Answers To Listed Questions
In this section, we explicitly address the questions listed on the Big Data Cup webpage. We elaborate on different aspects of our intended plan in the remaining sections.
Answer To Questions
- Do you require source code with submissions?
Yes. We require source code for reproducibility purposes.
- What prizes will be offered?
We are considering two types of prizes. Our preferred option is a cash prize, unless it creates unforeseen challenges (e.g., tax-related issues, permission to use funds for this purpose, or difficulties for participants in receiving a cash prize). Our second option is to cover the submission cost of the winners' paper, should they choose to submit one. This can be done in collaboration with the SABiD workshop, which has been part of BigData for the past few years.
- What is the dataset to be released?
The dataset, MAGFiLO [1], has already been made publicly accessible at https://www.mlecofi.net/magfilo. GONG H-alpha images are freely available from the GONG Data Archive at https://gong2.nso.edu/archive/patch.pl?menutype=s. There are no privacy concerns for either of these two datasets.
- Which sector does the proposed Data Challenge belong to?
Academic research at the intersection of Computer Vision, AI, Astrophysics, and Space-Weather Forecasting.
- Who are the organizers?
Azim Ahmadzadeh, Ph.D., Assistant Professor of Computer Science, University of Missouri–St. Louis.
Dustin J. Kempton, Ph.D., Research Professor of Computer Science, Georgia State University.
Qin Li, Ph.D., Assistant Research Professor of Applied Physics, New Jersey Institute of Technology.
Alexei A. Pevtsov, Ph.D., Full Astronomer at NSF’s National Solar Observatory.
- What competition infrastructure will be used?
Submissions will be made to Kaggle. The proposing team has experience running challenges on Kaggle (see https://dmlab.cs.gsu.edu/bigdata/flare-comp-2019/).
- What are the task and the evaluation metrics?
The task is the segmentation of a specific solar event type, filaments, which are visible in H-Alpha observations. An example of a solar filament is shown in Fig. 1. In the context of this challenge, segmentation refers to identifying the individual segments of filaments and associating them with a common structure (e.g., a single filament composed of individual segments). We provide the users with both (1) the H-Alpha images and (2) the annotation metadata of over 10,000 filaments. For evaluation, we use IoU, precision, recall, AP@θ, hit rate, and miss rate, as well as MIoU, a metric designed by the team for fine-grained object segmentation tasks [2].
Science & Significance Of The Proposed Challenge
What Is Space Weather and Who Is Interested in It?
As rain and storms shape terrestrial weather, magnetic waves and winds shape space weather. According to the US National Space Weather Program, “Space weather refers to conditions on the Sun and in the solar wind, magnetosphere, ionosphere, and thermosphere that can influence the performance and reliability of spaceborne and ground-based technological systems and can endanger human health” [3]. Some of the organizations that are highly interested in space weather research and the operationalization of such efforts include NOAA’s Space Weather Prediction Center (SWPC), NASA Heliophysics, the National Science Foundation (NSF), the U.S. Geological Survey, and the U.S. Air Force Research Laboratory, which coordinate research efforts under the National Space Weather Strategy and Action Plan.
What Are Solar Filaments?
Filaments (see Fig. 1) are dense “clouds” of solar material suspended by magnetic field lines above photospheric neutral lines. The significance of filaments for space weather research lies in the fact that they are at the core of solar eruptions, including Coronal Mass Ejections (CMEs), solar flares, and Solar Energetic Particle (SEP) storms. An Earth-directed CME can cause enormous damage to the electric power grid, disrupt GPS systems, create radiation hazards for passengers and crew on polar flights, and be lethal to astronauts traveling outside the protective bubble provided by the Earth’s magnetosphere. This is why solar filaments are a critical event type in space weather research and forecasting operations.
The Gap in Filament Detection.
Despite extensive research on filament-detection algorithms, there is currently no publicly available system that continuously detects filaments and reports them to the public. The only operationalized model, developed by Bernasconi et al. [4], is no longer in use. The algorithm, which had been operating since 2008, stopped functioning around 2016–17, with no evidence of a replacement, leaving the community of space-weather researchers and practitioners, as well as heliophysicists and astrophysicists, without updated filament information. As a result, the popular web application Helioviewer³, which reports solar events, has not provided filament data for the past decade, except for a small number of manually reported filament events.
³ Link to the Helioviewer web app: https://helioviewer.org
Why Can This Gap Be Addressed Now?
Recently, in September 2024, a dataset named Manually Annotated GONG Filaments from H-Alpha Observations (MAGFiLO) was released, constituting the largest gold-standard dataset of solar filaments [1]. All prior work on filament segmentation was validated using either a small set of manually annotated filaments or a set of automatically annotated filaments. MAGFiLO has made it possible to fully leverage complex machine learning models and, perhaps more importantly, has provided ground-truth data for the reliable evaluation of filament-detection algorithms.
What Is the Value of Pixel-Precise Segmentation of Solar Filaments?
Using an automated pipeline to precisely capture the fine structure of solar filaments makes it possible for domain experts to analyze their behavior. The geometry of filaments, as they appear in full-disk H-Alpha observations and magnetograms, can reveal the direction of the axial magnetic field. Martin, in a series of groundbreaking papers in the 1990s reviewed in [5], introduced a method for determining the sign of the axial magnetic field in filaments from H-Alpha observations and magnetograms. Similarly, advanced Computer Vision algorithms (mostly deep neural networks) can be used to identify the so-called chirality of solar filaments. Having access to a large collection of pixel-precise segmentations of solar filaments—and hence their accurate structure—can lead to new discoveries regarding the behavior of solar filaments. This makes the challenge particularly exciting for domain experts.
What Are the Main Challenges In This Task?
While it may seem that today’s advanced object-segmentation algorithms would perform well on this dataset, several challenges remain. Addressing these challenges not only paves the way toward a more reliable filament-segmentation pipeline, but also addresses gaps in the field of Computer Vision. We list only some of them here. First, it is not easy to capture fine structures such as the barbs of solar filaments. Barbs are small thread-like patterns that extend from the main body of a filament along a certain orientation. The orientation of barbs with respect to their spines reveals information about the magnetic field of filaments. Second, it is not trivial to separate filament material (dark regions) from background noise. We rely on observations from ground-based observatories, which produce noisy images, so distinguishing small structures from background noise remains a challenge. Third, algorithms struggle to detect filaments as coherent structures and instead often resort to partial segmentation or segmentation of clusters of nearby islands.
Dataset: Manual Annotations Of Solar Filaments
MAGFiLO is a first-of-its-kind dataset for solar filaments, released in 2024 [1]. The dataset contains 10,244 annotated filaments from 1,593 H-Alpha observations captured by the GONG network. Examples of these annotations are shown in Fig. 2. For each annotated filament, its polygon, spine, minimum bounding box, and chirality are identified. Our annotation pipeline allowed each observation to be carefully annotated by up to three independent annotators and reviewed by domain experts. MAGFiLO only contains the reviewed-and-accepted filaments, most of which were accepted after being sent back several times for improvement.
To the best of our knowledge, no dataset of such size and detail has ever existed. This is due to the complexity of the annotation task. The creation of MAGFiLO took over 1.5 years, involved several Physics and Computer Science experts as well as a few graduate students, and was made possible through the dedicated work of about 40 student annotators who collectively spent over 1,000 person-hours on annotation alone. This product is the result of a close collaboration among three institutions: the University of Missouri–St. Louis, Georgia State University, and the National Solar Observatory. Fig. 3 shows the major steps involved in the manual annotation of filaments.
All details about data collection and curation are published in Scientific Data, a Nature Portfolio journal [1]. A user-friendly quick guide is provided on a dedicated webpage at https://www.mlecofi.net/magfilo. The annotations are publicly accessible through the Harvard Dataverse platform at https://doi.org/10.7910/DVN/J6JNVK. A demo is also accessible through a Bitbucket code repository at https://bitbucket.org/dataresearchlab/mleco-magfilodatacode/src.
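Because each filament is annotated with a polygon, a typical first step for participants is to turn that polygon into a binary pixel mask. The following is a minimal sketch of that step, not the MAGFiLO demo code. It assumes a polygon is given as a list of (x, y) vertices in pixel coordinates, which may differ from the actual annotation schema. The function names `point_in_polygon` and `polygon_to_mask` are hypothetical, and the sketch uses only the Python standard library.

```python
def point_in_polygon(px, py, vertices):
    """Even-odd rule: count crossings of a ray cast in the +x direction."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(vertices, width, height):
    """Rasterize a polygon into a height x width mask of 0/1 values.

    A pixel is set when its center (col + 0.5, row + 0.5) lies inside
    the polygon. The vertex format is an assumption for illustration.
    """
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            if point_in_polygon(col + 0.5, row + 0.5, vertices):
                mask[row][col] = 1
    return mask


# Toy example: an axis-aligned rectangle standing in for a filament polygon.
square = [(1, 1), (4, 1), (4, 3), (1, 3)]
mask = polygon_to_mask(square, width=6, height=5)
for mask_row in mask:
    print("".join("#" if value else "." for value in mask_row))
```

For full-resolution images, a vectorized or library-based rasterizer would be far faster. The loop above is only meant to make the polygon-to-mask relationship explicit.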
Task: Filament Segmentation
We will ask participants to use MAGFiLO to provide pixel-precise segmentation of all filaments. Participants are free to use any algorithms, ranging from traditional image-processing methods to the most recent deep neural network-based approaches, to achieve optimal segmentations. Their final solution may consist of a single algorithm or a combination of algorithms. We will specify a time constraint (i.e., the required image-processing rate in images per second), but depending on the capabilities of the Kaggle platform, we may or may not include it in the ranking criteria.
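As an illustration of how the processing-rate constraint could be checked, the sketch below times a segmentation function over a batch of images and reports images per second. It is only an assumption about how such a measurement might be done. The names `images_per_second` and `threshold_segmenter`, the warm-up step, and the threshold value are all hypothetical, and the thresholding "segmenter" is a stand-in rather than a proposed baseline.

```python
import time


def threshold_segmenter(image, threshold=80):
    """Stand-in segmenter: marks dark pixels (value below threshold) as 1.

    Filaments appear as dark regions, but a fixed threshold is only a
    placeholder here, not a recommended method.
    """
    return [[1 if value < threshold else 0 for value in row] for row in image]


def images_per_second(segment_fn, images, warmup=1):
    """Return the measured throughput of segment_fn in images per second."""
    # Warm-up calls are excluded from the timing (an illustrative choice).
    for image in images[:warmup]:
        segment_fn(image)
    start = time.perf_counter()
    for image in images:
        segment_fn(image)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed if elapsed > 0 else float("inf")


# Toy batch of tiny grayscale "images" given as nested lists of intensities.
batch = [[[10, 200], [90, 30]] for _ in range(50)]
rate = images_per_second(threshold_segmenter, batch)
print(f"throughput: {rate:.1f} images/second")
```

The measured rate depends on hardware and image size, so any threshold on it would need to be stated relative to a reference environment.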
The competition focuses on the precision of segmentations. To quantitatively capture this, we use IoU, precision, recall, AP@IoUθ, hit rate, and miss rate. We also use Multi-scale Intersection over Union (MIoU), a metric designed by our team for fine-grained object segmentation tasks [2].
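To make the pixel-level metrics concrete, the sketch below computes IoU, precision, and recall for a pair of binary masks, along with a simple hit rate and miss rate at an IoU threshold θ. These follow common textbook definitions and are assumptions on our part. The exact matching rules used for scoring, as well as AP@IoUθ and MIoU [2], are not reproduced here. The function names and the conventions for empty masks are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives per pixel."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    return tp, fp, fn


def pixel_metrics(pred, truth):
    """Return IoU, precision, and recall for two same-sized binary masks."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    return {
        # Convention (assumed): two empty masks agree perfectly.
        "iou": tp / union if union else 1.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def hit_and_miss_rate(pred_masks, truth_masks, theta=0.5):
    """A ground-truth filament is a 'hit' if any prediction has IoU >= theta."""
    if not truth_masks:
        raise ValueError("at least one ground-truth mask is required")
    hits = sum(
        1
        for truth in truth_masks
        if any(pixel_metrics(pred, truth)["iou"] >= theta for pred in pred_masks)
    )
    hit_rate = hits / len(truth_masks)
    return hit_rate, 1.0 - hit_rate


# Toy example: the prediction overlaps half of the annotated region.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
pred = [[0, 0, 1, 1],
        [0, 0, 1, 1]]
print(pixel_metrics(pred, truth))
print(hit_and_miss_rate([pred], [truth], theta=0.5))
```

In this toy case the masks share 2 pixels out of a union of 6, so the IoU is 1/3 while precision and recall are both 0.5. The prediction therefore counts as a miss at θ = 0.5 but as a hit at θ = 0.3.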
Platform, Participation, & Prize
We will use Kaggle to host the competition. The dataset, a demo notebook, and a description of the task, as well as the key challenges to watch for, will be available on a Kaggle competition page.
Since 2023, Kaggle has allowed the use of custom metrics for competitions⁴. We will use this feature to incorporate MIoU. The Kaggle platform also allows the creation of an evaluation rubric. We will use this feature to communicate clearly to participants which submissions receive higher scores.
A script will be provided to users to download all annotated H-Alpha images in JPEG format. This requires a modest storage space of about 800 MB. The annotated data is provided as a JSON file of about 60 MB, which can also be easily handled by average computers.
We consider three main avenues to increase participation. (1) We will include a prize: we are willing to offer a cash prize and/or cover the conference registration fees for the top 2–3 teams (depending on the distribution of scores) to attend the IEEE BigData 2026 conference and present their work, with the accompanying academic paper published in the conference proceedings. There is also the possibility that the finalists will present their work as invited speakers at a yet-to-be-announced workshop on Solar & Stellar Astronomy Big Data (SABiD), where they can discuss their methods and results. This will be finalized once the workshop is officially accepted to the conference. (2) We will reach out to researchers who have previously worked on the identification of solar filaments and directly invite them to participate, or to encourage their students to participate, in this data competition. (3) We will advertise the competition through the organizers’ professional networks of researchers and students. We believe that, with the involvement of multiple institutions, we can achieve a strong level of participation.
Any cost related to the use of Kaggle infrastructure will be covered by the leading organizer, Dr. Ahmadzadeh.
⁴ See: https://www.kaggle.com/docs/competitions-setup#creating-a-new-metric
Bios Of Organizers
Azim Ahmadzadeh, Ph.D.
Affiliation: University of Missouri–St. Louis
Website: https://www.azim-a.com/
Email: ahmadzadeh@umsl.edu
Bio: Dr. Ahmadzadeh is an Assistant Professor of Computer Science at the University of Missouri–St. Louis. His research interests include pixel-precise object detection, time series analysis, and model evaluation. Dr. Ahmadzadeh’s interdisciplinary research uses machine learning and data-driven algorithms to advance space-weather forecasting and preparedness, in collaboration with solar physicists and space-weather experts. His current research is supported by two NSF grants (#2511630, #2433781). Dr. Ahmadzadeh was among the organizing members of two Big Data Cups in 2019 and 2020.
Dustin J. Kempton, Ph.D.
Affiliation: Georgia State University
Email: dkempton1@gsu.edu
Bio: Dr. Kempton is an Assistant Research Professor of Computer Science at Georgia State University. His current research focuses on developing machine learning surrogates for computationally expensive physical simulations of the Sun and inner heliosphere. He also works on several areas of the management and analysis of large datasets for space-weather forecasting, including images and time series observations from multiple space- and ground-based instruments.
Qin Li, Ph.D.
Affiliation: New Jersey Institute of Technology
Email: qin.li@njit.edu
Bio: Dr. Li is an Assistant Research Professor in Applied Physics at the New Jersey Institute of Technology. His research focuses on small-scale dynamics and energy release in solar activity, as well as long-term solar variability. Dr. Li’s interdisciplinary work integrates physics-informed machine learning with solar and space-weather applications, supported by NASA and NSF grants. He has published in and reviewed for leading journals such as The Astrophysical Journal and Solar Physics, and is actively involved in projects that bridge solar observations, machine learning, and space-weather forecasting.
Alexei A. Pevtsov, Ph.D.
Affiliation: National Solar Observatory
Email: apevtsov@nso.edu
Bio: Dr. Pevtsov is a Full Astronomer (equivalent to Full Professor with tenure) at the US National Solar Observatory (NSO). He is an NSO Associate Director responsible for the NSO’s Integrated Synoptic Program (NISP) and the Project Director of the next-generation Ground-based Solar Observing Network (ngGONG). His research interests include a broad set of questions about the origin and evolution of solar magnetic fields, the Sun as a star, and space weather and space climate.
References
[1] A. Ahmadzadeh, R. Adhyapak, K. Chaurasiya, L. A. Nagubandi, V. Aparna, P. C. Martens, A. Pevtsov, L. Bertello, A. Pevtsov, N. Douglas et al., “A dataset of manually annotated filaments from H-alpha observations,” Scientific Data, vol. 11, no. 1, p. 1031, 2024. [Online]. Available: https://doi.org/10.1038/s41597-024-03876-y
[2] A. Ahmadzadeh, D. J. Kempton, Y. Chen, and R. A. Angryk, “Multiscale IoU: A metric for evaluation of salient object detection with fine structures,” in 2021 IEEE International Conference on Image Processing (ICIP), 2021, pp. 684–688.
[3] R. Schwenn, “Space Weather: The Solar Perspective,” Living Reviews in Solar Physics, vol. 3, no. 1, p. 2, Aug. 2006.
[4] P. N. Bernasconi, D. M. Rust, and D. Hakim, “Advanced automated solar filament detection and characterization code: Description, performance, and results,” Solar Physics, vol. 228, no. 1-2, pp. 97–117, May 2005.
[5] S. F. Martin, “Conditions for the Formation and Maintenance of Filaments (Invited Review),” Solar Physics, vol. 182, no. 1, pp. 107–137, Sep. 1998.
