Publications
2025
- Dissertation. Seeking Structure in Complex Systems: From Feature Analysis to Space-Time Causal Discovery With Earth Science Applications. J. Jake Nichol. Jul 2025. Advisor: Melanie E. Moses. Committee: G. Matthew Fricke, Abdullah Mueen, Tobias P. Fischer, Laura P. Swiler.
Complex systems are difficult to study because of their many interacting parts, emergent phenomena, and feedback loops. These systems underpin all life on Earth. We need improved tools for seeking an understanding of them. This body of research presents my investigations into data-driven methods for understanding complex systems, including my invention of a novel causal discovery meta-algorithm for space-time gridded data. I demonstrated machine learning feature importance and causal discovery capabilities for comparing simulated and observed climate data. I developed a new benchmark for modeling space-time dynamics of locally driven phenomena and examined a prominent causal discovery algorithm. Finding that contemporary causal discovery struggles with the high dimensionality of space-time gridded data, I developed CaStLe, a causal discovery meta-algorithm for recovering the space-time evolution of advective phenomena. Finally, I extended CaStLe to recover multivariate space-time dynamics. This research enhances scientists’ capabilities to explore and understand complex systems in our universe.
- JGR:MLC. Space-Time Causal Discovery in Earth System Science: A Local Stencil Learning Approach. J. Jake Nichol, Michael Weylandt, G. Matthew Fricke, and 3 more authors. Journal of Geophysical Research: Machine Learning and Computation, Jul 2025.
Causal discovery tools enable scientists to infer meaningful relationships from observational data, spurring advances in fields as diverse as biology, economics, and climate science. Despite these successes, the application of causal discovery to space-time systems remains immensely challenging due to the high-dimensional nature of the data. For example, in climate sciences, modern observational temperature records over the past few decades regularly measure thousands of locations around the globe. To address these challenges, we introduce Causal Space-Time Stencil Learning (CaStLe), a novel meta-algorithm for discovering causal structures in complex space-time systems. CaStLe leverages regularities in local space-time dependencies to learn governing global dynamics. This local perspective eliminates spurious confounding and drastically reduces sample complexity, making space-time causal discovery practical and effective. For causal discovery, CaStLe flexibly accepts any appropriately adapted time series causal discovery algorithm to recover local causal structures. These advances enable causal discovery of geophysical phenomena that were previously unapproachable, including non-periodic, transient phenomena such as volcanic eruption plumes. Regularities in local space-time dependencies are transformed into informative spatial replicates, which actually improve CaStLe’s performance when applied to ever-larger spatial grids. We successfully apply CaStLe to discover the atmospheric dynamics governing the climate response to the 1991 Mount Pinatubo volcanic eruption. We provide validation experiments to demonstrate the effectiveness of CaStLe over existing causal-discovery frameworks on a range of geophysics-inspired benchmarks while identifying the method’s limitations and domains where its assumptions may not hold. We introduce a new method for learning the dynamics of causal systems, that is, the physical rules that define a system’s behavior. 
Although this task, causal discovery, is not new, existing tools are ill-suited for many large geophysics data sets. Current state-of-the-art approaches use statistical techniques to search for causal relationships between all aspects of a system, either examining billions of possible causal effects or simplifying the data by focusing on the most important variables. Instead of searching exhaustively or oversimplifying the data, we incorporate basic physical principles (requiring effects to be “local” and “uniform”) to massively simplify the causal discovery problem. We demonstrate that our approach can recover known geophysical dynamics by applying it to the 1991 Mt. Pinatubo eruption, validating its ability to uncover space-time causal structure from observational data. We introduce Causal Space-Time Stencil Learning (CaStLe) for learning local causal dynamical structure underlying space-time data. CaStLe enables previously infeasible analyses of grid-cell-level Earth system data, significantly outperforming traditional methods. We demonstrate this new capability by recovering the space-time evolution of atmospheric aerosol flow weeks post-volcanic eruption.
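The "local" and "uniform" idea in the abstract above can be illustrated with a small sketch. This is not the CaStLe algorithm: ordinary least squares stands in for the causal discovery step, the one-dimensional periodic grid is an assumption, and `simulate` and `fit_pooled_stencil` are hypothetical names chosen for illustration. It only shows how treating every grid cell's neighborhood as a replicate of one shared stencil reduces the problem to a handful of coefficients.

```python
import numpy as np


def simulate(n_cells, n_steps, stencil, seed=0):
    """Simulate a 1D periodic grid where every cell follows the same lag-1 stencil."""
    a_left, a_center, a_right = stencil
    rng = np.random.default_rng(seed)
    x = np.zeros((n_steps, n_cells))
    x[0] = rng.normal(size=n_cells)
    for t in range(1, n_steps):
        prev = x[t - 1]
        # np.roll(prev, 1)[i] is prev[i - 1] (left neighbor);
        # np.roll(prev, -1)[i] is prev[i + 1] (right neighbor).
        x[t] = (
            a_left * np.roll(prev, 1)
            + a_center * prev
            + a_right * np.roll(prev, -1)
            + rng.normal(size=n_cells)
        )
    return x


def fit_pooled_stencil(x):
    """Estimate one shared (left, center, right) stencil by pooling all cells.

    Every (time, cell) pair contributes one row, so a larger grid yields
    more samples for the same three unknowns.
    """
    prev, nxt = x[:-1], x[1:]
    design = np.stack(
        [np.roll(prev, 1, axis=1), prev, np.roll(prev, -1, axis=1)], axis=-1
    ).reshape(-1, 3)
    target = nxt.reshape(-1)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


true_stencil = (0.4, 0.3, 0.0)  # illustrative values: left-to-right advection
estimated = fit_pooled_stencil(simulate(20, 500, true_stencil))
```

Under these assumptions, `estimated` should land close to `true_stencil`, with a near-zero right-neighbor coefficient reflecting the absence of right-to-left influence.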
2024
- SNL Report. CLimate Impact: Determining Etiology thRough pAthways (CLDERA). Diana Bull, Kara Peterson, Lyndsay Shand, and 44 more authors. Sep 2024.
Climate impacts have broad economic, health, political, and national security ramifications. Societally relevant impacts are typically farther downstream, are the product of multiple interacting processes, and can arise over small regions and timeframes because their sources are short-term and localized. Short-term forcings (as can be seen in volcanic eruptions, climatic tipping points (e.g., the collapse of rainforests or the disappearance of sea ice), or in increasingly plausible climate interventions) fundamentally possess low signal-to-noise and could benefit from accounting for the multiple conditional processes through which a downstream impact arises. Under the Grand Challenge LDRD CLDERA (CLimate impacts: Discovering Etiology thRough pAthways), we have developed tools to enable downstream impact attribution from geographically and temporally localized source forcings in the climate. CLDERA developed methods that can distinguish how a localized source drives the climate system to respond with particular impacts. The how is embodied in pathways: the spatio-temporally evolving chain of physical processes that connects a source to a series of increasingly distant impacts. Novel analytic methods in pursuit of downstream impact attribution were developed and demonstrated on simulations and observations of the 1991 eruption of Mt. Pinatubo in the Philippines. As described within this report, we have developed stratospheric expertise and aerosol modeling capabilities in E3SM; created original methods to detect and model pathways from source to impact; and advanced climate attribution through novel methods, cases, and approaches. Further, CLDERA developed a tiered verification process consisting of controlled datasets to prototype, verify, and refine the original method development. CLDERA increased Sandia’s footprint in the climate analytics community and developed new climate collaborations whilst also creating a cadre of climate analysts at Sandia.
The products from CLDERA have been extensive with a total of 9 journal articles published, 12 articles submitted and under review, and an additional 8 articles in preparation. We have produced 1750 simulated years and developed 9 code-bases. This report details these accomplishments and serves as a summary of the work completed during the CLDERA Grand Challenge.
2023
- SNL Report. Benchmarking the PCMCI Causal Discovery Algorithm for Spatiotemporal Systems. J. Jake Nichol, Michael Weylandt, Mark Smith, and 1 more author. Sep 2023.
Causal discovery algorithms construct hypothesized causal graphs that depict causal dependencies among variables in observational data. While powerful, the accuracy of these algorithms is highly sensitive to the underlying dynamics of the system in ways that have not been fully characterized in the literature. In this report, we benchmark the PCMCI causal discovery algorithm in its application to gridded spatiotemporal systems. Effectively computing grid-level causal graphs on large grids will enable analysis of the causal impacts of transient and mobile spatial phenomena in large systems, such as the Earth’s climate. We evaluate the performance of PCMCI with a set of structural causal models, using simulated spatial vector autoregressive processes in one and two dimensions. We develop computational and analytical tools for characterizing these processes and their associated causal graphs. Our findings suggest that direct application of PCMCI is not suitable for the analysis of dynamical spatiotemporal gridded systems, such as climatological data, without significant preprocessing and downscaling of the data. PCMCI requires unrealistic sample sizes to achieve acceptable performance on even modestly sized problems and suffers from a notable curse of dimensionality. This work suggests that, even under generous structural assumptions, significant additional algorithmic improvements are needed before causal discovery algorithms can be reliably applied to grid-level outputs of Earth system models.
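As a rough illustration of the kind of benchmark described above, the following sketch simulates a one-dimensional spatial vector autoregressive process and lists its ground-truth causal links. It is a minimal sketch, not the report's benchmark code: the periodic boundary, the lag-1 three-cell stencil, and the function names `simulate_spatial_var_1d` and `true_causal_edges` are all assumptions made for illustration.

```python
import random


def simulate_spatial_var_1d(n_cells, n_steps, stencil, noise_std=1.0, seed=0):
    """Simulate x[t][i] from its left, center, and right neighbors at t - 1.

    `stencil` is (a_left, a_center, a_right); boundaries are periodic.
    Returns a list of n_steps rows, each a list of n_cells values.
    """
    a_left, a_center, a_right = stencil
    # Sufficient (not necessary) stability check: the absolute row sum of the
    # coefficient matrix bounds its spectral radius.
    if abs(a_left) + abs(a_center) + abs(a_right) >= 1.0:
        raise ValueError("stencil coefficients must have absolute sum < 1")
    rng = random.Random(seed)
    history = [[rng.gauss(0.0, noise_std) for _ in range(n_cells)]]
    for _ in range(n_steps - 1):
        prev = history[-1]
        history.append(
            [
                a_left * prev[(i - 1) % n_cells]
                + a_center * prev[i]
                + a_right * prev[(i + 1) % n_cells]
                + rng.gauss(0.0, noise_std)
                for i in range(n_cells)
            ]
        )
    return history


def true_causal_edges(n_cells, stencil):
    """Ground-truth lag-1 links as (source_cell, target_cell) pairs."""
    edges = set()
    for target in range(n_cells):
        for offset, coef in zip((-1, 0, 1), stencil):
            if coef != 0.0:
                edges.add(((target + offset) % n_cells, target))
    return edges
```

For example, `true_causal_edges(8, (0.4, 0.3, 0.0))` yields 16 true links, while a grid-level algorithm must consider all 8 × 8 = 64 candidate lag-1 links. The true graph grows linearly with grid size and the candidate set grows quadratically, which is one way to see the dimensionality problem the abstract describes.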
2021
- JCAM. Machine learning feature analysis illuminates disparity between E3SM climate models and observed climate change. J. Jake Nichol, Matthew G. Peterson, Kara J. Peterson, and 2 more authors. Journal of Computational and Applied Mathematics, Oct 2021.
In September of 2020, Arctic sea ice extent was the second-lowest on record. State-of-the-art climate prediction uses Earth system models (ESMs), driven by systems of differential equations representing the laws of physics. Previously, these models have tended to underestimate Arctic sea ice loss. The issue is grave because accurate modeling is critical for economic, ecological, and geopolitical planning. We use machine learning techniques, including random forest regression and Gini importance, to show that the Energy Exascale Earth System Model (E3SM) relies too heavily on just one of the ten chosen climatological quantities to predict September sea ice averages. Furthermore, E3SM gives too much importance to six of those quantities when compared to observed data. Identifying the features that climate models incorrectly rely on should allow climatologists to improve prediction accuracy.
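The comparison described above can be sketched with scikit-learn, whose `RandomForestRegressor.feature_importances_` attribute reports impurity-based importances (often called Gini importance). This is a minimal sketch on synthetic data, not the paper's analysis: the ten random features, the two synthetic targets, and the helper name `impurity_importances` are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def impurity_importances(X, y, n_trees=200, seed=0):
    """Fit a random forest and return its impurity-based feature importances."""
    model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    model.fit(X, y)
    return model.feature_importances_


rng = np.random.default_rng(0)
n_samples, n_features = 300, 10
X = rng.normal(size=(n_samples, n_features))

# Synthetic "model-like" target: depends almost entirely on feature 0.
y_model = 3.0 * X[:, 0] + 0.1 * rng.normal(size=n_samples)
# Synthetic "observation-like" target: depends on features 0, 1, and 2.
y_obs = X[:, 0] + X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=n_samples)

imp_model = impurity_importances(X, y_model)
imp_obs = impurity_importances(X, y_obs)

# Positive entries mark features the "model-like" data over-relies on.
importance_gap = imp_model - imp_obs
```

In this toy setup the gap is largest for feature 0, mirroring the abstract's finding that a model can lean too heavily on a single quantity relative to observations.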
- SNL Report. Causal Evaluations for Identifying Differences between Observations and Earth System Models. J. Jake Nichol, Matthew Peterson, and Kara Peterson. Oct 2021.
- ICML. Learning Why: Data-Driven Causal Evaluations of Climate Models. J. Jake Nichol, Matthew Peterson, G. Matthew Fricke, and 1 more author. ICML 2021 Workshop Tackling Climate Change with Machine Learning, Oct 2021.
We plan to use nascent data-driven causal discovery methods to find and compare causal relationships in observed data and climate model output. We will look at ten different features in the Arctic climate collected from public databases and from the Energy Exascale Earth System Model (E3SM). In identifying and comparing the resulting causal networks, we hope to find important differences between observed causal relationships and those in climate models. With these, climate modeling experts will be able to improve the coupling and parameterization of E3SM and other climate models.
2020
- SNL Report. Arctic Tipping Points Triggering Global Change (LDRD Final Report). Kara J. Peterson, Amy Jo Powell, Irina Kalashnikova Tezaur, and 7 more authors. Sep 2020.
2018
- arXiv. The Swarmathon: An Autonomous Swarm Robotics Competition. Sarah M Ackerman, G Matthew Fricke, Joshua P Hecker, and 7 more authors. Sep 2018.
The Swarmathon is a swarm robotics programming challenge that engages college students from minority-serving institutions in NASA’s Journey to Mars. Teams compete by programming a group of robots to search for, pick up, and drop off resources in a collection zone. The Swarmathon produces prototypes for robot swarms that would collect resources on the surface of Mars. Robots operate completely autonomously with no global map, and each team’s algorithm must be sufficiently flexible to effectively find resources from a variety of unknown distributions. The Swarmathon includes Physical and Virtual Competitions. Physical competitors test their algorithms on robots they build at their schools; they then upload their code to run autonomously on identical robots during the three-day competition in an outdoor arena at Kennedy Space Center. Virtual competitors complete an identical challenge in simulation. Participants mentor local teams to compete in a separate High School Division. In the first 2 years, over 1,100 students participated, and 63% of students were from underrepresented ethnic and racial groups. Participants had significant gains in both interest and core robotic competencies that were equivalent across gender and racial groups, suggesting that the Swarmathon is effectively educating a diverse population of future roboticists.