We are a home for Earth science data and computing professionals. Our sessions bring together the community for hands-on, interdisciplinary deep dives as we explore "Innovation to Impact" this year. Learn more about ESIP: esipfed.org
The Sustainable Data Management Cluster is transitioning into the Data Sovereignty Cluster, building on the work of the past few years that culminated in the recent CARE principles paper (https://doi.org/10.5334/dsj-2024-037). While the current Sustainable Data Management Cluster will go dormant, the new Data Sovereignty Cluster is looking for new faces and ideas and would like to hear about your interests and questions related to data sovereignty.
Value to Session Participants: Awareness of the cluster and interest in the potential paths the Data Sovereignty Cluster could take.
Recommended Ways to Prepare: Read about data sovereignty and Indigenous data, keeping in mind that data sovereignty is not limited to Indigenous data.
NASA’s ESDIS program manages the Distributed Active Archive Centers (DAACs), data repositories that process, archive, document, and distribute science data from NASA’s past and current Earth-observing satellites and field measurement programs. Although DAACs have historically focused on the unique needs of their discipline communities, current efforts at ESDIS focus on extending the excellent level of service that DAACs provide to all users. Enterprise and cross-DAAC collaborative efforts will shape the evolution of a unified system that meets the needs of all users of NASA Earth observation data through shared standards, a shared website, and common tools. While these efforts require engineering expertise, it is also important that the needs of science and applied science users are taken into account during development. This session will describe efforts currently underway at ESDIS to ensure that user needs and user experience are considered during ESDIS evolution while protecting both ESDIS’s high standards and the trust of the scientific community.
Value to Session Participants: We want participants to have a better idea of what is happening at ESDIS and how they might provide feedback.
Recommended Ways to Prepare: Look at the new Earthdata
The ESIP Data Readiness Cluster is exploring how to expand the current AI-readiness checklist (https://github.com/esipfed/data-readiness) to better support diverse geoscience use cases. The current AI-readiness checklist provides general guidelines for data producers and data managers to evaluate the quality and usability of open environmental data for AI development. However, different AI use cases in the geosciences may place further requirements on datasets for them to be suitable for efficient and reproducible AI research and development. This session will focus on developing extensions to the AI-readiness checklist for different types of geoscience datasets.
Value to Session Participants: Session participants can contribute to the development of the AI-readiness checklist and provide feedback on it.
Recommended Ways to Prepare: Get familiar with the AI-readiness checklist from the ESIP Data Readiness Cluster.
I am currently a Research Scientist at North Carolina Institute for Climate Studies, affiliated with NOAA National Centers for Environmental Information. My current research at NCICS focuses on generating a blended near-surface air temperature dataset by integrating in situ measurements...
Tuesday January 21, 2025 1:30pm - 3:00pm EST
Room 1
Writing code has become an integral component of conducting scientific research, especially as dataset size and complexity have grown. Scientists use code to download and clean data, prepare visualizations, calculate statistics, run models, and more. Just as the use of code in environmental research has grown, so has the ecosystem of tools and techniques for building robust analyses. Libraries in the programming language R have expanded to meet the needs of a growing user base in the scientific community, particularly through the open science community rOpenSci. In this session, we will focus on a particular package in the rOpenSci ecosystem called ‘targets’, which lets users build robust data pipelines for reproducible and efficient scientific workflows. We will introduce the concepts of dependency tracking that underpin the package, host an interactive demo to build a small pipeline using ‘targets’, and share a few examples of ‘targets’ pipelines built for large research projects. Attendees should leave this session feeling inspired and equipped to begin constructing data pipelines using ‘targets’ for their own projects.
Value to Session Participants: Session participants will leave with an example of a reproducible workflow, and practice writing and running code with dependency management enabled. This should give them a starting point for future projects that can leverage these techniques.
Recommended Ways to Prepare: Skim the homepage of the ‘targets’ rOpenSci docs to understand the high-level summary and philosophy of this approach at https://docs.ropensci.org/targets/. Consider watching the 4-minute demonstration video that shows an example workflow. If you are not familiar with R functions, please read about them in this R for Data Science chapter at https://r4ds.hadley.nz/functions.html.
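For readers who want a feel for dependency tracking before the session, the idea can be sketched in a few lines. This is a conceptual illustration written in Python, not the ‘targets’ API: the `Pipeline` class, `fingerprint` helper, and target names are all hypothetical. In ‘targets’ itself, pipelines are declared in R with `tar_target()` and run with `tar_make()`, and the package also tracks changes to function code, which this sketch does not attempt.

```python
import hashlib
import json


def fingerprint(value):
    """Return a stable hash of a JSON-serializable value."""
    text = json.dumps(value, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Pipeline:
    """Toy pipeline (hypothetical, for illustration only)."""

    def __init__(self):
        self.steps = {}   # target name -> (function, upstream names)
        self.values = {}  # target name -> latest value
        self.seen = {}    # target name -> fingerprint of the inputs last used

    def set_input(self, name, value):
        """Register raw data that downstream targets can depend on."""
        self.values[name] = value

    def target(self, name, func, deps):
        """Declare a target; upstream targets must be declared first."""
        self.steps[name] = (func, list(deps))

    def make(self):
        """Rebuild only outdated targets and return their names."""
        rebuilt = []
        for name, (func, deps) in self.steps.items():
            inputs = [self.values[d] for d in deps]
            key = fingerprint(inputs)
            # A target is outdated when its upstream values have changed
            # since the last time it was built.
            if self.seen.get(name) != key:
                self.values[name] = func(*inputs)
                self.seen[name] = key
                rebuilt.append(name)
        return rebuilt


pipeline = Pipeline()
pipeline.set_input("raw", [3.0, None, 5.0])
pipeline.target("clean", lambda raw: [x for x in raw if x is not None], ["raw"])
pipeline.target("mean", lambda clean: sum(clean) / len(clean), ["clean"])

first_run = pipeline.make()    # both targets are built
second_run = pipeline.make()   # nothing changed, so nothing is rebuilt

pipeline.set_input("raw", [3.0, 5.0, None])  # new raw data, same cleaned values
third_run = pipeline.make()    # "clean" reruns; "mean" is skipped
```

The third run shows why tracking values rather than timestamps can save work: the raw data changed, so the cleaning step reruns, but its output is identical, so the downstream summary is left alone.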
We will discuss a range of ideas centering on data, AI models, and AI product life cycles. We will determine what we should do next to help the community realize practical, trustworthy, and ethical AI.
Value to Session Participants: Gain clarity on which AI projects should be carried out, help draft the community paper, and be part of the broader effort to guide AI work in the Earth science data community.
Recommended Ways to Prepare: Read the AI-readiness checklist and the meeting notes of the Machine Learning Cluster.
The Biological Data Standards Cluster has created accompanying guidelines to the Biological Data Standards Primer that provide more context and details for data managers. The goal of these guidelines is to bridge the gap between the Primer and the full, lengthy standards documentation, by giving the information necessary to help data managers decide which standards should be used for the biological data they are working with. Currently these guides are stored in a repository in the ESIP GitHub. Interacting with, or contributing to, the GitHub repository might not be intuitive or standard practice for all users of the guidelines. In this session, we will introduce some GitHub basics for users, and using GitHub as a tool, we want to encourage the ESIP community to provide structured feedback and ensure the guidelines are accurate and aligned with current standards for biological data.
Value to Session Participants: For those newer to GitHub, this session will provide an environment to learn how to make a fork, submit a pull request, and submit issues.
Recommended Ways to Prepare: Review the Biological Data Standards Primer