Data Science

At Lamont-Doherty Earth Observatory


Earth Science has recently experienced an explosion in the volume and complexity of data, driven by the rapid growth of new sensor systems, satellite observations, and numerical simulations of Earth system processes. To leverage these data in confronting the big questions in Earth Science, scientists at Lamont-Doherty Earth Observatory are pushing the boundaries of statistical analysis, data visualization, and scientific computing.

Below is a small sample of the data-science-related activities happening at Lamont.

Climate Modeling and Diagnostics Group

The Climate Modeling and Diagnostics Group uses simulations and data to investigate variability and change in Earth’s global climate system. The group has developed sophisticated tools for interactively analyzing and visualizing the petabytes of data produced by the Coupled Model Intercomparison Project. Their work is helping shed light on the driving causes of changes in rainfall and drought throughout the world.

The Climate Modeling and Diagnostics Group has also developed advanced statistical approaches to the prediction and predictability of phenomena such as the El Niño–Southern Oscillation (ENSO). These statistical approaches have broad applicability across data science.


Seismologists at Lamont, such as Felix Waldhauser, continuously collect and process data from thousands of seismometers, producing near-real-time analyses of earthquakes and related hazards.


Marine biologists, such as Sonya Dyhrman, use genomic sequencing of marine microbes to probe the secrets of marine algae.


Lamont is one of the leading providers of open, interactive, web-accessible databases of Earth Science data. Many of these databases are part of the Interdisciplinary Earth Data Alliance (IEDA), a community-based data facility funded by the US National Science Foundation to support, sustain, and advance the geosciences by providing data services for observational solid Earth data from the ocean, Earth, and polar sciences.

Some examples of databases include:

  • The Academic Seismic Portal (ASP): Sponsored by the National Science Foundation, ASP organizes active-source seismic field data into a modern relational data management system accessible through the Internet.
  • The IRI/LDEO Climate Data Library: The International Research Institute for Climate and Society (IRI) Data Library is a powerful, freely accessible online data repository and analysis tool that lets users view, analyze, and download hundreds of terabytes of climate-related data through a standard web browser.
  • The Socioeconomic Data and Applications Center (SEDAC): SEDAC is one of the Distributed Active Archive Centers (DAACs) in the Earth Observing System Data and Information System (EOSDIS) of the U.S. National Aeronautics and Space Administration. Focusing on human interactions in the environment, SEDAC's mission is to develop and operate applications that support the integration of socioeconomic and Earth science data and to serve as an "Information Gateway" between the Earth sciences and the social sciences. SEDAC is hosted by the Center for International Earth Science Information Network (CIESIN).


The data science activities at Lamont have evolved organically, driven by the scientific priorities and expertise of individuals and small research groups. The future will bring a more coordinated strategy, in which we leverage shared expertise and resources to create a new, integrated environment for data-intensive science.

The Lamont RTE Data Science Center will enable LDEO scientists to effortlessly visualize and analyze large, complex environmental datasets.
To realize this vision, we will:

  • Recruit a team of top data scientists and software engineers
  • Build a new type of computing cluster focused on high-throughput, I/O bound data workflows
  • Leverage and contribute to the open-source scientific software stack for big data
  • Partner with computing experts at Columbia's Data Science Institute