Annual Report [2018]

Simons Collaborative Marine Atlas Project

Life Sciences

As planning for the Simons Collaboration on Computational Biogeochemical Modeling of Marine Ecosystems (CBIOMES) took off in January 2017, one need quickly became apparent: a database with tools that would allow the project’s participants to sift through the mountains of oceanographic data collected through their own work and by other initiatives.

Oceanographic data come in all shapes and sizes: Satellites and numerical models generate enormous datasets, cruises collect water samples of microbes, and sensors measure things such as temperature and salinity across a wide swath of the planet’s oceans. Combined, these datasets offer rich possibilities for new discoveries and hypotheses. 

But as of early 2017, many of the datasets were not combined. Oceanographic datasets were scattered across a wide range of sources, and using them often required enormous downloads. Each dataset also had its own internal organization, which made comparing information across datasets a confusing and laborious process.

As a result, researchers who wanted to work across different datasets had to keep reinventing the wheel. “It’s this ongoing issue that researchers are forced to solve again and again,” says Mohammad Ashkezari, a research scientist at the University of Washington in the laboratory of Virginia Armbrust, one of CBIOMES’ principal investigators.

To address this issue, Armbrust’s lab has created the Simons Collaborative Marine Atlas Project (CMAP), an open database that merges CBIOMES data with publicly available datasets from satellites and sensors and, more recently, with data from all the other oceanographic research initiatives supported by the Simons Foundation. The atlas contains more than 10 terabytes of data, Armbrust says, all of it marked with location and time stamps to make for easy comparisons. Cleaning up the data has been a “major undertaking,” she says.

Already, researchers at the University of Hawai‘i have used the atlas to validate a hypothesis about how the distribution of a particular gene in the ocean correlates with the available nutrients. And Armbrust has used it to compare satellite data on the distribution of chlorophyll with measurements of physical features of the ocean. “Within moments, I could start seeing whether there was a correlation,” she says.
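To make the idea of such a comparison concrete, here is a minimal sketch in Python of how two datasets that share location and time stamps might be lined up and checked for a correlation. It is not CMAP's own code: the function names, the key layout (date, latitude, longitude) and the numbers are all invented for illustration.

```python
from math import sqrt


def match_on_keys(a, b):
    """Pair values from two datasets that share the same (date, lat, lon) key."""
    shared = sorted(a.keys() & b.keys())
    return [a[k] for k in shared], [b[k] for k in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values keyed by (date, latitude, longitude); not real measurements.
chlorophyll = {
    ("2016-04-01", 22.0, -158.0): 0.10,
    ("2016-04-01", 30.0, -158.0): 0.20,
    ("2016-04-01", 40.0, -158.0): 0.30,
    ("2016-04-01", 50.0, -158.0): 0.90,  # no matching temperature record
}
temperature = {
    ("2016-04-01", 22.0, -158.0): 25.0,
    ("2016-04-01", 30.0, -158.0): 20.0,
    ("2016-04-01", 40.0, -158.0): 15.0,
}

chl_values, temp_values = match_on_keys(chlorophyll, temperature)
r = pearson(chl_values, temp_values)
```

Because every record carries the same kind of key, the matching step is a simple set intersection. With datasets organized in different ways, that step is where most of the effort would go.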

Jesse McNichol, a postdoctoral researcher at the University of Southern California in Los Angeles in the laboratory of Jed Fuhrman, a CBIOMES principal investigator, plans to use CMAP to study how the abundance of particular types of bacteria and archaea correlates with variables such as ocean temperature and nutrients. Using new algorithms for denoising genetic data, McNichol has worked with the Armbrust lab to prepare huge amounts of genetic information about ocean microbes for inclusion in the atlas. In the future, this will include samples from 2003 and 2016 that cover the Pacific Ocean from Alaska to New Zealand. “We can directly compare datasets that are 13 years apart, across the entire ocean,” McNichol says. “Then that data will be out there and accessible to anyone.”

The Simons Collaborative Marine Atlas Project integrates data from a wide range of sources, including satellites, research cruises and submersibles. Photo courtesy of Mohammad Ashkezari.

In its first months after launch, the atlas was made available only to a few research groups for test runs. But in December 2018, Armbrust’s team unveiled an early version of the system at the annual meeting of the Simons Collaboration on Ocean Processes and Ecology. “Eventually, we hope people in the broader community will use it,” says Marian Carlson, director of the Simons Foundation’s Life Sciences division.

This early version includes online documentation and applications for Windows and Macintosh computers in which researchers can designate the ocean region, time range and type of data they wish to examine, and then download only the data relevant to their query. The application also provides built-in data visualization tools and will eventually include a portal allowing researchers to upload their own data to the archive.
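The kind of query described here, choosing a region, a time range and a type of data and getting back only the matching records, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the atlas application itself: the `Observation` record, the `subset` function and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One hypothetical record: a measured variable with time and location stamps."""
    variable: str
    time: datetime
    lat: float
    lon: float
    value: float


def subset(observations, variable, start, end, lat_range, lon_range):
    """Keep only records of one variable inside a time window and a lat/lon box."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [
        o for o in observations
        if o.variable == variable
        and start <= o.time <= end
        and lat_min <= o.lat <= lat_max
        and lon_min <= o.lon <= lon_max
    ]


# Made-up sample records; not real measurements.
records = [
    Observation("temperature", datetime(2016, 4, 1), 22.75, -158.0, 24.1),
    Observation("temperature", datetime(2016, 9, 1), 47.0, -125.0, 12.3),
    Observation("salinity", datetime(2016, 4, 1), 22.75, -158.0, 35.0),
    Observation("temperature", datetime(2003, 4, 1), 22.75, -158.0, 23.8),
]

selected = subset(
    records,
    variable="temperature",
    start=datetime(2016, 1, 1),
    end=datetime(2016, 12, 31),
    lat_range=(20.0, 25.0),
    lon_range=(-160.0, -155.0),
)
```

Only the first record passes all three filters. The others differ in variable, year or location. A real system would run this kind of filter on the server, so that only the matching slice is ever downloaded.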

Armbrust’s team has moved with amazing dispatch, says Michael Follows, an oceanographer at the Massachusetts Institute of Technology and director of CBIOMES. “I imagined it would take at least a year more than it has before we would see a working ocean atlas,” he says. “I’m astonished that we’re already there.” 

Armbrust still considers the project to be in its early stages, but she says its potential is already clear. “Every time we do a demo to people, they’re kind of blown away,” she says. “It allows you to do a little dreaming about the kinds of questions you might ask.”