## Data Assimilation

Data assimilation, also known as state estimation, may be defined as the process of combining a model with observational data to provide an estimate of the state of the system that is better than could be obtained using either the data or the model alone. Such products are necessarily not wholly accurate representations of the system; however, especially in data-sparse regions of the globe and for poorly measured fields, the combined product is likely to be a much more accurate representation of the system than could be achieved using the raw data alone. (Of course, this in turn means that the products contain biases introduced by whatever model is used.)

The process of combining data and model has grown increasingly sophisticated over the years, beginning with optimal interpolation and three-dimensional variational data assimilation (3D-Var). Currently, most centers use 4D-Var (with time as an additional dimension) and ensemble Kalman filter methods. All of these are essentially least-squares methods or variations on them: the final estimate is chosen to minimize an appropriately chosen ‘distance’ between the final estimate, the data and a model prior. The methods differ in the metric used to measure distance and the corresponding weights given to the observations and the prior estimate, and in which fields or parameters are allowed to be adjusted to produce the final estimate. Modern methods generally allow the error fields to evolve in some fashion, thereby allowing a dynamic estimation of the error covariances and a better estimate of the appropriate weights. The literature is extensive; see Kalnay (2003) and Wunsch (2006) for reviews.
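The least-squares idea can be illustrated with the simplest possible case: one state variable, observed directly, with a model prior and an observation whose errors are assumed to be unbiased and mutually uncorrelated. The sketch below is not the scheme used by any particular center; the function name `scalar_analysis` and the numbers are hypothetical and chosen only for illustration.

```python
def scalar_analysis(x_b, var_b, y, var_o):
    """Combine a model prior x_b (error variance var_b) with a direct
    observation y (error variance var_o) by minimizing the cost function

        J(x) = (x - x_b)**2 / var_b + (y - x)**2 / var_o.

    Returns the minimizing value (the analysis) and its error variance.
    Assumes unbiased, mutually uncorrelated prior and observation errors.
    """
    if var_b <= 0 or var_o <= 0:
        raise ValueError("error variances must be positive")
    gain = var_b / (var_b + var_o)   # weight given to the observation
    x_a = x_b + gain * (y - x_b)     # prior plus weighted innovation
    var_a = (1.0 - gain) * var_b     # analysis error variance
    return x_a, var_a


# Illustrative numbers only (not taken from any real dataset).
x_a, var_a = scalar_analysis(x_b=10.0, var_b=4.0, y=12.0, var_o=1.0)
```

With these numbers the analysis is approximately 11.6, closer to the more accurate observation, and its error variance of approximately 0.8 is smaller than that of either the prior or the observation. In the multivariate case the variances become error covariance matrices and the weight becomes a gain matrix, and, under these assumptions, the choice of those covariances is where the methods named above differ.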

There is a long history of data assimilation in meteorology, largely associated with weather forecasts: data from satellites, radiosondes and other sources are combined with a model estimate from a previous forecast to provide the initial state for a subsequent forecast. Reanalysis products are now available that combine model and data over the last fifty or so years into a single, consistent product. The application of data assimilation methodology in oceanography is more recent, reflecting in part the relative sparsity of data in the ocean and hence the likely large errors inherent in any such analysis. However, with the advent of near-routine observations from satellites (e.g., altimeters) and profiling floats (e.g., the Argo float system), a much higher density of observations is possible and ocean data assimilation has become a practical proposition. This is important because it is the initial state of the ocean, and not the atmosphere, that will largely determine the evolution of the ocean and the climate on decadal and longer timescales, and so determine the natural variability of the climate. Still more recently, inverse modeling, an optimization technique closely related to data assimilation, has been applied to oceanic and terrestrial biogeochemical fields (for example, CO2, CH4, and CO) in an attempt to constrain the carbon budget. Here, the field is still in its infancy, and the term ‘data assimilation’ might imply capabilities that do not yet exist, but that one may hope will exist by the end of the proposal period.

Many of these activities will be conducted in collaboration with GFDL scientists. In particular, the activities involving ocean state estimation and ocean initialization for decadal to centennial predictions (the next subsection) will be carried out largely by postdoctoral research fellows working in close collaboration with GFDL scientists. As with the development of ESMs, data assimilation in ESMs has many components, and we propose to focus on a subset of these, as follows.

- Ocean data assimilation in climate models. Estimating the ocean state using all available data (including Argo, altimetry, and hydrographic data) enables the detection and prediction of climate change and variability on decadal timescales.
- High-resolution ocean data assimilation, both to gain experience in this activity for the next generation of ocean climate models and for present-day regional models.
- Ocean tracer inversions for determining water mass properties, pathways, and sources and sinks of biogeochemical tracers, and to evaluate the ocean component of the Earth system model.
- Analyzing satellite observations of ocean color to elucidate ocean ecosystem processes.
- Atmospheric inversions for estimating high-quality, time-dependent maps of the fluxes of CO2, CH4, and CO to the atmosphere from tracer observations in the atmosphere and oceans, to evaluate the sources of these tracers and elucidate source dynamics.
- Use of terrestrial ecosystem carbon dynamics to evaluate carbon fluxes and help evaluate ESM parameter variability.

The overall goal of this activity is to collaborate with GFDL to create a capability whereby data can be combined with an Earth system model to provide a better assessment of the current state of the Earth system and to provide forecasts of its future state.