On the verification of climate reconstructions

The skill of proxy-based reconstructions of Northern Hemisphere temperature is reassessed. Using an almost complete set of proxy and instrumental data from the past 130 years, a multi-cross-validation of a number of statistical methods is conducted, producing a distribution of verification skill scores. Among the methods are multiple regression, multiple inverse regression, total least squares, and RegEM, each considered with and without variance matching. For all of them the scores show considerable variation, but previous estimates, such as a 50% reduction of error (RE), appear as outliers; more realistic estimates vary about 25%. It is shown that overestimation of skill is possible in the presence of strong persistence (trends). In that case, the classical "early" or "late" calibration sets are not representative of the intended (instrumental, millennial) domain. As a consequence, RE scores are generally inflated, and the proxy predictions are easily outperformed by stochastic, a priori skill-less predictions.

To obtain robust significance levels, the multi-cross-validation is repeated using stochastic predictors. Comparing the score distributions, it turns out that the proxies perform significantly better for almost all methods. Nonetheless, the scores of the stochastic predictors do not vanish, with an estimated 10% of spurious skill based on representative samples. I argue that this residual score is due to the limited sample size of 130 years, where the memory of the processes degrades the independence of calibration and validation sets. It is likely that proxy prediction scores are similarly inflated and have to be downgraded further, leading to a final overall skill that, for the best methods, lies around 20%.

The consequences of the limited verification skill for millennial reconstructions are briefly discussed.
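For reference, the reduction-of-error (RE) score cited above is conventionally computed against a forecast of the calibration-period mean; the form below is the standard convention in the reconstruction literature and is given here as background, not quoted from the paper:

```latex
\[
  \mathrm{RE} \;=\; 1 \;-\;
  \frac{\sum_{t \in \mathrm{val}} \left( y_t - \hat{y}_t \right)^{2}}
       {\sum_{t \in \mathrm{val}} \left( y_t - \bar{y}_{\mathrm{cal}} \right)^{2}}
\]
```

The following minimal Python sketch illustrates the kind of procedure the abstract describes: many calibration/validation splits of a roughly 130-year record, an RE score per split, and the same exercise repeated with a priori skill-less AR(1) "stochastic predictors" as a benchmark. It is a toy illustration under assumed data and a plain least-squares regression, not the paper's code; the names (re_score, cross_validate, ar1) and all parameter choices are hypothetical.

```python
# Hypothetical sketch, not the paper's code: multi-cross-validation of a
# proxy regression versus an AR(1) stochastic-predictor benchmark.
import numpy as np

rng = np.random.default_rng(0)

def re_score(obs, pred, cal_mean):
    """Reduction of error: 1 - SSE(prediction) / SSE(calibration-mean forecast)."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - cal_mean) ** 2)

def cross_validate(X, y, n_splits=200, val_frac=0.3):
    """Distribution of RE scores from repeated contiguous calibration/validation splits."""
    n = len(y)
    n_val = int(val_frac * n)
    scores = []
    for _ in range(n_splits):
        start = rng.integers(0, n - n_val + 1)      # contiguous validation block
        val = np.arange(start, start + n_val)
        cal = np.setdiff1d(np.arange(n), val)
        A = np.column_stack([np.ones(len(cal)), X[cal]])
        coef, *_ = np.linalg.lstsq(A, y[cal], rcond=None)  # plain multiple regression
        pred = np.column_stack([np.ones(len(val)), X[val]]) @ coef
        scores.append(re_score(y[val], pred, y[cal].mean()))
    return np.array(scores)

def ar1(n, k, phi=0.7):
    """k independent AR(1) series of length n: the a priori skill-less benchmark."""
    e = rng.normal(size=(n, k))
    out = np.zeros_like(e)
    for t in range(1, n):
        out[t] = phi * out[t - 1] + e[t]
    return out

# Toy stand-ins for 130 years of instrumental temperature (y) and proxies (X).
n_years, n_proxies = 130, 10
y = 0.05 * np.cumsum(rng.normal(size=n_years)) + rng.normal(size=n_years)
X = 0.3 * y[:, None] + rng.normal(size=(n_years, n_proxies))

proxy_scores = cross_validate(X, y)
noise_scores = cross_validate(ar1(n_years, n_proxies), y)
print("median RE, proxies:            ", round(float(np.median(proxy_scores)), 2))
print("median RE, stochastic benchmark:", round(float(np.median(noise_scores)), 2))
```

Comparing the two score distributions, rather than quoting a single RE value, mirrors the abstract's point: a nonzero benchmark score indicates how much apparent skill arises purely from persistence and the short record.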

Bibliographic Details
Main Author: G. Bürger
Format: Article
Language: English
Published: Copernicus Publications 2007-07-01
Series: Climate of the Past
ISSN: 1814-9324, 1814-9332
Online Access: http://www.clim-past.net/3/397/2007/cp-3-397-2007.pdf