On the verification of climate reconstructions

The skill of proxy-based reconstructions of Northern Hemisphere temperature is reassessed. Using an almost complete set of proxy and instrumental data from the past 130 years, a multi-cross-validation of a number of statistical methods is conducted, producing a distribution of verification skill scores. Among the methods are multiple regression, multiple inverse regression, total least squares, and RegEM, each considered with and without variance matching. For all of them the scores show considerable variation, but previous estimates, such as a 50% reduction of error (RE), appear as outliers; more realistic estimates vary about 25%. It is shown that overestimation of skill is possible in the presence of strong persistence (trends). In that case, the classical "early" or "late" calibration sets are not representative of the intended (instrumental, millennial) domain. As a consequence, RE scores are generally inflated, and the proxy predictions are easily outperformed by stochastic, a priori skill-less predictions.

To obtain robust significance levels, the multi-cross-validation is repeated using stochastic predictors. Comparing the score distributions, it turns out that the proxies perform significantly better for almost all methods. Nonetheless, the scores of the stochastic predictors do not vanish, with an estimated 10% of spurious skill based on representative samples. I argue that this residual score is due to the limited sample size of 130 years, where the memory of the processes degrades the independence of calibration and validation sets. It is likely that proxy prediction scores are similarly inflated and have to be downgraded further, leading to a final overall skill that, for the best methods, lies around 20%.

The consequences of the limited verification skill for millennial reconstructions are briefly discussed.
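For reference, the reduction-of-error (RE) score cited above is conventionally computed against a forecast of the calibration-period mean; the form below is the standard convention in the reconstruction literature and is given here as background, not quoted from the paper:

```latex
\[
  \mathrm{RE} \;=\; 1 \;-\;
  \frac{\sum_{t \in \mathrm{val}} \left( y_t - \hat{y}_t \right)^{2}}
       {\sum_{t \in \mathrm{val}} \left( y_t - \bar{y}_{\mathrm{cal}} \right)^{2}}
\]
```

The following minimal Python sketch illustrates the kind of procedure the abstract describes: many calibration/validation splits of a roughly 130-year record, an RE score per split, and the same exercise repeated with a priori skill-less AR(1) "stochastic predictors" as a benchmark. It is a toy illustration under assumed data and a plain least-squares regression, not the paper's code; the names (re_score, cross_validate, ar1) and all parameter choices are hypothetical.

```python
# Hypothetical sketch, not the paper's code: multi-cross-validation of a
# proxy regression versus an AR(1) stochastic-predictor benchmark.
import numpy as np

rng = np.random.default_rng(0)

def re_score(obs, pred, cal_mean):
    """Reduction of error: 1 - SSE(prediction) / SSE(calibration-mean forecast)."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - cal_mean) ** 2)

def cross_validate(X, y, n_splits=200, val_frac=0.3):
    """Distribution of RE scores from repeated contiguous calibration/validation splits."""
    n = len(y)
    n_val = int(val_frac * n)
    scores = []
    for _ in range(n_splits):
        start = rng.integers(0, n - n_val + 1)      # contiguous validation block
        val = np.arange(start, start + n_val)
        cal = np.setdiff1d(np.arange(n), val)
        A = np.column_stack([np.ones(len(cal)), X[cal]])
        coef, *_ = np.linalg.lstsq(A, y[cal], rcond=None)  # plain multiple regression
        pred = np.column_stack([np.ones(len(val)), X[val]]) @ coef
        scores.append(re_score(y[val], pred, y[cal].mean()))
    return np.array(scores)

def ar1(n, k, phi=0.7):
    """k independent AR(1) series of length n: the a priori skill-less benchmark."""
    e = rng.normal(size=(n, k))
    out = np.zeros_like(e)
    for t in range(1, n):
        out[t] = phi * out[t - 1] + e[t]
    return out

# Toy stand-ins for 130 years of instrumental temperature (y) and proxies (X).
n_years, n_proxies = 130, 10
y = 0.05 * np.cumsum(rng.normal(size=n_years)) + rng.normal(size=n_years)
X = 0.3 * y[:, None] + rng.normal(size=(n_years, n_proxies))

proxy_scores = cross_validate(X, y)
noise_scores = cross_validate(ar1(n_years, n_proxies), y)
print("median RE, proxies:            ", round(float(np.median(proxy_scores)), 2))
print("median RE, stochastic benchmark:", round(float(np.median(noise_scores)), 2))
```

Comparing the two score distributions, rather than quoting a single RE value, mirrors the abstract's point: a nonzero benchmark score indicates how much apparent skill arises purely from persistence and the short record.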

Bibliographic Details
Main Author: G. Bürger
Format: Article
Language: English
Published: Copernicus Publications 2007-07-01
Series: Climate of the Past
ISSN: 1814-9324, 1814-9332
Online Access: http://www.clim-past.net/3/397/2007/cp-3-397-2007.pdf