On the verification of climate reconstructions
Main Author: | G. Bürger |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications, 2007-07-01 |
Series: | Climate of the Past |
Online Access: | http://www.clim-past.net/3/397/2007/cp-3-397-2007.pdf |
ISSN: | 1814-9324, 1814-9332 |

Abstract:

The skill of proxy-based reconstructions of Northern Hemisphere temperature is reassessed. Using an almost complete set of proxy and instrumental data covering the past 130 years, a multi-crossvalidation of a number of statistical methods is conducted, producing a distribution of verification skill scores. The methods include multiple regression, multiple inverse regression, total least squares and RegEM, each considered with and without variance matching. For all of them the scores vary considerably, but previous estimates, such as a 50% reduction of error (RE), appear as outliers, while more realistic estimates vary around 25%. It is shown that skill can be overestimated in the presence of strong persistence (trends). In that case, the classical "early" or "late" calibration sets are not representative of the intended (instrumental, millennial) domain. As a consequence, RE scores are generally inflated, and the proxy predictions are easily outperformed by stochastic, a priori skill-less predictions.

To obtain robust significance levels, the multi-crossvalidation is repeated using stochastic predictors. A comparison of the score distributions shows that the proxies perform significantly better for almost all methods. Nonetheless, the scores of the stochastic predictors do not vanish: based on representative samples, they carry an estimated 10% of spurious skill. I argue that this residual score is due to the limited sample size of 130 years, in which the memory of the processes degrades the independence of the calibration and validation sets. Proxy prediction scores are likely to be similarly inflated and have to be downgraded further, leading to a final overall skill of around 20% for the best methods.

The consequences of the limited verification skill for millennial reconstructions are briefly discussed.
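
The two quantitative ideas in the abstract, the reduction of error (RE) and the use of skill-less stochastic predictors as a significance baseline, can be illustrated with a small simulation. The sketch uses the standard definition RE = 1 − Σ(o − p)² / Σ(o − ō_cal)², summed over the validation period, where o is the observed target, p the prediction and ō_cal the calibration-period mean of the target. Everything else is an assumption for illustration and not the paper's setup: a synthetic 130-value target with a linear trend, a hypothetical noisy "proxy", an ordinary least-squares regression of the target on a single predictor, randomly placed 30-year validation blocks standing in for the multi-crossvalidation, and AR(1) red noise (φ = 0.9) as the stochastic predictors. The helper names (`reduction_of_error`, `fit_ols`, `ar1`, `multi_crossvalidate`, `demo`) are hypothetical.

```python
import random
import statistics


def reduction_of_error(obs, pred, calib_mean):
    """Reduction of error: RE = 1 - SSE / sum((obs - calib_mean)**2).

    Evaluated over the validation period; calib_mean is the mean of the
    target over the calibration period. RE > 0 means the prediction beats
    a constant forecast equal to that calibration mean.
    """
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ref = sum((o - calib_mean) ** 2 for o in obs)
    return 1.0 - sse / ref


def fit_ols(x, y):
    """Least-squares fit of y ~ a + b*x for a single predictor; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def ar1(n, phi, rng):
    """AR(1) red noise x_t = phi*x_{t-1} + e_t: persistent but skill-less (assumed model)."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def multi_crossvalidate(x, y, n_splits, block, rng):
    """RE scores from repeatedly holding out one contiguous validation block.

    Each split calibrates on the remaining years and validates on the block,
    so the result is a distribution of scores, not a single early/late value.
    """
    n = len(y)
    scores = []
    for _ in range(n_splits):
        start = rng.randrange(n - block + 1)
        val = range(start, start + block)
        cal = [i for i in range(n) if not start <= i < start + block]
        a, b = fit_ols([x[i] for i in cal], [y[i] for i in cal])
        calib_mean = sum(y[i] for i in cal) / len(cal)
        obs = [y[i] for i in val]
        pred = [a + b * x[i] for i in val]
        scores.append(reduction_of_error(obs, pred, calib_mean))
    return scores


def demo(seed=0, n=130, block=30, n_splits=200, n_noise=50, phi=0.9):
    """Score a synthetic proxy and many AR(1) stochastic predictors on a trending target."""
    rng = random.Random(seed)
    # Hypothetical target: linear warming trend plus white noise (not real data).
    target = [0.02 * t + rng.gauss(0.0, 0.5) for t in range(n)]
    # Hypothetical proxy: the target observed with additional noise.
    proxy = [v + rng.gauss(0.0, 0.3) for v in target]
    proxy_scores = multi_crossvalidate(proxy, target, n_splits, block, rng)
    noise_scores = []
    for _ in range(n_noise):
        noise = ar1(n, phi, rng)
        noise_scores += multi_crossvalidate(noise, target, n_splits // 10, block, rng)
    return proxy_scores, noise_scores


if __name__ == "__main__":
    proxy_scores, noise_scores = demo()
    p_med = statistics.median(proxy_scores)
    # Share of stochastic-predictor scores at least as high as the proxy's median RE.
    exceed = sum(s >= p_med for s in noise_scores) / len(noise_scores)
    print(f"median RE, proxy: {p_med:.2f}")
    print(f"median RE, AR(1) noise: {statistics.median(noise_scores):.2f}")
    print(f"fraction of noise scores >= proxy median: {exceed:.3f}")
```

Running the script prints the median RE of the synthetic proxy, the median RE of the noise predictors, and the share of noise scores that reach the proxy's median. These numbers depend on the seed and on the assumed trend and persistence; they are not estimates of the values reported in the abstract. Raising the trend or `phi` is a simple way to explore how persistence affects the scores of skill-less predictors.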