NEW CRITERIA FOR THE CHOICE OF TRAINING SAMPLE SIZE FOR MODEL SELECTION AND PREDICTION: THE CUBIC ROOT RULE

The size of a training sample in Objective Bayesian Testing and Model Selection is a central problem in both theory and practice. We concentrate here on simulated training samples and on simple hypotheses. The striking result is that even in the simplest of situations, the optimal training sample size M can be minimal (for identification of the sampling model) or maximal (for optimal prediction of future data). We suggest a compromise that seems to work well whatever the purpose of the analysis, the 5% cubic root rule: M = min(0.05·n, n^(1/3)). We proceed to define a comprehensive loss function that combines identification errors and prediction errors, appropriately standardized. We find that this very simple cubic root rule is extremely close to an overall optimum for a wide selection of sample sizes and cutting points that define the decision rules. The cubic root rule was first proposed in Pericchi (2010). This article proposes to generalize the rule and to take full statistical advantage of it in realistic situations. Another way to look at the rule is as a synthesis of the rationales that justify both AIC and BIC.
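The rule quoted in the abstract is simple to compute. Below is a minimal Python sketch of the 5% cubic root rule under the stated formula M = min(0.05·n, n^(1/3)); the function name training_sample_size is illustrative and not from the article, and rounding M to a whole sample size is an assumption the abstract does not specify.

def training_sample_size(n: int) -> int:
    # 5% cubic root rule: M = min(0.05 * n, n ** (1/3)).
    # For n below roughly 89 the 5% term binds; for larger n the
    # cube root does, keeping M small relative to n.
    # Rounding to an integer is an assumption, not from the paper.
    return max(1, round(min(0.05 * n, n ** (1.0 / 3.0))))

# Example: n = 100 gives min(5.0, 4.64...) = 4.64..., so M = 5 after rounding.
for n in (50, 100, 1000, 10000):
    print(n, training_sample_size(n))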

Bibliographic Details
Main Authors: Israel Almodovar, Raúl Pericchi
Format: Article
Language: Spanish
Published: Universidad Nacional de Colombia, sede Medellín, 2012-01-01
Series: Revista de la Facultad de Ciencias
ISSN: 0121-747X, 2357-5549
Subjects: 5% cubic root rule; intrinsic priors; objective Bayesian hypothesis testing; training sample size
Online Access: https://revistas.unal.edu.co/index.php/rfc/article/view/48975