When Is a Test Score Fair for the Individual Who Is Being Tested? Effects of Different Scoring Procedures across Multiple Attempts When Testing a Motor Skill Task

Bibliographic Details
Main Authors: Arve Vorland Pedersen, Håvard Lorås
Format: Article
Language: English
Published: Frontiers Media S.A. 2017-04-01
Series: Frontiers in Psychology
Subjects: assessment, performance
Online Access: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00619/full
DOI: 10.3389/fpsyg.2017.00619
ISSN: 1664-1078
Volume: 8
Collection: DOAJ

Abstract

Tests and test batteries used to assess motor skills, whether in research or in clinical settings, apply a variety of scoring procedures, ranging from one to ten attempts, of which either the best is scored or an average is computed. The rationale behind these procedures is rarely stated, and the number of attempts allowed appears to be decided with little support from research. It is uncertain whether these procedures fairly capture an individual’s skill level, so the validity of the tests may be compromised.

The present study tested 24 young female soccer players on juggling a soccer ball. Each player was given 10 attempts, and the trials were scored according to nine different procedures: the ‘best of’ or ‘mean of’ one, two, three, five, or ten attempts. Individual raw scores varied widely across trials, but no general effect of trial was found. The mean (SD) percentage difference between each player’s lowest and highest scores was 27.7% (9.9%), and 17 players (71%) showed a significant change from their lowest to their highest score. Correlations between raw scores across trials were low, whereas correlations across scoring procedures were generally higher. The first trial differed significantly from the remaining trials, both as a raw score and as a scoring procedure. The mean percentage difference between best-of-two and best-of-ten scores was 95%, and 50% of the players showed a significant difference between these two procedures. No significant differences were found among the mean-of-rule scorings. Best-of-rule and mean-of-rule scorings differed significantly, except for best-of-two versus mean-of-two. Across players, the mean (SD) difference between each player’s highest and lowest rank was 6.7 (3.6) places, and individual rankings within the group varied by 33% on average across procedures. One player moved from 3rd to 23rd place depending on the scoring procedure used.

We conclude that scoring procedures affect results and may influence test outcomes. This may have consequences for decisions based on test results, such as diagnosis and the selection of intervention groups. We hope that our results will inspire further research into the scoring procedures of the many tests and tasks in common use.
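The scoring procedures described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code. It assumes that "best of k" and "mean of k" use the first k of the 10 attempts. It also assumes that the percentage difference between the lowest and highest scores is taken relative to the highest score, and that ranking ties are broken alphabetically. The trial scores are invented for illustration. Because best-of-one and mean-of-one both reduce to the first trial, merging them leaves nine distinct procedures, which is presumably why the study counts nine.

```python
from statistics import mean

# Hypothetical juggling counts (10 attempts per player); NOT data from the study.
TRIALS = {
    "A": [12, 30, 25, 18, 40, 22, 35, 28, 31, 26],
    "B": [20, 22, 21, 19, 23, 24, 20, 22, 21, 25],
    "C": [5, 8, 45, 10, 7, 9, 12, 6, 11, 8],
}

ATTEMPT_COUNTS = (1, 2, 3, 5, 10)


def scoring_procedures(trials):
    """Return {procedure_name: score}, assuming 'of k' means the first k attempts.

    best-of-1 and mean-of-1 are both just the first trial, so mean-of-1 is
    omitted, which leaves nine distinct procedures.
    """
    scores = {}
    for k in ATTEMPT_COUNTS:
        first_k = trials[:k]
        scores[f"best-of-{k}"] = max(first_k)
        if k > 1:
            scores[f"mean-of-{k}"] = mean(first_k)
    return scores


def percent_range(trials):
    """Percentage difference between lowest and highest raw score.

    Assumption: the difference is expressed relative to the highest score.
    """
    hi, lo = max(trials), min(trials)
    return 100.0 * (hi - lo) / hi


def rank_players(all_scores, procedure):
    """Rank 1 = highest score; ties broken by player name (an arbitrary choice)."""
    ordered = sorted(all_scores, key=lambda p: (-all_scores[p][procedure], p))
    return {player: i + 1 for i, player in enumerate(ordered)}


def rank_ranges(trials_by_player):
    """Highest minus lowest rank for each player across all procedures."""
    all_scores = {p: scoring_procedures(t) for p, t in trials_by_player.items()}
    procedures = list(next(iter(all_scores.values())))
    ranks = {p: [] for p in all_scores}
    for proc in procedures:
        for player, r in rank_players(all_scores, proc).items():
            ranks[player].append(r)
    return {p: max(r) - min(r) for p, r in ranks.items()}


if __name__ == "__main__":
    for player, trials in TRIALS.items():
        print(player, scoring_procedures(trials), f"{percent_range(trials):.1f}%")
    print(rank_ranges(TRIALS))
```

With these made-up scores, player C ranks first under best-of-three, best-of-five, and best-of-ten but last under every mean-of rule. This is a small-scale version of the rank shifts the study reports.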