Summary: The main purpose of this study was to investigate the
suitability of reference tests for moderating internally
assessed national qualifications at the upper secondary school
level. In a secondary analysis, the relative merits of two
alternative item formats, the open-ended and cloze, were
compared with the multiple-choice item, which has traditionally
been used in reference tests of this nature.
A series of short reference tests, based on the underlying
construct of developed abilities, was constructed in four core
subject areas (i.e. English, mathematics, science and social
studies), along with an additional test of scholastic aptitude.
The English, science and social studies tests each consisted of
a vocabulary and a reading comprehension component, while the
mathematics test had more traditional content, relating to the
measurement of general concepts. An essay test was added to the
English test analyses. Multiple forms of the developed abilities
tests were prepared, both to provide separate multiple-choice and
open-ended/cloze formats and to enable a multiple matrix sampling
technique to be employed.
The validity of the reference tests was evaluated by using
the performance of Christchurch fifth formers on the tests to predict
their corresponding School Certificate Examination class parameters
(i.e. mean and standard deviation). These analyses were based on
a sample of 18 classes across four state, co-educational high
schools, covering a wide range of ability levels. A series of
multiple regression analyses was conducted to provide optimal
predictions of the respective class parameters.
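As a concrete illustration of the kind of analysis involved, the sketch
below fits an ordinary least squares regression of class-level examination
means on class-level reference test scores and reports the resulting
multiple R. It is a minimal sketch only: the data, the two predictors and
all variable names are invented for illustration and do not reproduce the
study's 18-class sample or its actual predictor set.

```python
# Illustrative sketch only: ordinary least squares prediction of class-level
# examination means from class-level reference test scores, with the multiple R
# computed as the correlation between fitted and observed values.
# All numbers below are invented placeholders, not the study's 18-class data.
import numpy as np

# Hypothetical class-level predictors: subject reference test mean, aptitude test mean
X = np.array([
    [52.1, 48.3],
    [61.4, 58.9],
    [45.0, 44.2],
    [58.7, 55.1],
    [49.9, 47.6],
])
# Hypothetical observed examination class means for the same classes
y = np.array([49.8, 60.2, 43.5, 57.0, 48.1])

# Fit the regression with an intercept term via least squares
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Multiple R: correlation between predicted and observed class means
fitted = X1 @ coef
multiple_r = np.corrcoef(fitted, y)[0, 1]
print(f"coefficients: {coef}, multiple R: {multiple_r:.2f}")
```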
It was found that each of the subject-based reference tests
predicted class ability levels (i.e. means) on the corresponding
School Certificate Examinations with a very high degree of
sensitivity. The multiple-R's generated were 0.97 for mathematics,
0.90 for English, 0.89 for science and 0.80 for social studies.
Predicting the spread of ability within each class (i.e. the
standard deviation) proved more difficult, although the results
were still sensitive enough for moderation purposes.
The addition of the mathematics or scholastic aptitude
test to the subject-based reference tests improved the multiple-R's
on both parameters.
The comparison of item types revealed no significant difference
in the prediction of class means. However, the open-ended/
cloze format failed to predict class standard deviations at a
statistically significant level.
The findings were discussed in relation to earlier studies,
policy implications, practical application and the urgent need
for further research.