An investigation of the relationship between the relevance category of achievement test items and their indices of discrimination

It was hypothesized that on an achievement test, items measuring complex cognitive objectives would exhibit a higher mean discrimination index — based on the whole test as criterion — than would an equal number of items measuring less complex cognitive objectives; and that the mean discrimination in...


Bibliographic Details
Main Author: McKie, Thomas Douglas Muir
Language: English
Published: University of British Columbia 2011
Subjects: Educational tests and measurements
Online Access: http://hdl.handle.net/2429/39012
id ndltd-UBC-oai-circle.library.ubc.ca-2429-39012
record_format oai_dc
collection NDLTD
language English
sources NDLTD
topic Educational tests and measurements
spellingShingle Educational tests and measurements
McKie, Thomas Douglas Muir
An investigation of the relationship between the relevance category of achievement test items and their indices of discrimination
description It was hypothesized that on an achievement test, items measuring complex cognitive objectives would exhibit a higher mean discrimination index — based on the whole test as criterion — than would an equal number of items measuring less complex cognitive objectives; and that the mean discrimination index of these items would in turn be higher than that of the same number of still less complex items. The proviso was made that the difficulty indices of the items be similarly distributed within the several categories of items, hereafter called "relevance-categories," since discriminating power is related to difficulty. The categories selected were, from simplest to most complex, the Knowledge, Comprehension, and Application categories of Bloom's "Taxonomy of Educational Objectives." An achievement test was constructed, consisting of items in all three categories, and covering the content of two units of the British Columbia university-programme grade nine science course. A try-out of this test, on 200 students in two schools, permitted negatively discriminating items to be rejected and, in addition, provided difficulty indices for the remaining items. It was possible to match forty Knowledge items and forty Comprehension items very closely for difficulty; however, the mean difficulty of the Application items was so high that they could not be used in a test of the hypothesis without reducing numbers too drastically in all categories. Two "equivalent forms," matched for content, relevance-category, and difficulty were constructed from these eighty items and administered to 530 students in three schools. The reliability coefficient of the total test, estimated by correlating the sub-test scores and applying the Spearman-Brown formula, was .84; those of the Knowledge and Comprehension categories were similarly found to be .69 and .77, respectively. Revised difficulty indices, based on the new and larger sample, were calculated. 
Their distributions within the two relevance categories were found to be very similar, though not as closely matched as in the try-out test. For each item, the point-biserial coefficient of correlation between item and total score was computed (this being the selected index of discrimination), and Fisher's z-transformation was applied to produce measures with more nearly an equal-unit scale, in the hope that the parametric t-test could be used. However, the shapes of the resulting distributions were such that they could not be claimed to be samples from a normal population or populations. Accordingly, the t-test was rejected in favour of the non-parametric Mann-Whitney test of "no difference in median discrimination indices." The respective medians were .27 and .30, in terms of Fisher's z-values, but the difference proved to be non-significant at the pre-selected 1% level of significance. It was concluded that this experiment provided no grounds for accepting the hypothesis of the study. However, the actual probability of obtaining, in random sampling from a single population, a difference as large as that observed was only about .10; in addition, the results consistently favoured the Comprehension items, whose discrimination indices exceeded those of the Knowledge items at the extremes as well as at the mean. It was therefore suggested that if adequate testing time could be obtained, the use of larger numbers of items in all categories might increase test-reliability and possibly produce a significant result. Suggestions were advanced, based upon observations from the data, for refining the experiment and for further research. === Education, Faculty of === Graduate
author McKie, Thomas Douglas Muir
title An investigation of the relationship between the relevance category of achievement test items and their indices of discrimination
title_sort investigation of the relationship between the relevance category of achievement test items and their indices of discrimination
publisher University of British Columbia
publishDate 2011
url http://hdl.handle.net/2429/39012
_version_ 1718596304158851072
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-39012 2018-01-05T17:49:26Z
Education, Faculty of Graduate 2011-11-15T20:54:42Z 2011-11-15T20:54:42Z 1962 Text Thesis/Dissertation http://hdl.handle.net/2429/39012 eng For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. University of British Columbia
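The abstract names four standard statistics: the Spearman-Brown step-up formula, the point-biserial item-total correlation, Fisher's z-transformation, and the Mann-Whitney test. The following is a minimal sketch of how each is computed, using only the Python standard library. The function names and the toy inputs are illustrative assumptions; they are not the thesis data, and this is not claimed to be the author's computational procedure.

```python
import math
from statistics import mean, pstdev


def spearman_brown(r_half: float) -> float:
    """Step a correlation between two half-tests up to a full-length reliability estimate."""
    return 2 * r_half / (1 + r_half)


def point_biserial(item: list[int], total: list[float]) -> float:
    """Correlation between a 0/1 item score and the total test score."""
    right = [t for i, t in zip(item, total) if i == 1]
    wrong = [t for i, t in zip(item, total) if i == 0]
    p = len(right) / len(item)  # proportion answering the item correctly
    # (mean of passers - mean of failers) / population SD of totals * sqrt(p * q)
    return (mean(right) - mean(wrong)) / pstdev(total) * math.sqrt(p * (1 - p))


def fisher_z(r: float) -> float:
    """Fisher's z-transformation of a correlation coefficient."""
    return math.atanh(r)


def mann_whitney_u(a: list[float], b: list[float]) -> float:
    """U statistic for sample a: pairs where a exceeds b, with ties counted as one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Toy inputs (hypothetical, for illustration only).
item_scores = [1, 1, 0, 0]    # one item, four examinees
total_scores = [4, 3, 2, 1]   # their total test scores
r_pb = point_biserial(item_scores, total_scores)
print(round(r_pb, 4), round(fisher_z(r_pb), 4))
print(round(spearman_brown(0.5), 4))
print(mann_whitney_u([3, 4], [1, 2, 3]))
```

In a study of this design, `point_biserial` would be applied to every item, the results passed through `fisher_z`, and the two sets of z-values (Knowledge and Comprehension) compared with `mann_whitney_u`. Converting U to a significance level requires tables or a normal approximation, which this sketch omits.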