An investigation of the relationship between the relevance category of achievement test items and their indices of discrimination

Bibliographic Details
Main Author:	McKie, Thomas Douglas Muir
Language:	English
Published:	University of British Columbia 2011
Subjects:	Educational tests and measurements
Online Access:	http://hdl.handle.net/2429/39012

id	ndltd-UBC-oai-circle.library.ubc.ca-2429-39012
record_format	oai_dc
collection	NDLTD
language	English
sources	NDLTD
topic	Educational tests and measurements
description	It was hypothesized that, on an achievement test, items measuring complex cognitive objectives would exhibit a higher mean discrimination index (based on the whole test as criterion) than an equal number of items measuring less complex cognitive objectives, and that the mean discrimination index of these items would in turn be higher than that of the same number of still less complex items. Because discriminating power is related to difficulty, the proviso was made that the difficulty indices of the items be similarly distributed within the several categories of items, hereafter called "relevance-categories." The categories selected were, from simplest to most complex, the Knowledge, Comprehension, and Application categories of Bloom's "Taxonomy of Educational Objectives." An achievement test was constructed, consisting of items in all three categories and covering the content of two units of the British Columbia university-programme grade nine science course. A try-out of this test on 200 students in two schools permitted negatively discriminating items to be rejected and also provided difficulty indices for the remaining items. It was possible to match forty Knowledge items and forty Comprehension items very closely for difficulty; however, the mean difficulty of the Application items was so high that they could not be included in a test of the hypothesis without reducing the number of items in all categories too drastically. Two "equivalent forms," matched for content, relevance-category, and difficulty, were constructed from these eighty items and administered to 530 students in three schools. The reliability coefficient of the total test, estimated by correlating the sub-test scores and applying the Spearman-Brown formula, was .84; the reliabilities of the Knowledge and Comprehension categories, estimated in the same way, were .69 and .77, respectively. Revised difficulty indices, based on the new and larger sample, were then calculated.
Their distributions within the two relevance-categories were found to be very similar, though not as closely matched as on the basis of the try-out test. For each item, the point-biserial coefficient of correlation between item score and total score (the selected index of discrimination) was computed, and Fisher's z-transformation was applied to produce measures on a more nearly equal-unit scale, in the hope that the parametric t-test could be used. However, the shapes of the resulting distributions were such that they could not be claimed to be samples from a normal population or populations. Accordingly, the t-test was rejected in favour of the non-parametric Mann-Whitney test of "no difference in median discrimination indices." The median Fisher z-values for the Knowledge and Comprehension items were .27 and .30, respectively, but the difference proved non-significant at the pre-selected 1% level of significance. It was concluded that this experiment provided no grounds for accepting the hypothesis of the study. However, the actual probability of obtaining, in random sampling from a single population, a difference as large as that observed was only about .10; in addition, the results consistently favoured the Comprehension items, whose discrimination indices exceeded those of the Knowledge items at the extremes as well as at the mean. It was therefore suggested that, if adequate testing time could be obtained, using larger numbers of items in all categories might increase test reliability and possibly produce a significant result. Suggestions were also advanced, based on observations from the data, for refining the experiment and for further research. Education, Faculty of; Graduate.
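The abstract relies on four standard statistics: the Spearman-Brown step-up for reliability, the point-biserial item-total correlation as the discrimination index, Fisher's z-transformation, and the Mann-Whitney test. The following is a minimal standard-library Python sketch of those textbook formulas, offered only as an illustration. The function names are hypothetical, the point-biserial is left uncorrected (the item is counted in the total), and the Mann-Whitney p-value uses the large-sample normal approximation without a tie correction. The record does not say which variants McKie used, so this is not a reproduction of the thesis computations, and the sample values in the usage section are invented for illustration.

```python
import math
from collections.abc import Sequence
from statistics import mean, pstdev


def spearman_brown(r_half: float, k: float = 2.0) -> float:
    """Step up a correlation between parallel parts to the reliability of a test k times as long."""
    return k * r_half / (1 + (k - 1) * r_half)


def point_biserial(item: Sequence[int], total: Sequence[float]) -> float:
    """Uncorrected item-total point-biserial r for a 0/1-scored item."""
    p = mean(item)  # difficulty index: proportion answering the item correctly
    if p in (0, 1):
        raise ValueError("item has no variance")
    m1 = mean(t for i, t in zip(item, total) if i == 1)  # mean total, item right
    m0 = mean(t for i, t in zip(item, total) if i == 0)  # mean total, item wrong
    return (m1 - m0) / pstdev(total) * math.sqrt(p * (1 - p))


def fisher_z(r: float) -> float:
    """Fisher's z = 0.5 * ln((1 + r) / (1 - r)), which equals atanh(r)."""
    return math.atanh(r)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Return (U for sample x, two-sided p) via the normal approximation, no tie correction."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    # Hypothetical item-level r values, for illustration only (not thesis data).
    knowledge = [fisher_z(r) for r in (0.21, 0.25, 0.30, 0.18, 0.27)]
    comprehension = [fisher_z(r) for r in (0.28, 0.33, 0.26, 0.35, 0.31)]
    u, p = mann_whitney(knowledge, comprehension)
    print(f"U = {u}, two-sided p = {p:.3f}")
    print(round(spearman_brown(0.5), 3))  # 0.667
```

Because the Mann-Whitney test works only on ranks, and atanh is strictly increasing, ranking the raw point-biserial values or their Fisher z-values gives the same U and the same p-value. The z-transformation matters only for the parametric t-test that the abstract describes as having been set aside.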
author	McKie, Thomas Douglas Muir
title	An investigation of the relationship between the relevance category of achievement test items and their indices of discrimination
publisher	University of British Columbia
publishDate	2011
url	http://hdl.handle.net/2429/39012
Education, Faculty of; Graduate; 2011-11-15T20:54:42Z; 1962; Text; Thesis/Dissertation; http://hdl.handle.net/2429/39012; eng; University of British Columbia
For non-commercial purposes only, such as research, private study and education. Additional conditions apply; see Terms of Use https://open.library.ubc.ca/terms_of_use.

Similar Items