Item response theory
Overview
Item response theory designates a body of related psychometric theory that predicts outcomes of psychological testing, such as the difficulty of items or the ability of test-takers. Generally speaking, the aim of item response theory is to understand and improve the reliability of psychological tests.
Item response theory is very often referred to by its acronym, IRT. IRT may be regarded as roughly synonymous with latent trait theory. It is sometimes referred to using the word "strong", as in strong true score theory, or "modern", as in modern mental test theory, because IRT is a more recent body of theory and makes stronger assumptions than classical test theory.
IRT models
Much of the literature on IRT revolves around item response models. These models relate a person parameter (or, in the case of multidimensional item response theory, a vector of person parameters) to one or more item parameters. For example:
<math>p_i(\theta) = c_i + \frac{1-c_i}{1+e^{-D a_i (\theta-b_i)}}</math>
where <math>\theta</math> is the person parameter and <math>a_i</math>, <math>b_i</math>, and <math>c_i</math> are item parameters. The item parameter <math>a_i</math> is a measure of the discriminating power of the item, similar to the classical test index point-biserial correlation. The <math>b_i</math> parameter is a measure of the item's difficulty, and for items, such as multiple choice, where there is a chance of guessing the correct answer, there is the <math>c_i</math> parameter.
This logistic model relates the level of the person parameter and the item parameters to the probability of responding correctly. The constant D has the value 1.702, which rescales the logistic function to closely approximate the cumulative normal ogive. (This model was originally developed using the normal ogive, but the logistic model with the rescaling provides virtually the same model while greatly simplifying the computations.)
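The following is a minimal sketch of this three-parameter logistic function in Python; the function name and the example parameter values are illustrative, not taken from any particular program.
<pre>
import math

D = 1.702  # scaling constant that makes the logistic approximate the normal ogive

def p_correct(theta, a, b, c):
    """Probability of a correct response for a person at trait level theta
    on an item with discrimination a, difficulty b, and guessing parameter c."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Example: an item of average difficulty (b = 0), moderate discrimination
# (a = 1.0), and a 20% chance of guessing correctly (c = 0.2)
print(p_correct(0.0, 1.0, 0.0, 0.2))   # 0.6 -- halfway between c and 1 when theta = b
print(p_correct(-3.0, 1.0, 0.0, 0.2))  # close to 0.2 (the guessing asymptote)
print(p_correct(3.0, 1.0, 0.0, 0.2))   # close to 1.0
</pre>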
The line that traces the probability for a given item across levels of the trait is called the item characteristic curve (ICC) or, less commonly, item response function.
The person parameter indicates the individual's standing on the latent trait. The estimate of the person parameter is the individual's test score. The latent trait is the human capacity measured by the test. It might be a cognitive ability, physical ability, skill, knowledge level, attitude, personality characteristic, etc. In a unidimensional model such as the one above, this trait is considered to be a single factor (as in factor analysis). Individual items or individuals might have secondary factors, but these are assumed to be mutually independent and collectively orthogonal.
The item parameters simply determine the shape of the ICC and in some cases may not have a direct interpretation. In this case, however, the parameters are commonly interpreted as follows. The b parameter is considered to index an item's difficulty. Note that this model scales the item's difficulty and the person's trait onto the same metric. Thus, it is valid to talk about an item being about as hard as Person A's trait level, or of a person's trait level being about the same as Item Y's difficulty. The a parameter controls how steeply the ICC rises and thus indicates the degree to which the item distinguishes individuals with trait levels above and below the rising slope of the ICC. This parameter is therefore called the item discrimination and is correlated with the item's loading on the underlying factor, with the item-total correlation, and with the index of discrimination. The final parameter, c, is the asymptote of the ICC on the left-hand side. It indicates the probability that individuals of very low ability will get the item correct by chance.
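As a quick check of these interpretations (a small derivation, not stated explicitly above), setting <math>\theta = b_i</math> in the model makes the exponent zero, so the ICC passes through the point halfway between the guessing floor and 1:
<math>p_i(b_i) = c_i + \frac{1-c_i}{1+e^{0}} = \frac{1+c_i}{2}</math>
and the steepness of the curve at that point grows with <math>a_i</math>.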
This model assumes a single trait dimension and a binary outcome; it is a dichotomous, unidimensional model. Another class of models predicts polytomous outcomes, and a further class of models exists to handle response data arising from multiple traits.
Information
One of the major contributions of item response theory is the extension of the concept of reliability. Traditionally, reliability refers to the precision of measurement (i.e., the degree to which measurement is free of error), and it is measured using a single index defined in various ways, such as the ratio of true and observed score variance. This index is helpful in characterizing a test's average reliability, for example in order to compare two tests. But it is clear that reliability cannot be uniform across the entire range of test scores. Scores at the edges of the test's range, for example, are known to have more error than scores closer to the middle.
Item response theory advances the concept of item and test information to replace reliability. Information is also a function of the model parameters. The item information supplied by the one-parameter Rasch model is simply the probability of a correct response multiplied by the probability of an incorrect response, or,
<math>I(\theta) = p_i(\theta) q_i(\theta)</math>
The standard error of estimation (SE) is the reciprocal of the square root of the test information at a given trait level:
<math>SE(\theta) = \frac{1}{\sqrt{I(\theta)}}</math>
Thus more information implies less error of measurement.
For other models, such as the two- and three-parameter models, the discrimination parameter plays an important part in the function. The item information function for the two-parameter model is
<math>I(\theta) = a_i^2 p_i(\theta) q_i(\theta)</math>
In general, item information functions tend to look "bell-shaped." Highly discriminating items have tall, narrow information functions; they contribute greatly but over a narrow range. Less discriminating items provide less information but over a wider range.
Plots of item information can be used to see how much information an item contributes and to what portion of the scale score range. Because of local independence, item information functions are additive. Thus, the test information function is simply the sum of the information functions of the items on the exam. Using this property with a large item bank, test information functions can be shaped to control measurement error very precisely.
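The sketch below illustrates these formulas in Python. It follows the article's two-parameter item information expression (set every a to 1 for the Rasch case); the item parameter values are hypothetical, chosen only to show how test information and the standard error vary across trait levels.
<pre>
import math

D = 1.702

def p(theta, a, b):
    # Two-parameter logistic probability of a correct response
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b):
    # I(theta) = a^2 * p * q, as given above for the two-parameter model
    prob = p(theta, a, b)
    return a * a * prob * (1.0 - prob)

def test_information(theta, items):
    # Local independence makes item information functions additive
    return sum(item_information(theta, a, b) for a, b in items)

def standard_error(theta, items):
    # SE(theta) = 1 / sqrt(I(theta))
    return 1.0 / math.sqrt(test_information(theta, items))

# Hypothetical five-item test: (a, b) pairs
items = [(1.2, -1.0), (0.8, -0.5), (1.5, 0.0), (1.0, 0.5), (0.9, 1.5)]
for theta in (-2.0, 0.0, 2.0):
    print(theta, test_information(theta, items), standard_error(theta, items))
</pre>
Running this shows the pattern described above: information is highest, and the standard error smallest, near the middle of the difficulty range covered by the items.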
Estimation
A comparison of classical and modern test theory
Scoring
After the model is fit to data, each person has a theta estimate; this estimate is their score on the exam. This "IRT score" is computed and interpreted in a very different manner than traditional scores such as number or percent correct. However, for most tests, the (linear) correlation between the theta estimate and a traditional score is very high (e.g., .95). A graph of IRT scores against traditional scores shows an ogive shape, implying that the IRT score is somewhat better at separating individuals with low or high trait standing.
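One illustrative way to obtain such a theta estimate (a sketch of maximum likelihood scoring by grid search, not a method prescribed by this article; operational programs typically use Newton-Raphson or Bayesian scoring) is shown below, again using hypothetical two-parameter item values.
<pre>
import math

D = 1.702

def p(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def log_likelihood(theta, responses, items):
    # responses: 1 = correct, 0 = incorrect, aligned with items = [(a, b), ...]
    ll = 0.0
    for u, (a, b) in zip(responses, items):
        prob = p(theta, a, b)
        ll += math.log(prob) if u == 1 else math.log(1.0 - prob)
    return ll

def theta_estimate(responses, items, lo=-4.0, hi=4.0, steps=801):
    # Pick the theta on a fine grid that maximizes the likelihood
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

items = [(1.2, -1.0), (0.8, -0.5), (1.5, 0.0), (1.0, 0.5), (0.9, 1.5)]
print(theta_estimate([1, 1, 1, 0, 0], items))  # trait estimate for this response pattern
</pre>
Note that different response patterns with the same number correct can yield different theta estimates, which is one way IRT scores differ from simple percent-correct scores.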
It is worth noting the implications of IRT for test-takers. Tests are imprecise tools and the score achieved by an individual (the observed score) is always the true score occluded by some degree of error. This error may push the observed score higher or lower.
Also, nothing about these models refutes human development or improvement. A person may learn skills, knowledge, or even so-called "test-taking skills", which may translate to a higher true score.
See also: psychometrics, standardized test, classical test theory
A brief list of references
Many books have been written that address item response theory or contain IRT or IRT-like models. This is a partial list, focusing on texts that provide more depth.
- Lord, F.M. (1980). Applications of item response theory to practical testing problems. Mahwah, NJ: Erlbaum.
This book summarizes much of Lord's IRT work, including chapters on the relationship between IRT and classical methods, fundamentals of IRT, estimation, and several advanced topics. Its estimation chapter is now dated in that it primarily discusses the joint maximum likelihood method rather than the marginal maximum likelihood method implemented by Darrell Bock and his colleagues.
- Embretson, S. and Reise, S. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
This book is an accessible introduction to IRT, aimed, as the title says, at psychologists.
External links
IRT Tutorial (http://work.psych.uiuc.edu/irt/tutorial.asp)
An introduction to IRT (http://edres.org/irt/)
All about Rasch Measurement (http://www.rasch.org/)