|Characteristic||Classical Test Theory (CTT)||Item Response Theory (IRT)|
|Test Length and Reliability||Longer tests are more reliable than shorter tests.||Shorter tests can be more reliable than longer tests.|
|Metric||People are measured on the number-correct scale; items are measured on the proportion-correct scale. Score meaning is determined primarily as a location in a norm group.||People and items are placed on the same scale, making it possible to scale persons relative to items and vice versa.|
|Item Parameters||Difficulty and discrimination: they depend on the group in which they are estimated, and unbiased estimates of these parameters require a representative sample.||Difficulty and discrimination: estimates obtained in one sample from a population are linearly transformable to estimates of those parameters in another sample from the same population. Unbiased estimates do not require a representative sample.|
|Standard Error of Measurement (SEM)||SEM is group dependent (i.e., based on the group standard deviation) and is constant for a group, regardless of score level.||SEM is independent of the group on which the measurements are taken; IRT permits calculation of a SEM for a single individual based on their performance and the item parameters. It also provides conditional SEMs, which allow the SEM to vary at different levels of the latent continuum.|
|Trait Level Estimation (Scoring)||The number-correct score depends on the number of items in the test and on the difficulty level of the items, is limited to a fixed number of discrete values, and is interpretable on a within-group normative basis.||Maximum likelihood scoring of IRT-based tests yields a trait level estimate for an individual that is independent of (1) the number of items, (2) the difficulty level of the items, and (3) the group of individuals in which the person was measured, and that is reported on a real-number scale.|
|Item-Model Fit||There are no explicit procedures for determining whether test items are functioning according to the underlying model.||There are explicit procedures for determining item fit, which allow the identification of items that do not meet the assumptions of the model.|
|Person-Model Fit||Although there are procedures for determining person fit to the CTT model, they require normative comparisons.||Procedures for determining person fit allow the identification of persons who are not well measured by the model, independently of other persons measured at the same time.|
|Equating||Score equating requires complicated procedures that rely on assumptions about population score distributions.||Because persons and items are on the same scale, equating occurs automatically as a result of linking, without assumptions about score distributions. This makes it possible to compare, on a common scale, persons measured in different groups and with different items.|
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage.
De Ayala, R. J. (2009). The theory and practice of item response theory. Guilford Press.
DeMars, C. (2010). Item response theory. Oxford University Press.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Erlbaum.