- Reliability refers to the accuracy or precision of a measurement process or, in other words, the consistency of scores over replications of the testing procedure (Crocker & Algina, 1986).
What makes test scores unreliable?
- Systematic errors consistently affect an individual’s score because of some particular characteristic of the person or of the test that is unrelated to the construct being measured (Crocker & Algina, 1986).
- Random errors are the kind of errors that affect an individual’s score because of purely chance happenings such as guessing, distractions in the testing situation, administration errors, content sampling, scoring errors, and fluctuations in the individual examinee’s state (Crocker & Algina, 1986).
Both kinds of error affect score interpretation. Systematic measurement errors do not make measurement inconsistent, but they may still make test scores inaccurate and thus reduce their practical utility. Random errors reduce both the consistency and the usefulness of test scores (Crocker & Algina, 1986).
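The contrast between the two kinds of error can be illustrated with a small simulation. This is only a sketch under stated assumptions: the examinees, the score scale (mean 50, SD 10), the constant 5-point bias, and the helper names `pearson` and `simulate` are all hypothetical, and a constant shift is just one simple way to model a systematic error. The correlation between two administrations of the same test stands in for consistency.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate(bias, noise_sd, n=2000, seed=0):
    """Administer a hypothetical test twice to n simulated examinees.

    bias     -- systematic error: the same shift on every administration
    noise_sd -- random error: fresh chance noise on every administration
    Returns (test-retest correlation, mean observed minus mean true score).
    """
    rng = random.Random(seed)
    true = [rng.gauss(50, 10) for _ in range(n)]  # assumed true scores
    first = [t + bias + rng.gauss(0, noise_sd) for t in true]
    second = [t + bias + rng.gauss(0, noise_sd) for t in true]
    shift = statistics.fmean(first) - statistics.fmean(true)
    return pearson(first, second), shift


# Systematic error only: scores stay consistent but are shifted.
r_sys, shift_sys = simulate(bias=5, noise_sd=0)
# Random error only: scores are unbiased on average but less consistent.
r_rand, shift_rand = simulate(bias=0, noise_sd=10)

print(f"systematic only: r = {r_sys:.2f}, mean shift = {shift_sys:.2f}")
print(f"random only:     r = {r_rand:.2f}, mean shift = {shift_rand:.2f}")
```

Under these assumptions, the biased scores correlate almost perfectly across administrations even though every score is off by the same amount, whereas the noisy scores agree much less from one administration to the next.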
- Validity is the correctness of the interpretation or conclusion drawn from the application of a measure (DeVellis, 2012). For instance, if an instrument is intended to measure depression, is it really measuring depression, or is it measuring anxiety?
Shadish, Cook, and Campbell (2002) divide validity into four types:
- Internal validity: inferences about whether observed relationships among variables reflect causal relationships, in the form in which the variables were manipulated or measured within a study.
- External validity: inferences about whether observed relationships among manipulated or measured variables extend outside the study.
- Construct validity: the degree to which scores on a measure actually assess the construct of interest, and whether the “sampling particulars” of a study adequately represent their “higher-order” constructs.
- Statistical conclusion validity: statistical inferences about the relationships among variables, which bear on causal conclusions. Two questions are involved:
  - Whether there is a relationship: related to null hypothesis significance testing and the interpretation of p values.
  - How strong the relationship is: related to the determination of effect size.
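The two questions under statistical conclusion validity can be made concrete with a short example. The sketch below is illustrative only: the two groups of scores are invented, the function names `cohens_d` and `permutation_p_value` are hypothetical, and a permutation test is used for the p value simply because it needs nothing beyond the standard library (a t test would be a common alternative). Cohen's d with a pooled standard deviation serves as the effect size.

```python
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = abs(
            statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Invented scores for two hypothetical groups.
treatment = [24, 27, 30, 22, 29, 31, 26, 28]
control = [21, 23, 25, 20, 24, 26, 22, 23]

p = permutation_p_value(treatment, control)  # is there a relationship?
d = cohens_d(treatment, control)             # how strong is it?
print(f"p = {p:.4f}, Cohen's d = {d:.2f}")
```

The p value addresses whether a difference this large would be surprising if the null hypothesis were true, while d expresses the size of the difference in standard deviation units. The two answer different questions, so reporting one does not substitute for the other.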
Useful resources:
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061-1071.
References
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Harcourt Brace.
DeVellis, R. F. (2012). Scale development: Theory and applications (3rd ed., Vol. 26). Sage Publications.
Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.