If a ruler were made of sticky, stretchy rubber, it would not be very useful. Accurate measurements would be difficult. Something that measured two inches one time might measure three inches another, simply because the ruler stuck to the material being measured and was stretched or compressed more one time than the other. Consistency of results is a key part of the definition of good measurement. Tests can be like that rubber ruler. Some tests give results that are more consistent than others. This basic property of good tests is called reliability, and it affects the inferences one can reasonably make from test scores.
There are other technical considerations that differentiate good tests from bad ones. Validity is the extent to which test scores really indicate the qualities that are attributed to them. For example, does a math test measure knowledge of mathematics, or do its scores in part reflect the language proficiency of the students taking it? Test fairness is an aspect of validity. If tests do not measure knowledge or skill equitably for different groups of students, then they are not measuring the construct of interest well.
Technical quality needs to be built into tests through a rigorous test development process. These technical qualities, and how one can build them into tests, are the focus of this module.