Lesson 3
Presentation: Fairness

Sources of Variance

Variance is a statistical term for how much the numbers in a set differ from one another. It is defined in terms of how far each number is from the average of the set. If all the numbers are the same, each one equals the average and the variance is 0. If all the numbers are at the extremes (say, 10 scores of 0 and 10 scores of 100), then every score is far from the average (50 in this case) and the variance is high.
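The two cases described above can be checked with a few lines of Python. This is a minimal sketch that assumes the population form of variance (the average squared distance from the mean); the sample form divides by one less than the number of scores. The score lists and the helper name `variance` are illustrative, not part of the lesson.

```python
from statistics import mean, pvariance

def variance(scores):
    """Population variance: the average squared distance from the mean."""
    avg = mean(scores)
    return sum((s - avg) ** 2 for s in scores) / len(scores)

# Hypothetical score sets matching the two cases in the text.
all_same = [70] * 20                  # every score equals the average
extremes = [0] * 10 + [100] * 10      # 10 scores of 0 and 10 scores of 100

print(mean(extremes))       # the average of the extreme set
print(variance(all_same))   # no spread at all
print(variance(extremes))   # every score is 50 points from the average

# The standard library computes the same quantity.
print(pvariance(extremes))
```

In the extreme case every score sits 50 points from the average, so each squared distance is 2500 and so is their average.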

Test Bias

As we said before, there are many reasons why scores might vary. Suppose the scores came from a math test for 5th graders, but every problem was a word problem written at an 8th grade reading level. The 10 students who scored 0 may have done so not because they did not know the math, but because they could not read the problems. With stronger reading skills they too might have scored 100. Calling something a math test does not mean that it measures only math. This math test may be biased against poor readers.

Test bias can hurt students (and teachers) in many ways. One obvious way is that it can lead teachers to make bad instructional decisions. A low math score might lead to a student being placed in a remedial math course when what the student needs is additional reading instruction.

Because bias is systematic, it affects not just one child but groups of children. If there is a correlation between an irrelevant construct that a test measures and membership in a racial, ethnic, or gender group, then the bias can harm whole groups of children. For example, thirty or more years ago far more males than females were interested in and knowledgeable about sports. Many verbal reasoning tests use word analogy problems, and some of those problems used sports terms. Research showed that some of those items were biased against female students: they underestimated the verbal reasoning ability of females.

Test Fairness and Test Bias

Fairness has been a growing public concern since the Civil Rights Act of 1964. A natural extension of that concern has been attention to evaluating bias in tests used for selection, placement, and classification. Bias in an examination is any irrelevant influence that causes differences in examinee scores, as opposed to differences resulting from true variation in ability. In other words, test bias results in differential validity for different groups. Bias is a form of systematic error, and it is a technical concept in that it can be analyzed impartially through statistics. Because bias is closely linked to principles of score validity, quality test construction procedures require that bias be addressed throughout the test design, construction, and implementation stages.

Fairness, although usually associated with bias, is not the same thing. Fairness is not a technical concept but a broad one, based on philosophies of test use and on social and personal values. Fairness is particularly controversial in our society today because tests (e.g., intelligence tests, the SAT) are often used in selection processes and in conferring privileges. In theory, either a biased or an unbiased test can be used fairly or unfairly.

Another procedure often used for bias detection is empirical review. Empirical reviews are conducted after test administration. They allow test developers to determine statistically whether individual items perform differently for relevant subgroups; this is known as Differential Item Functioning (DIF) analysis. Procedurally, a DIF analysis matches two groups (such as males and females, or Caucasian and Asian examinees) on the criterion of interest, usually the total test score, and looks for group differences over and above ability. DIF is present when examinees in the two groups have the same ability or total test score but have different probabilities of correctly answering a particular test question.

A variety of statistical techniques are available for DIF analyses. Two distinct types of DIF can be identified: uniform and non-uniform. Uniform DIF is present when the probability of answering an item correctly is consistently, or "uniformly," higher for one group at all levels of ability. Non-uniform DIF is present when the difference between the groups in the probability of answering correctly is not consistent across levels of ability; for example, the item may favor one group at low ability levels and the other group at high ability levels. A positive identification of DIF is not proof of bias, but it indicates that an item may be unfair to a particular subgroup. Once DIF is identified, expert judges representing the group of interest should conduct a logical review of the flagged items and revise, remove, or approve them.
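The matching step can be sketched in code. The lesson does not name a specific DIF statistic, so the example below assumes one widely used choice, the Mantel-Haenszel common odds ratio: examinees are grouped by total test score, and within each score level the odds of a correct answer are compared between a reference group and a focal group. The function name, the group labels `"ref"` and `"focal"`, and the response records are all hypothetical.

```python
from collections import defaultdict

def mantel_haenszel_odds_ratio(records):
    """Estimate the common odds ratio for one item.

    records: iterable of (group, total_score, correct), where group is
    "ref" or "focal" and correct is True or False for the studied item.
    """
    # One 2x2 table per total-score level; index 0 = correct, 1 = incorrect.
    strata = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, total_score, correct in records:
        strata[total_score][group][0 if correct else 1] += 1

    numerator = 0.0
    denominator = 0.0
    for cells in strata.values():
        ref_right, ref_wrong = cells["ref"]
        focal_right, focal_wrong = cells["focal"]
        n = ref_right + ref_wrong + focal_right + focal_wrong
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n

    if denominator == 0:
        raise ValueError("odds ratio is undefined for these data")
    return numerator / denominator

# Hypothetical responses to one item at two total-score levels.
responses = (
    [("ref", 5, True)] * 8 + [("ref", 5, False)] * 2
    + [("focal", 5, True)] * 5 + [("focal", 5, False)] * 5
    + [("ref", 8, True)] * 9 + [("ref", 8, False)] * 1
    + [("focal", 8, True)] * 7 + [("focal", 8, False)] * 3
)
print(mantel_haenszel_odds_ratio(responses))
```

A ratio near 1 suggests that matched examinees in the two groups have similar odds of answering correctly; a ratio well above 1 suggests the item favors the reference group. This statistic is mainly sensitive to uniform DIF, so a separate technique would be needed to look for non-uniform DIF, and a flagged item still needs the judgmental review described above.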

The No Child Left Behind Act requires all students to be tested, including those with moderate to severe learning disabilities. This section discusses the idea behind a universally designed assessment, as well as accommodations, modifications, and alternate assessments.

Universal Design

A universally designed assessment is one that is accessible to students who might have any of a variety of common disabilities. That is, most people can accurately demonstrate their level of knowledge on such a test without needing a special form of the assessment or administration conditions decided upon by a third party. For example, a universally designed computerized assessment might have a feature that allows all examinees to select the text size with which they are most comfortable. The key to a universally designed test is that a separate test or special accommodations are not necessary on an individual basis, because the test itself provides appropriate administration choices for each examinee.

Accommodations

Accommodations are adjustments to a test that are intended to make it accessible to students with disabilities without affecting its validity. On a reading test, for example, providing the text in a larger font size is an accommodation that does not affect the validity of the test; it simply makes the test accessible to a visually impaired student. In general, a testing accommodation is any change to the testing conditions that reduces the impact of construct-irrelevant factors (in the example, visual acuity rather than reading ability) for an identifiable subgroup of examinees and has no significant impact on the scores of other examinees.

The most common, and in some ways most controversial, accommodation is additional testing time. Some examinees, such as those with attention deficit disorder or dyslexia, require more time to process written information than do examinees without these conditions. However, research has shown that many other examinees, not all of whom are members of readily identifiable subgroups, also benefit from additional testing time. Other common accommodations include translating test questions into other spoken or sign languages, providing Braille versions, and providing audio test directions.
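The definition above implies a two-part check: the change should help the intended subgroup and leave everyone else's scores essentially unchanged. The sketch below illustrates that reasoning only. The function names, the scores, and the 2-point tolerance are all hypothetical, and a real study would use a proper significance test rather than a fixed threshold.

```python
from statistics import mean

def mean_gain(standard_scores, changed_scores):
    """Average score change when a group tests under the changed conditions."""
    return mean(changed_scores) - mean(standard_scores)

def fits_accommodation_definition(target_gain, other_gain, tolerance=2.0):
    """True if the target group gains and other examinees barely change.

    tolerance is an assumed cutoff for "no significant impact".
    """
    return target_gain > tolerance and abs(other_gain) <= tolerance

# Hypothetical scores: larger print on a reading test.
impaired_gain = mean_gain([40, 50], [60, 70])   # visually impaired examinees
others_gain = mean_gain([70, 80], [71, 80])     # all other examinees
print(fits_accommodation_definition(impaired_gain, others_gain))

# Hypothetical scores: extra time that also helps other examinees.
others_gain_extra_time = mean_gain([70, 80], [78, 88])
print(fits_accommodation_definition(impaired_gain, others_gain_extra_time))
```

The second case mirrors the controversy over testing time: when examinees outside the intended subgroup also gain, the change no longer fits the definition cleanly.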

Modifications

Modifications are adjustments to a test that are likely to affect the constructs the test measures. Tests with modifications are changed more dramatically than tests with accommodations, as described above. A modified test may be altered in length or, in the case of a multiple-choice test, in the number of answer options. The overall difficulty of the test may also be changed from the original version.

Principles of Measurement