Directed Questions for Principles of Measurement
1.
Mrs. Jackson gave two versions of her Algebra test to her class. She then correlated the results of the two assessments. The value was 0.60. What type of reliability did Mrs. Jackson compute? How would you interpret the value of her reliability statistic?
example:
She used parallel-forms (alternate-forms) reliability.
A value of 0.60 indicates only moderate agreement between the two versions, so the scores were not very reliable.
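As a rough illustration of the calculation in question 1, the sketch below correlates scores on two test versions using the Pearson correlation. The scores and the names `form_a`, `form_b`, and `pearson_r` are hypothetical and are not taken from Mrs. Jackson's class.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six students on the two versions.
form_a = [78, 85, 62, 90, 71, 88]
form_b = [74, 80, 70, 85, 79, 76]

# The correlation between the two versions is the reliability estimate.
reliability = pearson_r(form_a, form_b)
print(round(reliability, 2))
```

The same function could be applied to the two timings in a test-retest design, since that statistic is also a correlation between two sets of scores.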
2.
Mr. Conway, a physical education teacher, timed his students during a 1-mile run. One week later, he timed his students again. What type of reliability is Mr. Conway measuring? What are some of the factors that may add random error into his statistic?
example:
Test-retest reliability.
Factors that may add error into his statistic include whether or not his students exercised more during that one-week period, whether or not they ate breakfast on both mornings of the run, whether or not they were healthy on both days, etc.
3.
Ms. Sanchez administered a 20-word spelling test to her second grade class. Each word was worth 1 point. What are some ways Ms. Sanchez could divide her test to calculate the internal reliability using the split-half reliability statistic?
example:
Divide the test into even- and odd-numbered questions, use the first half and the second half of the test, randomly select 10 spelling words for one group and use the other 10 for the other group, etc.
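The odd/even split can be sketched as follows. The data are hypothetical and use 6 items instead of 20 to keep the example short. The Spearman-Brown correction in the last step is not mentioned in the answer above; it is a common adjustment added here because a half-length test tends to understate the reliability of the full test.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(item_scores):
    """item_scores holds one list per student with a 0/1 score per item."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown correction (an addition, see the note above):
    # estimates the reliability of the full-length test from the half-test r.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical results: 5 students, 6 words, 1 point per word.
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
]
r_half, r_full = split_half_reliability(scores)
print(round(r_half, 2), round(r_full, 2))
```

A first-half/second-half or random split would only change how the two lists of totals are built.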
4.
What are the attributes of a test that produces valid scores?
example:
The test covers relevant content.
The test relates to external criteria, such as a blueprint or table of specifications.
The test measures what it is intended to measure.
5.
Why is test validation a continual concern?
example:
High-stakes decisions may be made based on test results, and the content, construct, or criteria for the items may change.
There is always more evidence that can be gathered.
6.
What are the three traditional sources of evidence for the validity of a test?
example:
Content-related evidence, criterion-related evidence, construct-related evidence.
7.
Provide 3 examples of unintended, negative effects of testing, interpretation, and subsequent action taken on the population being tested.
example:
A teacher gives a test to a group of students to separate them into beginning, average, and advanced groups. The teacher retests them a year later. Due to low self-esteem, the beginning group shows little progress.
A teacher gives a test to a homogeneous group of students and then tries to generalize to a heterogeneous population.
A teacher teaches directly to the test and not the designed curriculum.
8.
What is variance and how does it affect our understanding of reliability?
example:
Variance refers to how much scores deviate from the mean score. Error variance can affect reliability by introducing differences into the scores that cannot be attributed to actual differences in the trait or ability being measured.
9.
What is the difference between true variance and error variance? Provide an example of each type.
example:
True variance is the actual difference in the trait or ability being measured. Error variance is any difference attributable to unintended factors. True variance would be the difference evidenced between a high-ability and a low-ability student. Error variance would be outside noise going on at the same time students are trying to take the test.
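In classical test theory, the relationship between these two kinds of variance and reliability is usually written as below. The symbols are assumptions introduced here: X is the observed score, T the true score, and E the error, with true scores and errors assumed to be uncorrelated.

```latex
\sigma_X^2 = \sigma_T^2 + \sigma_E^2,
\qquad
\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Read this way, reliability is the share of observed score variance that reflects true differences, so any increase in error variance lowers it.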
10.
How does systematic error differ from random error?
example:
Systematic error is error that occurs in a predictable fashion for every test taker. It is a validity problem. Random error occurs by chance and is not consistent or predictable. It is a reliability problem.
11.
How can the standard error of measurement be used to increase fairness in score reporting?
example:
Report a score range (a band around the observed score based on the standard error of measurement) instead of a single specific score.
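A score range of this kind can be sketched with the usual formula for the standard error of measurement, SEM = SD × sqrt(1 − reliability). The function names and the numbers (a standard deviation of 10, a reliability of 0.84, an observed score of 75) are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM from the score standard deviation and a reliability estimate."""
    return sd * sqrt(1 - reliability)

def score_band(observed, sd, reliability, n_sem=1.0):
    """Range of observed score plus or minus n_sem standard errors."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - n_sem * sem, observed + n_sem * sem

# Hypothetical test: SD = 10, reliability = 0.84, observed score = 75.
low, high = score_band(75, 10, 0.84)
print(f"{low:.1f} to {high:.1f}")  # prints "71.0 to 79.0"
```

Reporting 71 to 79 instead of 75 makes it less likely that two students whose scores differ only by measurement error are treated differently.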
12.
What is the difference between test bias and test fairness?
example:
Test bias is a technical and systematic error that may or may not affect test fairness. Test fairness is a broad concept based on philosophies of test use and on social and personal values.
13.
How may test bias be detected?
example:
A variety of statistical techniques, including differential item functioning (DIF) analysis.
14.
What is an example of a test accommodation?
example:
Braille
15.
What type of student may qualify to take an alternate assessment?
example:
Students with disabilities so severe that the general assessment would not produce meaningful, interpretable scores.
16.
What is a construct and how can we test for it?
example:
A construct is a theorized phenomenon that cannot be directly observed or measured. To measure it, it must be operationally defined into observable and measurable behaviors.
17.
When determining the purpose of your exam, what are some questions you should ask yourself regarding the exam?
example:
Who is my intended audience? What level of difficulty should the items be to achieve my purpose? What do I want to learn from the exam? Etc.
18.
What is domain sampling and why is it necessary?
example:
Domain sampling is the process of selecting a representative sample of items from a test's content domain. This is necessary whenever the domain is too large to be tested completely in a reasonable period of time.
19.
What information should be included in test specifications?
example:
How a test's items should be constructed, from what content areas, and in what proportions.
20.
What is an experimental section and why do tests frequently include an experimental section?
example:
An experimental section of a test is a part of the test that contains new items being field tested for future versions of the test. These are frequently included because it is an efficient way to field test new items.
21.
What are difficulty level, discrimination index, and differential item functioning and how are they used to determine the quality of an item?
example:
Difficulty level, discrimination index, and differential item functioning are all characteristics of an item.
Difficulty level measures how difficult it is for an examinee to correctly respond to an item. The discrimination index refers to how well the item discriminates between low and high ability students. Differential item functioning is used to detect item bias.
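Two of these item statistics can be sketched in a few lines. The difficulty level is computed here as the proportion of examinees answering correctly, and the discrimination index as the upper-lower difference (proportion correct in the highest-scoring group minus the lowest-scoring group). The 27% group size is a common convention and an assumption here, and the response data are hypothetical. Differential item functioning needs group membership data and is not shown.

```python
def item_difficulty(responses, item):
    """Proportion answering the item correctly (a higher value is an easier item)."""
    return sum(row[item] for row in responses) / len(responses)

def discrimination_index(responses, item, fraction=0.27):
    """Proportion correct in the top group minus the bottom group."""
    ranked = sorted(responses, key=sum, reverse=True)  # rank by total score
    k = max(1, round(len(ranked) * fraction))          # size of each group
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower

# Hypothetical 0/1 responses: 6 examinees (rows) by 4 items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
for item in range(4):
    print(item + 1, item_difficulty(responses, item),
          discrimination_index(responses, item))
```

In this made-up data the second item separates high and low scorers well, while the fourth does not separate them at all, which would flag it for review.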
22.
What is equating and why is it necessary?
example:
Equating is the process by which raw scores from different tests or different versions of the same test are translated to a common scale so that they can be compared.
Equating is necessary because it is common for one version of a test to be slightly easier or more difficult than another version.
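One simple way to carry out such a translation is linear equating, which matches the mean and standard deviation of one version to those of the other. This is only a sketch of one method among several, it assumes the two versions were taken by comparable groups, and the function name and scores are hypothetical.

```python
from statistics import mean, pstdev

def linear_equate(score_x, form_x_scores, form_y_scores):
    """Express a Form X raw score on the Form Y scale by matching
    the two forms' means and standard deviations."""
    mean_x, mean_y = mean(form_x_scores), mean(form_y_scores)
    sd_x, sd_y = pstdev(form_x_scores), pstdev(form_y_scores)
    return mean_y + (sd_y / sd_x) * (score_x - mean_x)

# Hypothetical raw scores: Form X turned out slightly harder than Form Y.
form_x = [10, 12, 14, 16, 18]
form_y = [13, 15, 17, 19, 21]

# A raw score of 14 on the harder form corresponds to a higher Form Y score.
print(linear_equate(14, form_x, form_y))
```

With this adjustment, a student who happened to take the harder version is not penalized for it.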
23.
What is vertical equating?
example:
Vertical equating allows you to compare performance on a single test across grades or age ranges.
24.
What is horizontal equating?
example:
Horizontal equating allows you to compare performance over time of students in the same grade.