Assessment for Principles of Measurement
Multiple Choice
1.
What does reliability measure?
A.
Appropriateness of a test
(
*
)
B.
Consistency of a test
Correct
C.
Content of a test
D.
Validity of a test
2.
Which type of reliability examines the consistency of individual scores across different administrations of a test?
A.
Internal reliability
(
*
)
B.
Test-retest reliability
Correct
C.
Parallel forms reliability
D.
Inter-rater reliability
3.
Which type of reliability evaluates if different versions of the same test have the same measurement characteristics?
A.
Internal reliability
B.
Test-retest reliability
(
*
)
C.
Parallel forms reliability
Correct
D.
Inter-rater reliability
4.
Which type of reliability indicates the degree of consistency in responses across the many items of a single test?
(
*
)
A.
Internal reliability
Correct
B.
Test-retest reliability
C.
Parallel forms reliability
D.
Inter-rater reliability
5.
Which type of reliability statistic requires the administration of two or more forms of the same test?
A.
Internal reliability
B.
Inter-rater reliability
(
*
)
C.
Parallel forms reliability
Correct
D.
Test-retest reliability
6.
Which type of reliability statistic involves calculating the percent of agreement between the way two people scored the same test?
A.
Internal reliability
(
*
)
B.
Inter-rater reliability
Correct
C.
Parallel forms reliability
D.
Test-retest reliability
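The percent-of-agreement calculation described in question 6 can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `percent_agreement` and the sample rubric scores are hypothetical, and agreement is assumed to mean exact matches between the two raters.

```python
def percent_agreement(rater_a, rater_b):
    """Return the percentage of items on which two raters gave the same score."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must score the same number of items.")
    if not rater_a:
        raise ValueError("At least one scored item is required.")
    # Count exact matches (assumption: agreement means identical scores).
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)


# Hypothetical rubric scores from two raters for the same five essays.
scores_a = [3, 4, 2, 5, 1]
scores_b = [3, 4, 3, 5, 1]
agreement = percent_agreement(scores_a, scores_b)
print(f"Percent agreement: {agreement:.1f}%")
```

Here the raters disagree on one essay out of five, so the statistic reflects four matches.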
7.
Which type of reliability statistic involves dividing a test into two equal parts and calculating the correlation between the two halves?
(
*
)
A.
Internal reliability
Correct
B.
Inter-rater reliability
C.
Parallel forms reliability
D.
Test-retest reliability
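The split-half approach in question 7 can be illustrated with a short sketch. It assumes an odd/even split of the items (one of several possible ways to form two equal halves) and uses a hand-written Pearson correlation; the function names and the response data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_correlation(item_scores):
    """item_scores holds one list of 0/1 item scores per test taker."""
    # Assumption: halves are formed from odd- and even-numbered items.
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    return pearson(odd_half, even_half)


# Hypothetical responses: five test takers, six items each.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
r = split_half_correlation(responses)
print(f"Split-half correlation: {r:.3f}")
```

A higher correlation between the two half-scores suggests that the items respond consistently with one another.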
8.
What minimum values are considered decent statistics for reliability?
A.
0.60 to 0.65
B.
0.70 to 0.75
(
*
)
C.
0.80 to 0.85
Correct
D.
0.90 to 0.95
9.
What term describes the structured framework for the process of matching test items to a chosen domain?
A.
Test blueprint
(
*
)
B.
Table of specifications
Correct
C.
AYP logic model
D.
Generalizability chart
10.
Which type of validity evidence is sometimes represented by showing a match between the items that are on a test and the items that should be on the test?
(
*
)
A.
Content-related
Correct
B.
Criterion-related
C.
Construct-related
D.
Consequences
11.
What term describes the size of the relationships between scores on tests?
(
*
)
A.
Correlation coefficient
Correct
B.
AYP index
C.
Reliability
D.
Construct-related evidence
12.
Which type of validity evidence or consequences of validity is most often represented by a correlation between two tests?
A.
Content-related
(
*
)
B.
Criterion-related
Correct
C.
Construct-related
D.
Consequences
13.
Which type of validity evidence or consequences of validity is driven by the theoretical definition of the invisible trait that is meant to be measured?
A.
Content-related
B.
Criterion-related
(
*
)
C.
Construct-related
Correct
D.
Consequences
14.
Which type of validity evidence or consequences of validity is sometimes developed by looking at the impact that a test has on a person or population?
A.
Content-related
B.
Criterion-related
C.
Construct-related
(
*
)
D.
Consequences
Correct
15.
Which concept refers to any difference not attributable to actual differences in the trait or ability being measured?
A.
True variance
(
*
)
B.
Error variance
Correct
C.
Standard deviation
D.
Score variability
16.
Which is not a descriptor of bias?
A.
A systematic error
B.
A technical concept
(
*
)
C.
Identical to fairness
Correct
D.
Can be analyzed impartially
17.
Which procedure for bias detection is concerned with the opinion of individuals representing the relevant subgroups in the population of potential examinees?
A.
Empirical review
(
*
)
B.
Judgmental review
Correct
C.
Fairness review
D.
Differential item functioning analysis
18.
Which procedure for bias detection allows test developers to ascertain statistically whether or not individual items perform differently for relevant subgroups?
(
*
)
A.
Empirical review
Correct
B.
Judgmental review
C.
Fairness review
D.
Logical review
19.
A Spanish-speaking student is administered a state assessment that has been translated into Spanish. Is this an example of an accommodation for an assessment, an alternate assessment, a modification of an assessment, or a universally designed assessment?
(
*
)
A.
Accommodation for an assessment
Correct
B.
Alternate assessment
C.
Modification of an assessment
D.
Universally designed assessment
20.
Which type of test attempts to measure a person's current knowledge or skill level in a given realm?
A.
Aptitude Test
(
*
)
B.
Achievement Test
Correct
C.
Psychological Test
D.
Personality Test
21.
Which type of test attempts to measure more abstract items such as constructs?
A.
Aptitude Test
B.
Achievement Test
(
*
)
C.
Psychological Test
Correct
D.
Personality Test
22.
What is the first and most important step in developing a standardized test?
A.
Develop test specifications
B.
Define the content area
(
*
)
C.
Clearly identify its purpose
Correct
D.
Develop the test items
23.
What term represents a theoretical universe of content that represents every conceivable piece of knowledge or skill set that directly relates to the purpose of the test?
A.
Population
B.
Test specifications
C.
Test blueprint
(
*
)
D.
Domain
Correct
24.
Bloom's taxonomy of cognitive operations consists of six levels of cognitive processing. Which is not one of those levels?
A.
Knowledge
B.
Comprehension
C.
Application
(
*
)
D.
Deconstructing
Correct
25.
What is not included in test specifications?
(
*
)
A.
The test items
Correct
B.
The type of item format
C.
The scoring rules
D.
The method of score interpretation
26.
The quality of an item is usually measured on three dimensions. Which is not one of those dimensions?
A.
Difficulty level
B.
Discrimination index
(
*
)
C.
Criterion validity
Correct
D.
Differential item functioning
27.
What does a difficulty level of .6 mean?
A.
60% of the test takers answered the item incorrectly.
(
*
)
B.
60% of the test takers answered the item correctly.
Correct
C.
The majority of test takers had a median score of 60% on that item.
D.
60% of the test is more difficult than this item.
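The difficulty level in question 27 is the proportion of test takers who answered the item correctly. A minimal sketch, with a hypothetical function name and made-up responses:

```python
def item_difficulty(item_responses):
    """Proportion of test takers who answered the item correctly (1 = correct)."""
    if not item_responses:
        raise ValueError("At least one response is required.")
    return sum(item_responses) / len(item_responses)


# Hypothetical responses from ten test takers to a single item.
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
p = item_difficulty(responses)
print(f"Difficulty level: {p:.1f}")
```

Six of the ten hypothetical test takers answered correctly, which corresponds to the .6 difficulty level in the question.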
28.
The discrimination index:
(
*
)
A.
Calculates the tendency of high-performing students to answer an item correctly and low-performing students to answer it incorrectly.
Correct
B.
Calculates the proportion of students who answer the item correctly.
C.
Is used to help determine if any of the items are biased for or against a particular group.
D.
Is used to determine an item's reliability.
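One common way to compute a discrimination index is to compare the proportion of correct answers in a high-scoring group with the proportion in a low-scoring group. The sketch below assumes that upper-minus-lower formulation, which the quiz itself does not specify; the function name, the `group_fraction` parameter, and the data are hypothetical.

```python
def discrimination_index(item_correct, total_scores, group_fraction=0.5):
    """Upper-group minus lower-group proportion correct for one item.

    item_correct: 0/1 result on the item for each test taker.
    total_scores: each test taker's total test score, in the same order.
    group_fraction: share of test takers in each comparison group (assumed).
    """
    n = len(item_correct)
    if n != len(total_scores):
        raise ValueError("Inputs must cover the same test takers.")
    if not 0 < group_fraction <= 0.5:
        raise ValueError("group_fraction must be in (0, 0.5].")
    group_size = int(n * group_fraction)
    if group_size < 1:
        raise ValueError("Not enough test takers to form comparison groups.")
    # Rank test takers from highest to lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = order[:group_size]
    lower = order[-group_size:]
    p_upper = sum(item_correct[i] for i in upper) / group_size
    p_lower = sum(item_correct[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical data: eight test takers.
item_correct = [1, 1, 1, 0, 1, 0, 0, 0]
total_scores = [95, 90, 85, 80, 60, 55, 50, 40]
d = discrimination_index(item_correct, total_scores)
print(f"Discrimination index: {d:.2f}")
```

A positive value indicates that high scorers answered the item correctly more often than low scorers; a value near zero or below suggests the item does not separate the two groups.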
29.
What kinds of scores are translated to a different scale through the process of equating?
A.
Scaled
B.
Norms
C.
Percentile ranks
(
*
)
D.
Raw
Correct
30.
Which type of equating takes into account the percentile ranks of scores on multiple versions of a test and relates them accordingly?
A.
Linear equating
B.
Norm-referenced equating
(
*
)
C.
Equipercentile equating
Correct
D.
Vertical equating
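The idea behind equipercentile equating in question 30 can be sketched as: find the percentile rank of a score on one form, then find the score on the other form that has the same percentile rank. The code below is a simplified illustration under stated assumptions (midpoint percentile ranks, linear interpolation between observed scores, no smoothing); operational equating procedures are more involved, and all names and data here are hypothetical.

```python
def percentile_rank(score, scores):
    """Midpoint percentile rank of a score within a list of scores."""
    below = sum(1 for s in scores if s < score)
    equal = sum(1 for s in scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(scores)


def equipercentile_equate(score_x, form_x_scores, form_y_scores):
    """Map a Form X score to the Form Y score with the same percentile rank."""
    target = percentile_rank(score_x, form_x_scores)
    candidates = sorted(set(form_y_scores))
    ranks = [percentile_rank(y, form_y_scores) for y in candidates]
    # Clamp to the observed Form Y range (assumption: no extrapolation).
    if target <= ranks[0]:
        return float(candidates[0])
    if target >= ranks[-1]:
        return float(candidates[-1])
    # Interpolate linearly between the two neighboring Form Y scores.
    for i in range(1, len(candidates)):
        if target <= ranks[i]:
            weight = (target - ranks[i - 1]) / (ranks[i] - ranks[i - 1])
            return candidates[i - 1] + weight * (candidates[i] - candidates[i - 1])


# Hypothetical raw scores on two forms; Form Y runs two points higher.
form_x = [10, 12, 14, 16, 18]
form_y = [12, 14, 16, 18, 20]
equated = equipercentile_equate(14, form_x, form_y)
print(f"Form X score 14 corresponds to Form Y score {equated:.1f}")
```

In this example the median score on Form X maps to the median score on Form Y, because both sit at the same percentile rank.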
31.
Which equating process compares performance on a single test across grades or age ranges?
A.
Linear equating
B.
Norm-referenced equating
C.
Equipercentile equating
(
*
)
D.
Vertical equating
Correct
32.
Equating helps track student growth trends:
(
*
)
A.
Over time and across age groups
Correct
B.
Compared to teacher effectiveness
C.
When one test version is used
D.
Compared to attrition rates
33.
Differential item functioning (DIF):
A.
Calculates the tendency of high-performing students to answer an item correctly and low-performing students to answer it incorrectly.
B.
Calculates the proportion of students who answer the item correctly.
(
*
)
C.
Is used to help determine if any of the items are biased for or against a particular group.
Correct
D.
Is used to determine an item's reliability.