What is the term for a measurement, test, or study that actually measures what it intends to measure?
For every dimension of interest and every specific question or set of questions, there are a vast number of ways to word the questions. Although the guiding principle should be the specific purposes of the research, there are better and worse questions for any particular operationalization. How should the measures be evaluated?
Two of the primary criteria of evaluation in any measurement or observation are:

Whether we are measuring what we intend to measure.
Whether the same measurement process yields the same results.

These two concepts are validity and reliability.

Reliability is concerned with questions of stability and consistency - does the same measurement tool yield stable and consistent results when repeated over time? Think about measurement processes in other contexts - in construction or woodworking, a tape measure is a highly reliable measuring instrument. Say you have a piece of wood that is 2 1/2 feet long. You measure it once with the tape measure and get 2 1/2 feet. Measure it again and you get 2 1/2 feet again. The tape measure yields reliable results.

Validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring). To continue with the example of measuring the piece of wood, a tape measure that has been made with accurate spacing for inches, feet, etc. should yield valid results as well. Measuring this piece of wood with a "good" tape measure should produce a correct measurement of the wood's length.

To apply these concepts to social research, we want to use measurement tools that are both reliable and valid. We want questions that yield consistent responses when asked multiple times - this is reliability. Similarly, we want questions that get accurate responses from respondents - this is validity.

Reliability

Reliability refers to a condition where a measurement process yields consistent scores (given an unchanged measured phenomenon) over repeated measurements. Perhaps the most straightforward way to assess the reliability of measures is to check whether they meet the following three criteria. Measures that are high in reliability should exhibit all three.

Test-Retest Reliability

When a researcher administers the same measurement tool multiple times - asks the same question, follows the same research procedures, etc. - does the researcher obtain consistent results, assuming that there has been no change in whatever is being measured? This is really the simplest method for assessing reliability - when a researcher asks the same person the same question twice ("What's your name?"), does the researcher get back the same result both times? If so, the measure has test-retest reliability. The measurement of the piece of wood discussed earlier has high test-retest reliability.

Inter-Item Reliability

This is a dimension that applies to cases where multiple items are used to measure a single concept. In such cases, responses to the items should be associated with one another.

Interobserver Reliability

Interobserver reliability concerns the extent to which different interviewers or observers using the same measure get equivalent results.
If different observers or interviewers use the same instrument to score the same thing, their scores should match. For example, the interobserver reliability of an observational assessment of parent-child interaction is often evaluated by showing two observers a videotape of a parent and child at play. These observers are asked to use an assessment tool to score the interactions between parent and child on the tape. If the instrument has high interobserver reliability, the scores of the two observers should match.

Validity

To reiterate, validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring). How do we assess the validity of a set of measurements? A valid measure should satisfy four criteria.

Face Validity

This criterion is an assessment of whether a measure appears, on the face of it, to measure the concept it is intended to measure. This is a bare-minimum assessment - if a measure cannot satisfy this criterion, then the other criteria are inconsequential. We can think about observational measures of behavior that would have face validity. For example, striking out at another person would have face validity as an indicator of aggression. Similarly, offering assistance to a stranger would meet the criterion of face validity for helping. However, asking people about their favorite movie to measure racial prejudice has little face validity.

Content Validity

Content validity concerns the extent to which a measure adequately represents all facets of a concept. Consider a series of questions that serve as indicators of depression (don't feel like eating, lost interest in things usually enjoyed, etc.). If there were other kinds of common behaviors that mark a person as depressed that were not included in the index, then the index would have low content validity, since it did not adequately represent all facets of the concept.

Criterion-Related Validity

Criterion-related validity applies to instruments that have been developed to be useful as indicators of a specific trait or behavior, either now or in the future. For example, think about the driving test as a social measurement that has fairly good predictive validity. That is to say, an individual's performance on a driving test correlates well with his or her driving ability.

Construct Validity

For many things we want to measure, however, there is not necessarily a pertinent criterion available. In this case, we turn to construct validity, which concerns the extent to which a measure is related to other measures as specified by theory or previous research. Does a measure stack up with other variables the way we expect it to? A good example of this form of validity comes from early self-esteem studies - self-esteem refers to a person's sense of self-worth or self-respect. Clinical observations in psychology had shown that people who had low self-esteem often had depression. Therefore, to establish the construct validity of the self-esteem measure, the researchers showed that those with higher scores on the self-esteem measure had lower depression scores, while those with low self-esteem had higher rates of depression.

Validity and Reliability Compared

So what is the relationship between validity and reliability? The two do not necessarily go hand-in-hand.
It is possible to have a measure that has high reliability but low validity - one that is consistent in getting bad information or consistent in missing the mark. It is also possible to have one that has low reliability and low validity - inconsistent and not on target. Finally, it is not possible to have a measure that has low reliability and high validity - you can't really get at what you want or what you're interested in if your measure fluctuates wildly.

What is the term for a test that accurately measures what it intends to measure?
A test that measures what it intends to measure is said to be valid; the property itself is called validity.
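
The tape-measure example and the comparison of reliability and validity can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function names, the offsets, and the noise levels are hypothetical, and "reliability" is approximated by the spread of repeated readings while "validity" is approximated by how far the average reading sits from the true length.

```python
import random
import statistics

TRUE_LENGTH_FT = 2.5  # the 2 1/2 foot piece of wood from the example


def simulate_measurements(offset, noise_sd, n=1000, seed=0):
    """Return n simulated readings: true length + fixed offset + random noise."""
    rng = random.Random(seed)
    return [TRUE_LENGTH_FT + offset + rng.gauss(0, noise_sd) for _ in range(n)]


def summarize(readings):
    """Return (bias, spread): mean error vs. the true length, and standard deviation."""
    bias = statistics.mean(readings) - TRUE_LENGTH_FT
    spread = statistics.stdev(readings)
    return bias, spread


# Hypothetical instruments; the offset and noise values are illustrative assumptions.
instruments = {
    "reliable and valid": simulate_measurements(offset=0.0, noise_sd=0.01),
    "reliable, not valid": simulate_measurements(offset=0.25, noise_sd=0.01),
    "neither": simulate_measurements(offset=0.25, noise_sd=0.5),
}

for name, readings in instruments.items():
    bias, spread = summarize(readings)
    print(f"{name:20s} bias={bias:+.3f} ft  spread={spread:.3f} ft")
```

A small spread with a large bias corresponds to the "consistent in missing the mark" case: the mismarked tape gives nearly the same wrong answer every time.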
What is a research instrument that measures what it is supposed to measure called?
Validity is the extent to which an instrument measures what it is supposed to measure and performs as it is designed to perform. It is rare, if not nearly impossible, for an instrument to be 100% valid, so validity is generally measured in degrees.
What is a measurement procedure called if it measures or predicts what it is intended to measure or predict?
Predictive validity: a test has predictive validity if it accurately predicts what it is supposed to predict. For example, the SAT exhibits predictive validity for performance in college. The term can also refer to designs in which scores on the predictor measure are collected first and the criterion data are collected later.
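
One common way to express predictive validity as a number is the correlation between predictor scores and the criterion measured later. The sketch below assumes a Pearson correlation is an acceptable summary; the function name `pearson_r` and the data are hypothetical and invented purely for illustration, not taken from any real study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: predictor collected first, criterion collected later.
admission_scores = [1000, 1100, 1200, 1300, 1400]  # made-up test scores
first_year_gpa = [2.4, 2.9, 2.8, 3.4, 3.6]         # made-up later outcomes

r = pearson_r(admission_scores, first_year_gpa)
print(f"predictive validity coefficient (Pearson r): {r:.2f}")
```

A coefficient near 1 would suggest that the predictor tracks the later criterion closely; a coefficient near 0 would suggest little predictive validity for that criterion.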