Guided Reading Activity 13-1 Characteristics of Psychological Tests Answer Key

Psychological assessment contributes important information to the understanding of individual characteristics and capabilities, through the collection, integration, and interpretation of information about an individual (Groth-Marnat, 2009; Weiner, 2003). Such information is obtained through a variety of methods and measures, with relevant sources determined by the specific purposes of the evaluation. Sources of information may include

  • Records (e.g., medical, educational, occupational, legal) obtained from the referral source;

  • Records obtained from other organizations and agencies that have been identified as potentially relevant;

  • Interviews conducted with the person being examined;

  • Behavioral observations;

  • Interviews with corroborative sources such as family members, friends, teachers, and others; and

  • Formal psychological or neuropsychological testing.

Agreement across multiple measures and sources, as well as discrepant information, enables the creation of a more comprehensive understanding of the individual being assessed, ultimately leading to more accurate and appropriate clinical conclusions (e.g., diagnosis, recommendations for treatment planning).

The clinical interview remains the foundation of many psychological and neuropsychological assessments. Interviewing may be structured, semistructured, or open in nature, but the goal of the interview remains consistent—to identify the nature of the client's presenting issues, to obtain direct historical information from the examinee regarding such concerns, and to explore historical variables that may be related to the complaints being presented. In addition, the interview element of the assessment process allows for behavioral observations that may be useful in describing the client, as well as discerning the convergence with known diagnoses. Based on the information and observations gained in the interview, assessment instruments may be selected, corroborative informants identified, and other historical records recognized that may aid the clinician in reaching a diagnosis. Conceptually, clinical interviewing explores the presenting complaint(s) (i.e., referral question), informs the understanding of the case history, aids in the development of hypotheses to be examined in the assessment process, and assists in the determination of methods to address the hypotheses through formal testing.

An important piece of the assessment process and the focus of this study, psychological testing consists of the administration of one or more standardized procedures under particular environmental conditions (e.g., quiet, good lighting) in order to obtain a representative sample of behavior. Such formal psychological testing may involve the administration of standardized interviews, questionnaires, surveys, and/or tests, selected with regard to the specific examinee and his or her circumstances, that offer information to respond to an assessment question. Assessments, then, serve to answer questions through the use of tests and other procedures. It is important to note that the selection of appropriate tests requires an understanding of the specific circumstances of the individual being assessed, falling under the purview of clinical judgment. For this reason, the committee refrains from recommending the use of any specific test in this report. Any reference to a specific test is to provide an illustrative example, and should not be interpreted as an endorsement by the committee for use in any specific situation; such a conclusion is best left to a qualified assessor familiar with the specific circumstances surrounding the assessment.

To respond to questions regarding the use of psychological tests for the assessment of the presence and severity of disability due to mental disorders, this chapter provides an introductory review of psychological testing. The chapter is divided into three sections: (1) types of psychological tests, (2) psychometric properties of tests, and (3) test user qualifications and administration of tests. Where possible an effort has been made to address the context of disability determination; nonetheless, the chapter is primarily an introduction to psychological testing.

TYPES OF PSYCHOLOGICAL TESTS

There are many facets to the categorization of psychological tests, and even more if one includes educationally oriented tests; indeed, it is often difficult to differentiate many kinds of tests as purely psychological tests as opposed to educational tests. The ensuing discussion lays out some of the distinctions among such tests; nonetheless, it is important to note that there is no one correct cataloging of the types of tests because the different categorizations often overlap. Psychological tests can be categorized by the very nature of the behavior they assess (what they measure), their administration, their scoring, and how they are used. Figure 3-1 illustrates the types of psychological measures as described in this report.

FIGURE 3-1. Components of psychological assessment. Note: Performance validity tests do not measure cognition, but are used in conjunction with performance-based cognitive tests to examine whether the examinee is exerting sufficient effort to perform well and responding (more...)

The Nature of Psychological Measures

One of the most common distinctions made among tests relates to whether they are measures of typical behavior (often non-cognitive measures) versus tests of maximal performance (often cognitive tests) (Cronbach, 1949, 1960). A measure of typical behavior asks those completing the instrument to describe what they would ordinarily do in a given situation. Measures of typical behavior, such as personality, interests, values, and attitudes, may be referred to as non-cognitive measures. A test of maximal performance, plainly enough, asks people to answer questions and solve problems as well as they possibly can. Because tests of maximal performance typically involve cognitive performance, they are often referred to as cognitive tests. Most intelligence and other ability tests would be considered cognitive tests; they can also be known as ability tests, but this would be a more limited category. Non-cognitive measures rarely have correct answers per se, although in some cases (e.g., employment tests) there may be preferred responses; cognitive tests almost always have items that have correct answers. It is through these two lenses—non-cognitive measures and cognitive tests—that the committee examines psychological testing for the purpose of disability evaluation in this study.

One distinction among non-cognitive measures is whether the stimuli composing the measure are structured or unstructured. A structured personality measure, for example, may ask people true-or-false questions about whether they engage in various activities or not. Those are highly structured questions. On the other hand, in administering some commonly used personality measures, the examiner provides an unstructured projective stimulus such as an inkblot or a picture. The test-taker is requested to describe what they see or imagine the inkblot or picture to be describing. The premise of these projective measures is that when presented with ambiguous stimuli an individual will project his or her underlying and unconscious motivations and attitudes. The scoring of these latter measures is often more complex than it is for structured measures.

There is great variety in cognitive tests and what they measure, thus requiring a lengthier explanation. Cognitive tests are often separated into tests of ability and tests of achievement; nevertheless, this distinction is not as clear-cut as some would portray it. Both types of tests involve learning. Both kinds of tests involve what the test-taker has learned and can do. However, achievement tests typically involve learning from very specialized education and training experiences; whereas most ability tests assess learning that has occurred in one's environment. Some aspects of learning are clearly both; for instance, vocabulary is learned at home, in one's social environment, and in school. Notably, the best predictor of intelligence test performance is one's vocabulary, which is why it is often given as the first test during intelligence testing or in some cases represents the body of the intelligence test (e.g., the Peabody Picture Vocabulary Test). Conversely, one can also take a vocabulary test based on words one learns only in an academic setting. Intelligence tests are so prevalent in many clinical psychology and neuropsychology situations that we also consider them as neuropsychological measures. Some abilities are measured using subtests from intelligence tests; for example, certain working memory tests would be a common example of an intelligence subtest that is used singly as well. There are also standalone tests of many kinds of specialized abilities.

Some ability tests are broken into verbal and performance tests. Verbal tests, plainly enough, employ language to ask questions and demonstrate answers. Performance tests, on the other hand, minimize the use of language; they can involve solving problems that do not involve language. They may involve manipulating objects, tracing mazes, placing pictures in the proper order, and finishing patterns, for example. This distinction is most commonly used in the case of intelligence tests, but can be used in other ability tests as well. Performance tests are also sometimes used when the test-taker lacks competence in the language of the testing. Many of these tests assess visual-spatial tasks. Historically, nonverbal measures were given as intelligence tests for non-English-speaking soldiers in the United States as early as World War I. These tests continue to be used in educational and clinical settings given their reduced language component.

Different cognitive tests are also considered to be speeded tests versus power tests. A truly speeded test is one on which everyone could get every question correct if they had enough time. Some tests of clerical skills are exactly like this; they may have two lists of paired numbers, for example, where some pairings contain two identical numbers and other pairings are different. The test-taker simply circles the pairings that are identical. Pure power tests are measures in which the only factor influencing performance is how much the test-taker knows or can do. A true power test is one where all test-takers have enough time to do their best; the only question is what they can do. Obviously, few tests are either purely speeded or purely power tests. Most have some combination of both. For instance, a testing company may use a rule of thumb that 90 percent of test-takers should complete 90 percent of the questions; however, it should also be clear that the purpose of the testing affects rules of thumb such as this. Few teachers would wish to have many students unable to complete the tests that they take in classes, for instance. When test-takers have disabilities that affect their ability to respond to questions quickly, some measures provide extra time, depending upon their purpose and the nature of the characteristics being assessed.

Questions on both achievement and ability tests can involve either recognition or free response in answering. In educational and intelligence tests, recognition tests typically include multiple-choice questions where one can look for the correct answer among the options, recognize it as correct, and select it as the correct answer. A free-response question is analogous to a "fill-in-the-blanks" or an essay question. One must recall or solve the question without choosing from among alternative responses. This distinction also holds for some non-cognitive tests, but the latter distinction is discussed later in this section because it focuses not on recognition but on selection. For example, a recognition question on a non-cognitive test might ask someone whether they would rather go ice skating or to a movie; a free recall question would ask the respondent what they like to do for enjoyment.

Cognitive tests of various types can be considered as process or product tests. Take, for example, mathematics tests in school. In some instances, just getting the correct answer leads to a correct response. In other cases, teachers may give partial credit when a student performs the proper operations but does not get the correct answer. Similarly, psychologists and clinical neuropsychologists often observe not only whether a person solves problems correctly (i.e., product), but how the client goes about attempting to solve the problem (i.e., process).

Test Administration

One of the most important distinctions relates to whether tests are group administered or are individually administered by a psychologist, physician, or technician. Tests that traditionally were group administered were paper-and-pencil measures. Often for these measures, the test-taker received both a test booklet and an answer sheet and was required, unless he or she had certain disabilities, to mark his or her responses on the answer sheet. In recent decades, some tests are administered using technology (i.e., computers and other electronic media). There may be some adaptive qualities to tests administered by computer, although not all computer-administered tests are adaptive (technology-administered tests are further discussed below). An individually administered measure is typically provided to the test-taker by a psychologist, physician, or technician. More faith is often placed in the individually administered measure, because the trained professional administering the test can make judgments during the testing that affect the administration, scoring, and other observations related to the test.

Tests can be administered in an adaptive or linear manner, whether by computer or individual administrator. A linear test is one in which questions are administered one after another in a pre-arranged order. An adaptive test is one in which the test-taker's performance on earlier items affects the questions he or she receives subsequently. Typically, if the test-taker is answering the first questions correctly or in accordance with preset or expected response algorithms, for example, the next questions become more difficult until the level appropriate to the examinee's performance is reached or the test is completed. If one does not answer the first questions correctly, or as typically expected in the case of a non-cognitive measure, then easier questions would generally be presented to the test-taker.

Tests can be administered in written (keyboard or paper-and-pencil) style, orally, using an assistive device (most typically for individuals with motor disabilities), or in performance format, as previously noted. It is generally difficult to administer oral or performance tests in a group situation; however, some electronic media are making it possible to administer such tests without human examiners.

Another distinction among measures relates to who the respondent is. In most cases, the test-taker him- or herself is the respondent to whatever questions are posed by the psychologist or physician. In the case of a young child, many individuals with autism, or an individual, for example, who has lost language ability, the examiner may need to ask others who know the individual (parents, teachers, spouses, family members) how they behave and to describe their personality, typical behaviors, and so on.

Scoring Differences

Tests are categorized as objectively scored, subjectively scored, or in some instances, both. An objectively scored instrument is one where the correct answers are counted and they either are, or are converted to, the final scoring. Such tests may be scored manually or using optical scanning machines, computerized software, software used by other electronic media, or even templates (keys) that are placed over answer sheets where a person counts the number of correct answers. Examiner ratings and self-report interpretations are determined by the professional using a rubric or scoring system to convert the examinee's responses to a score, whether numerical or not. Sometimes subjective scores may include both quantitative and qualitative summaries or narrative descriptions of the performance of an individual.

Scores on tests are often considered to be norm-referenced (or normative) or criterion-referenced. Norm-referenced cognitive measures (such as college and graduate school admissions measures) inform the test-takers where they stand relative to others in the distribution. For instance, an applicant to a college may learn that she is at the 60th percentile, meaning that she has scored better than 60 percent of those taking the test and less well than 40 percent of the same norm group. Likewise, most if not all intelligence tests are norm-referenced, and most other ability tests are as well. In recent years there has been more of a call for criterion-referenced tests, particularly in education (Hambleton and Pitoniak, 2006). For criterion-referenced tests, one's score is not compared to the other members of the test-taking population but rather to a fixed standard. High school graduation tests, licensure tests, and other tests that decide whether test-takers have met minimal competency requirements are examples of criterion-referenced measures. When one takes a driving test to earn one's driver's license, for example, one does not find out where one's driving falls in the distribution of national or statewide drivers; one simply passes or fails.

Test Content

As noted previously, the most important distinction among most psychological tests is whether they are assessing cognitive versus non-cognitive qualities. In clinical psychological and neuropsychological settings such as are the concern of this book, the most common cognitive tests are intelligence tests, other clinical neuropsychological measures, and performance validity measures. Many tests used by clinical neuropsychologists, psychiatrists, technicians, or others assess specific types of functioning, such as memory or problem solving. Performance validity measures are typically short assessments and are sometimes interspersed among components of other assessments that help the psychologist determine whether the examinee is exerting sufficient effort to perform well and responding to the best of his or her ability. The most common non-cognitive measures in clinical psychology and neuropsychology settings are personality measures and symptom validity measures. Some personality tests, such as the Minnesota Multiphasic Personality Inventory (MMPI), assess the degree to which someone expresses behaviors that are seen as atypical in relation to the norming sample.1 Other personality tests are more normative and try to provide information about the client to the therapist. Symptom validity measures are scales, like performance validity measures, that may be interspersed throughout a longer assessment to examine whether a person is portraying him- or herself in an honest and truthful fashion. Somewhere between these two types of tests—cognitive and non-cognitive—are various measures of adaptive functioning that often include both cognitive and non-cognitive components.

PSYCHOMETRICS: EXAMINING THE PROPERTIES OF TEST SCORES

Psychometrics is the scientific study—including the development, interpretation, and evaluation—of psychological tests and measures used to assess variability in behavior and link such variability to psychological phenomena. In evaluating the quality of psychological measures we are traditionally concerned primarily with test reliability (i.e., consistency), validity (i.e., accuracy of interpretations and use), and fairness (i.e., equivalence of usage across groups). This section provides a general overview of these concepts to help orient the reader for the ensuing discussions in Chapters 4 and 5. In addition, given the implications of applying psychological measures with subjects from diverse racial and ethnic backgrounds, issues of equivalence and fairness in psychological testing are also presented.

Reliability

Reliability refers to the degree to which scores from a test are stable and results are consistent. When constructs are not reliably measured the obtained scores will not approximate a true value in relation to the psychological variable being measured. It is important to understand that observed or obtained test scores are considered to be composed of true and error elements. A standard error of measurement is often presented to describe, within a level of confidence (e.g., 95 percent), that a given range of test scores contains a person's true score, which acknowledges the presence of some degree of error in test scores and that obtained test scores are only estimates of true scores (Geisinger, 2013).
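The link between reliability and measurement error can be made concrete with the standard formula SEM = SD × √(1 − r), where r is the reliability coefficient. The Python sketch below uses this formula with illustrative numbers (an IQ-style scale, not any particular published test) to bracket a true score at 95 percent confidence:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def true_score_interval(obtained: float, sd: float, reliability: float,
                        z: float = 1.96) -> tuple:
    """Confidence band around an obtained score (z = 1.96 for 95 percent)."""
    e = z * sem(sd, reliability)
    return (obtained - e, obtained + e)

# Illustrative values: a scale with SD = 15 and reliability .90 has
# SEM ~ 4.74, so an obtained score of 100 brackets the true score
# within roughly 91-109 at 95 percent confidence.
print(true_score_interval(100, 15, 0.90))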

Reliability is generally assessed in four ways:

1. Test-retest: Consistency of test scores over time (stability, temporal consistency);

2. Inter-rater: Consistency of test scores among independent judges;

3. Parallel or alternate forms: Consistency of scores across different forms of the test (stability and equivalence); and

4. Internal consistency: Consistency of different items intended to measure the same thing within the test (homogeneity). A special case of internal consistency reliability is split-half, where scores on two halves of a single test are compared and this comparison may be converted into an index of reliability (a computational sketch follows this list).
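As a minimal sketch of the fourth approach, the code below estimates internal consistency from a matrix of simulated item responses. The Cronbach's alpha and Spearman-Brown formulas are standard; the data and function names are invented for this illustration:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def split_half_reliability(items: np.ndarray) -> float:
    """Correlate odd- and even-item half scores, then apply the
    Spearman-Brown correction to estimate full-length reliability."""
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)

# Simulated 0/1 responses: 200 examinees x 10 items driven by one latent trait
rng = np.random.default_rng(0)
ability = rng.normal(size=(200, 1))
items = (ability + rng.normal(size=(200, 10)) > 0).astype(float)
print(round(cronbach_alpha(items), 2), round(split_half_reliability(items), 2))
```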

A number of factors can affect the reliability of a test's scores. These include the time between two testing administrations, which affects test-retest and alternate-forms reliability, and the similarity of content and subjects' expectations regarding different elements of the test in alternate-forms, split-half, and internal consistency approaches. In addition, changes in subjects over time introduced by physical ailments, emotional problems, or the subject's environment, or test-based factors such as poor test instructions, subjective scoring, and guessing will also impact test reliability. It is important to note that a test can generate reliable scores in one context and not in another, and that inferences that can be made from different estimates of reliability are not interchangeable (Geisinger, 2013).

Validity

While the scores resulting from a test may be deemed reliable, this finding does not necessarily mean that scores from the test have validity. Validity is defined as "the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests" (AERA et al., 2014, p. 11). In discussing validity, it is important to highlight that validity refers not to the measure itself (i.e., a psychological test is not valid or invalid) or the scores derived from the measure, but rather the interpretation and use of the measure's scores. To be considered valid, the interpretation of test scores must be grounded in psychological theory and empirical evidence that demonstrates a relationship between the test and what it purports to measure (Furr and Bacharach, 2013; Sireci and Sukin, 2013). Historically, the fields of psychology and education have described three primary types of evidence related to validity (Sattler, 2014; Sireci and Sukin, 2013):

1. Construct evidence of validity: The degree to which an individual's test scores correlate with the theoretical concept the test is designed to measure (i.e., evidence that scores on a test correlate relatively highly with scores on theoretically similar measures and relatively poorly with scores on theoretically dissimilar measures);

2. Content evidence of validity: The degree to which the test content represents the targeted subject matter and supports a test's use for its intended purposes; and

3. Criterion-related evidence of validity: The degree to which the test's score correlates with other measurable, reliable, and relevant variables (i.e., criteria) thought to measure the same construct (a computational sketch follows this list).
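For intuition, the sketch below simulates how construct and criterion-related evidence are typically quantified as correlation coefficients. The data are fabricated solely to show the expected pattern (high convergent, near-zero discriminant correlations), not results from any real instrument:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
trait = rng.normal(size=n)                          # latent construct (simulated)
test = trait + rng.normal(scale=0.5, size=n)        # the test under study
similar = trait + rng.normal(scale=0.6, size=n)     # theoretically similar measure
dissimilar = rng.normal(size=n)                     # theoretically unrelated measure
criterion = trait + rng.normal(scale=0.8, size=n)   # external criterion (e.g., ratings)

r = lambda a, b: np.corrcoef(a, b)[0, 1]
print(f"convergent   r = {r(test, similar):.2f}")    # expected high
print(f"discriminant r = {r(test, dissimilar):.2f}") # expected near zero
print(f"criterion    r = {r(test, criterion):.2f}")  # validity coefficient
```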

Other kinds of validity with relevance to SSA have been advanced in the literature, but are not completely accepted in professional standards as types of validity per se. These include

1. Diagnostic validity: The degree to which psychological tests are truly aiding in the formulation of an appropriate diagnosis.

2. Ecological validity: The degree to which test scores represent everyday levels of functioning (e.g., impact of disability on an individual's ability to function independently).

3. Cultural validity: The degree to which test content and procedures accurately reflect the sociocultural context of the subjects being tested.

Each of these forms of validity poses complex questions regarding the use of particular psychological measures with the SSA population. For instance, ecological validity is especially critical in the use of psychological tests with SSA given that the focus of the assessment is on examining everyday levels of performance. Measures like intelligence tests have sometimes been criticized for lacking ecological validity (Groth-Marnat, 2009; Groth-Marnat and Teal, 2000). Alternatively, "research suggests that many neuropsychological tests have a moderate level of ecological validity when predicting everyday cognitive functioning" (Chaytor and Schmitter-Edgecombe, 2003, p. 181).

More recent discussions of validity have shifted toward an argument-based approach, using a variety of evidence to build a case for validity of test score interpretation (Furr and Bacharach, 2013). In this approach, construct validity is viewed as an overarching paradigm under which evidence is gathered from multiple sources to build a case for validity of test score interpretation. Five key sources of validity evidence that affect the degree to which a test fulfills its purpose are generally considered (AERA et al., 2014; Furr and Bacharach, 2013; Sireci and Sukin, 2013):

1. Test content: Does the test content reflect the important facets of the construct being measured? Are the test items relevant and appropriate for measuring the construct and congruent with the purpose of testing?

2. Relation to other variables: Is there a relationship between test scores and other criteria or constructs that are expected to be related?

3. Internal structure: Does the actual structure of the test match the theoretically based structure of the construct?

4. Response processes: Are respondents applying the theoretical constructs or processes the test is designed to measure?

5. Consequences of testing: What are the intended and unintended consequences of testing?

Standardization and Testing Norms

As part of the development of any psychometrically sound measure, explicit methods and procedures by which tasks should be administered are determined and clearly spelled out. This is what is commonly known as standardization. Typical standardized administration procedures or expectations include (1) a quiet, relatively distraction-free environment, (2) precise reading of scripted instructions, and (3) provision of necessary tools or stimuli. All examiners employ such methods and procedures during the process of collecting the normative data, and such procedures usually should be used in any other administration, which enables application of normative data to the individual being evaluated (Lezak et al., 2012).

Standardized tests provide a set of normative information (i.e., norms), or scores derived from groups of people for whom the measure is designed (i.e., the designated population), to which an individual's performance can be compared. Norms consist of transformed scores such as percentiles, cumulative percentiles, and standard scores (e.g., T-scores, Z-scores, stanines, IQs), allowing for comparison of an individual's test results with the designated population. Without standardized administration, the individual's performance may not accurately reflect his or her ability. For example, an individual's abilities may be overestimated if the examiner provides additional information or guidance beyond what is outlined in the test administration manual. Conversely, a claimant's abilities may be underestimated if appropriate instructions, examples, or prompts are not presented. When nonstandardized administration techniques must be used, norms should be used with caution due to the systematic error that may be introduced into the testing process; this topic is discussed in detail later in the chapter.
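The transformed scores named above follow directly from the norm group's mean and standard deviation. This minimal sketch (assuming a normal score distribution and illustrative norm values) converts a raw score to a Z-score, T-score, deviation-IQ score, and percentile:

```python
from math import erf, sqrt

def standard_scores(raw: float, norm_mean: float, norm_sd: float) -> dict:
    """Convert a raw score to common norm-referenced scores."""
    z = (raw - norm_mean) / norm_sd
    t = 50 + 10 * z                            # T-score: mean 50, SD 10
    iq = 100 + 15 * z                          # deviation IQ: mean 100, SD 15
    percentile = 50 * (1 + erf(z / sqrt(2)))   # normal CDF x 100
    return {"z": z, "T": t, "IQ": iq, "percentile": percentile}

# Illustrative norm group with mean 50, SD 10: a raw score of 58 gives
# z = 0.8, T = 58, IQ = 112, and roughly the 79th percentile.
print(standard_scores(58, 50, 10))
```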

It is important to clearly understand the population for which a particular test is intended. The standardization sample is another name for the norm group. Norms enable one to make meaningful interpretations of obtained test scores, such as making predictions based on evidence. Developing appropriate norms depends on the size and representativeness of the sample. In general, the more people in the norm group, the closer the approximation to a population distribution, so long as they represent the group who will be taking the test.

Norms should be based upon representative samples of individuals from the intended test population, as each person should have an equal chance of being in the standardization sample. Stratified samples enable the test developer to identify particular demographic characteristics represented in the population and more closely approximate these features in proportion to the population. For example, intelligence test scores are often established based upon census-based norming with proportional representation of demographic features including race and ethnic group membership, parental education, socioeconomic status, and geographic region of the country.

When tests are applied to individuals for whom the test was not intended and who, hence, were not included as part of the norm group, inaccurate scores and subsequent misinterpretations may result. Tests administered to persons with disabilities often raise complex issues. Test users sometimes utilize psychological tests that were not developed or normed for individuals with disabilities. It is critical that tests used with such persons (including SSA disability claimants) include attention to representative norming samples; when such norming samples are not available, it is important for the assessor to note that the test or tests used are not based on representative norming samples and the potential implications for interpretation (Turner et al., 2001).

Test Fairness in High-Stakes Testing Decisions

Performance on psychological tests often has significant implications (high stakes) in our society. Tests are in part the gatekeepers for educational and occupational opportunities and play a part in SSA determinations. As such, results of psychological testing may have positive or negative consequences for an individual. Often such consequences are intended; nevertheless, there is the possibility for unintended negative consequences. It is imperative that issues of test fairness be addressed so no individual or group is disadvantaged in the testing process based upon factors unrelated to the areas measured by the test. Biases simply cannot be present in these kinds of professional determinations. Moreover, it is imperative that research demonstrates that measures can be fairly and equivalently used with members of the diverse subgroups in our population. It is important to note that there are people from many language and cultural groups for whom there are no available tests with norms that are appropriately representative for them. As noted above, in such cases it is important for assessors to include a statement about this situation whenever it applies and the potential implications on scores and resultant interpretation.

While all tests reflect what is valued within a particular cultural context (i.e., cultural loading), bias refers to the presence of systematic error in the measurement of a psychological construct. Bias leads to inaccurate test results given that scores reflect either overestimations or underestimations of what is being measured. When bias occurs based upon culturally related variables (e.g., race, ethnicity, social class, gender, educational level), then there is evidence of cultural test bias (Suzuki et al., 2014).

Relevant considerations pertain to issues of equivalence in psychological testing as characterized by the following (Suzuki et al., 2014, p. 260):

1. Functional: Whether the construct being measured occurs with equal frequency across groups;

2. Conceptual: Whether the particular information is familiar across groups and means the same thing in various cultures;

3. Scalar: Whether average score differences reflect the same degree, intensity, or magnitude for different cultural groups;

4. Linguistic: Whether the language used has similar meaning across groups; and

5. Metric: Whether the scale measures the same behavioral qualities or characteristics and the measure has similar psychometric properties in different cultures.

It must be established that the measure is operating appropriately in various cultural contexts. Test developers address issues of equivalence through procedures including

  • Expert panel reviews (i.e., professionals review item content and provide informed judgments regarding potential biases);

  • Examination of differential item functioning (DIF) among groups (a simplified sketch follows this list);

  • Statistical procedures allowing comparison of psychometric features of the test (e.g., reliability coefficients) based on different population samples;

  • Exploratory and confirmatory factor analysis, structural equation modeling (i.e., examination of the similarities and differences of the construct's structure), and measurement invariance; and

  • Mean score differences taking into consideration the spread of scores within particular racial and ethnic groups as well as among groups.
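As a simplified illustration of the DIF examination mentioned above, the sketch below computes a Mantel-Haenszel common odds ratio for a single item, comparing reference- and focal-group pass rates among examinees matched on total score. It deliberately omits the continuity corrections, sparse-stratum handling, and significance tests that operational DIF analyses require:

```python
import numpy as np

def mantel_haenszel_odds_ratio(item: np.ndarray, total: np.ndarray,
                               group: np.ndarray) -> float:
    """Common odds ratio across strata of examinees matched on total score.
    item: 0/1 correct/incorrect; group: 0 = reference, 1 = focal.
    Values near 1.0 suggest comparable item functioning across groups;
    large departures flag the item for review."""
    num = den = 0.0
    for score in np.unique(total):
        stratum = total == score
        ref = stratum & (group == 0)
        foc = stratum & (group == 1)
        n = stratum.sum()
        a, b = item[ref].sum(), (1 - item[ref]).sum()   # reference right/wrong
        c, d = item[foc].sum(), (1 - item[foc]).sum()   # focal right/wrong
        num += a * d / n
        den += b * c / n
    return num / den
```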

Cultural equivalence refers to whether "interpretations of psychological measurements, assessments, and observations are similar if not equal across different ethnocultural populations" (Trimble, 2010, p. 316). Cultural equivalence is a higher-order form of equivalence that is dependent on measures meeting specific criteria indicating that a measure may be appropriately used with other cultural groups beyond the one for which it was originally developed. Trimble (2010) notes that there may be upward of 50 or more types of equivalence that affect interpretive and procedural practices in order to establish cultural equivalence.

Item Response Theory and Tests 2

For most of the 20th century, the dominant measurement model was called classical test theory. This model was based on the notion that all scores were composed of two components: true score and error. One can imagine a "true score" as a hypothetical value that would represent a person's actual score were there no error present in the assessment (and unfortunately, there is always some error, both random and systematic). The model further assumes that all error is random and that any correlation between error and another variable, such as true scores, is effectively zero (Geisinger, 2013). The approach leans heavily on reliability theory, which is largely derived from the premises mentioned above.

Since the 1950s, and largely since the 1970s, a newer, mathematically sophisticated model has developed, called item response theory (IRT). The premise of these IRT models is most easily understood in the context of cognitive tests, where there is a correct answer to questions. The simplest IRT model is based on the notion that the answering of a question is generally based on just two factors: the difficulty of the question and the ability level of the test-taker. Computer-adaptive testing estimates scores of the test-taker after each response to a question and adjusts the administration of the next question appropriately. For example, if a test-taker answers a question correctly, he or she is likely to receive a more difficult question next. If one, on the other hand, answers incorrectly, he or she is more likely to receive an easier question, with the "running score" held by the computer adjusted accordingly. It has been found that such computer-adaptive tests can be very efficient.
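A toy version of this adaptive logic can be sketched in a few lines. This is not how operational computer-adaptive tests estimate ability (they use maximum-likelihood or Bayesian updating); the fixed step-size heuristic, the rasch_p helper, and the simulated examinee are all illustrative assumptions:

```python
import math
import random

def rasch_p(theta: float, b: float) -> float:
    """One-parameter (Rasch) model: P(correct | ability theta, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def adaptive_session(item_bank: list, answer_fn, n_items: int = 10) -> float:
    """Administer n_items adaptively: always pick the unused item whose
    difficulty is closest to the current ability estimate, then nudge the
    estimate up after a correct answer and down after an incorrect one."""
    theta, used, step = 0.0, set(), 1.0
    for _ in range(n_items):
        i = min((j for j in range(len(item_bank)) if j not in used),
                key=lambda j: abs(item_bank[j] - theta))
        used.add(i)
        theta += step if answer_fn(item_bank[i]) else -step
        step = max(step * 0.7, 0.2)   # shrink steps as evidence accumulates
    return theta

# Simulated examinee with true ability 1.2 answering per the Rasch model:
bank = [-2.0 + 0.2 * k for k in range(21)]   # difficulties from -2.0 to 2.0
estimate = adaptive_session(bank, lambda b: random.random() < rasch_p(1.2, b))
print(round(estimate, 2))                     # a rough estimate near 1.2
```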

IRT models have made the equating of test forms far easier. Equating tests permits one to use different forms of the same test with different test items to yield fully comparable scores despite slightly different item difficulties across forms. To convert the values of item difficulty to determine the test-taker's ability scores one needs to have some common items across the various tests; these common items are known as anchor items. Using such items, one can essentially establish a fixed reference group and base judgments from other groups on these values.
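One common way anchor items are put to work is mean-sigma linking: the anchors' difficulty estimates on the two forms determine a linear transformation that places all of one form's items on the other form's scale. The difficulty values below are invented for illustration:

```python
import numpy as np

# Hypothetical Rasch difficulties for anchor items appearing on both forms.
anchor_form_a = np.array([-1.2, -0.4, 0.3, 1.1])   # on the Form A scale
anchor_form_b = np.array([-0.9, -0.1, 0.6, 1.4])   # same items, Form B scale

# Mean-sigma linking: solve b_A = A * b_B + B from the anchors, then
# rescale every Form B difficulty onto the Form A metric.
A = anchor_form_a.std() / anchor_form_b.std()
B = anchor_form_a.mean() - A * anchor_form_b.mean()

form_b_items = np.array([-1.5, 0.0, 0.8, 2.0])     # non-anchor Form B items
print(A * form_b_items + B)                         # difficulties on Form A scale
```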

As noted above, there are a number of common IRT models. Among the most common are the one-, two-, and three-parameter models. The one-parameter model is the one already described; the only item parameter is item difficulty. A two-parameter model adds a second parameter to the first, related to item discrimination. Item discrimination is the ability of the item to differentiate those lacking the ability in high degree from those holding it. Such two-parameter models are often used for tests like essay tests where one cannot achieve a high score by guessing or using other means to respond correctly. The three-parameter IRT model contains a third parameter, a factor related to chance-level correct scoring. This parameter is sometimes called the pseudo-guessing parameter, and this model is generally used for large-scale multiple-choice testing programs.
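The three models differ only in which item parameters are free. A minimal sketch of the three-parameter logistic (3PL) item characteristic function, with the one- and two-parameter models as special cases, looks like this (the example parameter values are arbitrary):

```python
import math

def irt_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic model:
    P(correct) = c + (1 - c) / (1 + exp(-a * (theta - b)))
    a: discrimination, b: difficulty, c: pseudo-guessing floor.
    Setting c = 0 gives the 2PL; c = 0 and a = 1 gives the 1PL (Rasch)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# An average-ability examinee (theta = 0) on a moderately hard (b = 0.5),
# discriminating (a = 1.5) multiple-choice item with a 20% guessing floor:
print(irt_3pl(0.0, a=1.5, b=0.5, c=0.2))   # ~ 0.46
```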

These models, because of their lessened reliance on the sampling of test-takers, are very useful in the equating of tests, that is, the setting of scores to be equivalent regardless of the form of the test one takes. In some high-stakes admissions tests such as the GRE, MCAT, and GMAT, for example, forms are scored and equated by virtue of IRT methods, which can perform such operations more efficiently and accurately than can be done with classical statistics.

TEST USER QUALIFICATIONS

The test user is generally considered the person responsible for appropriate use of psychological tests, including selection, administration, interpretation, and use of results (AERA et al., 2014). Test user qualifications include attention to the purchase of psychological measures that specify levels of training, educational degree, areas of knowledge within the domain of assessment (e.g., ethical administration, scoring, and interpretation of clinical assessment), certifications, licensure, and membership in professional organizations. Test user qualifications require psychometric knowledge and skills as well as training regarding the responsible use of tests (e.g., ethics), in particular, psychometric and measurement knowledge (i.e., descriptive statistics, reliability and measurement error, validity and the meaning of test scores, normative interpretation of test scores, selection of appropriate tests, and test administration procedures). In addition, test user guidelines highlight the importance of understanding the impact of ethnic, racial, cultural, gender, age, educational, and linguistic characteristics on the selection and use of psychological tests (Turner et al., 2001).

Test publishers provide detailed manuals regarding the operational definition of the construct being assessed, norming sample, reading level of test items, completion time, administration, and scoring and interpretation of test scores. Directions presented to the examinee are provided verbatim, and sample responses are often provided to help the examiner in determining a correct or incorrect response or in awarding numbers of points to a particular answer. Ethical and legal knowledge regarding assessment competencies, confidentiality of test information, test security, and legal rights of test-takers is imperative. Resources like the Mental Measurements Yearbook (MMY) provide descriptive information and evaluative reviews of commercially available tests to promote and encourage informed test selection (Buros, 2015). To be included, tests must contain sufficient documentation regarding their psychometric quality (e.g., validity, reliability, norming).

Test Administration and Interpretation

In accordance with the Standards for Educational and Psychological Testing (AERA et al., 2014) and the APA's Guidelines for Test User Qualifications (Turner et al., 2001), many publishers of psychological tests employ a tiered system of qualification levels (generally A, B, C) required for the purchase, administration, and interpretation of such tests (e.g., PAR, n.d.; Pearson Education, 2015). Many instruments, such as those discussed throughout this report, would be considered qualification level C assessment methods, generally requiring an advanced degree, specialized psychometric and measurement knowledge, and formal training in administration, scoring, and interpretation. However, some may have less stringent requirements, for example, a bachelor's or master's degree in a related field and specialized training in psychometric assessment (often classified level B), or no special requirements (often classified level A) for purchase and use. While such categories serve as a general guide for necessary qualifications, individual test manuals provide additional detail and the specific qualifications necessary for administration, scoring, and interpretation of the test or measure.

Given the need for the use of standardized procedures, any person administering cognitive or neuropsychological measures must be well trained in standardized administration protocols. He or she should possess the interpersonal skills necessary to build rapport with the individual being tested in order to foster cooperation and maximal effort during testing. Additionally, individuals administering tests should understand important psychometric properties, including validity and reliability, as well as factors that could emerge during testing to place either at risk. Many doctoral-level psychologists are well trained in test administration; in general, psychologists from clinical, counseling, school, or educational graduate psychology programs receive training in psychological test administration. For cases in which cognitive deficits are being evaluated, a neuropsychologist may be needed to most accurately evaluate cognitive functioning (see Chapter 5 for a more detailed discussion on administration and interpretation of cognitive tests). The use of non-doctoral-level psychometrists or technicians in psychological and neuropsychological test administration and scoring is also a widely accepted standard of practice (APA, 2010; Brandt and van Gorp, 1999; Pearson Education, 2015). Psychometrists are often bachelor's- or master's-level individuals who have received additional specialized training in standardized test administration and scoring. They do not practice independently or interpret test scores, but rather work under the close supervision and direction of doctoral-level clinical psychologists or neuropsychologists.

Interpretation of testing results requires a higher degree of clinical training than administration alone. Threats to the validity of any psychological measure of a self-report nature oblige the test interpreter to understand the test and principles of test construction. In fact, interpreting test results without such knowledge would violate the ethics code established for the profession of psychology (APA, 2010). SSA requires psychological testing be "individually administered by a qualified specialist … currently licensed or certified in the state to administer, score, and interpret psychological tests and have the training and experience to perform the test" (SSA, n.d.). Most doctoral-level clinical psychologists who have been trained in psychometric test administration are also trained in test interpretation. SSA (n.d.) also requires individuals who administer more specific cognitive or neuropsychological evaluations to "be properly trained in this area of neuroscience." As such, clinical neuropsychologists—individuals who have been specifically trained to interpret testing results within the framework of brain-behavior relationships and who have achieved certain educational and training benchmarks as delineated by national professional organizations—may be required to interpret tests of a cognitive nature (AACN, 2007; NAN, 2001).

Use of Interpreters and Other Nonstandardized Test Administration Techniques

Modification of procedures, including the use of interpreters and the administration of nonstandardized assessment procedures, may pose unique challenges to the psychologist by potentially introducing systematic error into the testing process. Such errors may be related to language, the use of translators, or examinee abilities (e.g., sensory, perceptual, and/or motor capacity). For instance, if one uses a language interpreter, the potential for mistranslation may yield inaccurate scores. Use of translators is a nonpreferred option, and assessors need to be familiar with both the language and culture from which an individual comes to properly interpret test results, or even to infer whether specific measures are appropriate. The adaptation of tests has become big business for testing companies, and many tests, most often measures developed in English for use in the United States, are being adapted for use in other countries. Such measures require changes in language, but translators must also be knowledgeable about the culture and the environment of the region from which a person comes (ITC, 2005).

For sensory, perceptual, or motor abilities, one may be altering the construct that the test is designed to measure. In both of these examples, one could be obtaining scores for which there is no referenced normative group to allow for accurate interpretation of results. While a thorough discussion of these concepts is beyond the scope of this report and is presented elsewhere, it may be stated that when a test is administered following a procedure that is outside of that which was developed in the standardization procedure, conclusions drawn must recognize the potential for error in their creation.

PSYCHOLOGICAL TESTING IN THE CONTEXT OF DISABILITY DETERMINATIONS

As noted in Chapter 2, SSA indicates that objective medical evidence may include the results of standardized psychological tests. Given the great variety of psychological tests, some are more objective than others. Whether a psychological test is appropriately considered objective has much to do with the process of scoring. For example, unstructured measures that call for open-ended responding rely on professional judgment and interpretation in scoring; thus, such measures are considered less than objective. In contrast, standardized psychological tests and measures, such as those discussed in the ensuing chapters, are structured and objectively scored. In the case of non-cognitive self-report measures, the respondent generally answers questions regarding typical behavior by choosing from a set of predetermined answers. With cognitive tests, the respondent answers questions or solves problems, which usually have correct answers, as well as he or she possibly can. Such measures generally provide a set of normative data (i.e., norms), or scores derived from groups of people for whom the measure is designed (i.e., the designated population), to which an individual's responses or performance can be compared. Therefore, standardized psychological tests and measures rely less on clinical judgment and are considered to be more objective than those that depend on subjective scoring. Unlike measurements such as weight or blood pressure, standardized psychological tests require the individual's cooperation with respect to self-report or performance on a task. The inclusion of validity testing, which will be discussed further in Chapters 4 and 5, in the test or test battery allows for greater confidence in the test results. Standardized psychological tests that are appropriately administered and interpreted can be considered objective evidence.

The use of psychological tests in disability determinations has critical implications for clients. As noted earlier, the issue of ecological validity (i.e., whether test performance accurately reflects real-world behavior) is of primary importance in SSA determination. Two approaches have been identified in relation to the ecological validity of neuropsychological assessment. The first focuses on "how well the test captures the essence of everyday cognitive skills" in order to "identify people who have difficulty performing real-world tasks, regardless of the etiology of the problem" (i.e., verisimilitude), and the second "relates performance on traditional neuropsychological tests to measures of real-world functioning, such as employment status, questionnaires, or clinician ratings" (i.e., veridicality) (Chaytor and Schmitter-Edgecombe, 2003, pp. 182–183). Establishing ecological validity is a complicated endeavor given the potential effect of non-cognitive factors (e.g., emotional, physical, and environmental) on test and everyday performance. Specific concerns regarding test performance include (1) the test environment is often not representative (i.e., artificial), (2) testing yields only samples of behavior that may fluctuate depending on context, and (3) clients may possess compensatory strategies that are not employable during the testing situation; therefore, obtained scores underestimate the test-taker's abilities.

Activities of daily living (ADLs) and the client's likelihood of returning to work are important considerations in disability determinations. Occupational status, however, is complex and often multidetermined, requiring that psychological test data be complemented with other sources of information in the evaluation process (e.g., observation, informant ratings, environmental assessments) (Chaytor and Schmitter-Edgecombe, 2003). Table 3-1 highlights major mental disorders, relevant types of psychological measures, and domains of functioning.

TABLE 3-1. Listings for Mental Disorders and Types of Psychological Tests.

Determination of disability is dependent on two key factors: the existence of a medically determinable impairment and associated limitations on functioning. As discussed in detail in Chapter 2, applications for disability follow a five-step sequential disability determination process. At Step 3 in the process, the applicant's reported impairments are evaluated to determine whether they meet or equal the medical criteria codified in SSA's Listing of Impairments. This includes specific symptoms, signs, and laboratory findings that substantiate the existence of an impairment (i.e., Paragraph A criteria) and evidence of associated functional limitations (i.e., Paragraph B criteria). If an applicant's impairments meet or equal the listing criteria, the claim is allowed. If not, residual functional capacity, including mental residual functional capacity, is assessed. This includes whether the applicant has the capacity for past work (Step 4) or any work in the national economy (Step 5).

SSA uses a standard assessment that examines functioning in four domains: understanding and memory, sustained concentration and persistence, social interaction, and adaptation. Psychological testing may play a key role in understanding a client's functioning in each of these areas. Box 3-1 describes ways in which these four areas of core mental residual functional capacity are assessed ecologically. Psychological assessments frequently address these areas in a more structured manner through interviews, standardized measures, checklists, observations, and other assessment procedures.

BOX 3-1. Descriptions of Tests by Four Areas of Core Mental Residual Functional Capacity. Remember locations and work-like procedures; understand and remember very short and simple instructions (more...)

This chapter has identified some of the basic foundations underlying the use of psychological tests, including basic psychometric principles and issues regarding test fairness. Applications of tests can inform disability determinations. The next two chapters build on this overview, examining the types of psychological tests that may be useful in this process, including a review of selected individual tests that have been developed for measuring validity of presentation. Chapter 4 focuses on non-cognitive, self-report measures and symptom validity tests. Chapter 5 then focuses on cognitive tests and associated performance validity tests. Strengths and limitations of various instruments are offered, in order to subsequently explore the relevance of different types of tests for different claims, per category of disorder, with a focus on establishing the validity of the client's claim.

REFERENCES

  • AACN (American Academy of Clinical Neuropsychology). AACN practice guidelines for neuropsychological assessment and consultation. The Clinical Neuropsychologist. 2007;21(2):209–231. [PubMed: 17455014]

  • AERA (American Educational Research Association), APA (American Psychological Association), and NCME (National Council on Measurement in Education). Standards for educational and psychological testing. Washington, DC: AERA; 2014.

  • Brandt J, van Gorp W. American Academy of Clinical Neuropsychology policy on the use of non-doctoral-level personnel in conducting clinical neuropsychological evaluations. The Clinical Neuropsychologist. 1999;13(4):385.

  • Chaytor N, Schmitter-Edgecombe M. The ecological validity of neuropsychological tests: A review of the literature on everyday cognitive skills. Neuropsychology Review. 2003;13(4):181–197. [PubMed: 15000225]

  • Cronbach LJ. Essentials of psychological testing. New York: Harper; 1949.

  • Cronbach LJ. Essentials of psychological testing. 2nd ed. Oxford, England: Harper; 1960.

  • De Ayala RJ. The theory and practice of item response theory. New York: Guilford Publications; 2009.

  • DeMars C. Item response theory. New York: Oxford University Press; 2010.

  • Furr RM, Bacharach VR. Psychometrics: An introduction. Thousand Oaks, CA: Sage Publications, Inc.; 2013.

  • Geisinger KF. Reliability. In: Geisinger KF, Bracken BA, Carlson JF, Hansen JC, Kuncel NR, Reise SP, Rodriguez MC, editors. APA handbook of testing and assessment in psychology. Vol. 1. Washington, DC: APA; 2013.

  • Groth-Marnat G. Handbook of psychological assessment. Hoboken, NJ: John Wiley & Sons; 2009.

  • Groth-Marnat G, Teal M. Block design as a measure of everyday spatial ability: A study of ecological validity. Perceptual and Motor Skills. 2000;90(2):522–526. [PubMed: 10833749]

  • Hambleton RK, Pitoniak MJ. Setting performance standards. Educational Measurement. 2006;4:433–470.

  • ITC (International Test Commission). ITC guidelines for translating and adapting tests. Geneva, Switzerland: ITC; 2005.

  • Lezak MD, Howieson DB, Bigler ED, Tranel D. Neuropsychological assessment. 5th ed. New York: Oxford University Press; 2012.

  • Sattler JM. Foundations of behavioral, social, and clinical assessment of children. 6th ed. La Mesa, CA: Jerome M. Sattler, Publisher, Inc.; 2014.

  • Sireci SG, Sukin T. Test validity. In: Geisinger KF, Bracken BA, Carlson JF, Hansen JC, Kuncel NR, Reise SP, Rodriguez MC, editors. APA handbook of testing and assessment in psychology. Vol. 1. Washington, DC: APA; 2013.

  • Suzuki LA, Naqvi S, Hill JS. Assessing intelligence in a cultural context. In: Leong FTL, Comas-Diaz L, Nagayama Hall GC, McLoyd VC, Trimble JE, editors. APA handbook of multicultural psychology. Vol. 1. Washington, DC: APA; 2014.

  • Trimble JE. Cultural measurement equivalence. In: Encyclopedia of cross-cultural school psychology. New York: Springer; 2010. pp. 316–318.

  • Turner SM, DeMers ST, Fox HR, Reed G. APA's guidelines for test user qualifications: An executive summary. American Psychologist. 2001;56(12):1099.

  • Weiner IB. The assessment process. In: Weiner IB, editor. Handbook of psychology. Hoboken, NJ: John Wiley & Sons; 2003.

1 This may be in comparison to a nationally representative norming sample, or with certain tests or measures, such as the MMPI, particular clinically diagnostic samples.

2 The brief overview presented here draws on the works of De Ayala (2009) and DeMars (2010), to which the reader is directed for additional information.


Source: https://www.ncbi.nlm.nih.gov/books/NBK305233/
