This is a user sandbox of Dwagg96. You can use it for testing or practicing edits. This is not the sandbox where you should draft your assigned article for a dashboard.wikiedu.org course. To find the right sandbox for your assignment, visit your Dashboard course page and follow the Sandbox Draft link for your assigned article in the My Articles section.
This is an assessment template that can be used to create Wikipedia articles on noted psychological assessments.
In general, according to WP:MEDMOS, medical articles should be written in the following format:
Lead section
This will be the lead section. This section should give a quick summary of what the assessment is. Here are some pointers (please do not use bullet points when writing the article):
- What are its acronyms?
- What is its purpose?
- What population is it intended for? What do the items measure?
- How long does it take to administer?
- Who (individual or groups) was it created by?
- How many questions are inside? Is it multiple choice?
- What has been its impact on the clinical world in general?
- Who uses it? Clinicians? Researchers? What settings?
Contents
Template for writing medical-test articles
This section is NOT included in the actual page. It is an overview of what is generally included in a page.
- Versions, if more than one kind or variant of the test or procedure exists
- Psychometrics, including validity and reliability of test results
- History of the test
- Use in other populations, such as other cultures and countries
- Research
- Limitations
Versions
- What versions of this test exist, if any? For each version, there should be a description of the test.
- What is its intended population, number of questions and acronyms?
Reliability
The rubrics for evaluating reliability and validity are here. You will evaluate the instrument based on these rubrics. Then, you will delete the code for the rubric and complete the table (located after the rubrics). Don't forget to adjust the headings once you copy/paste the table in!
An example using the table from the General Behavior Inventory is attached below.
Example tables
Evaluating norms and reliability
Criterion | Adequate | Good | Excellent | Too Good |
---|---|---|---|---|
Norms | Mean and standard deviation for total score (and subscores if relevant) from a large, relevant clinical sample | Mean and standard deviation for total score (and subscores if relevant) from multiple large, relevant samples, at least one clinical and one nonclinical | Same as “good,” but must be from representative sample (i.e., random sampling, or matching to census data) | Not a concern |
Internal consistency (Cronbach's alpha, split half, etc.) | Most evidence shows Cronbach's alpha values of .70 to .79 | Most reported alphas .80 to .89 | Most reported alphas ≥ .90 | Alpha is also tied to scale length and content coverage; very high alphas may indicate that the scale is longer than needed, or that it has a very narrow scope |
Inter-rater reliability | Most evidence shows kappas of .60-.74, or intraclass correlations of .70-.79 | Most reported kappas of .75-.84, ICCs of .80-.89 | Most kappas ≥ .85, or ICCs ≥ .90 | Very high levels of agreement often achieved by re-rating from audio or transcript |
Test-retest reliability (stability) | Most evidence shows test-retest correlations ≥ .70 over period of several days or weeks | Most evidence shows test-retest correlations ≥ .70 over period of several months | Most evidence shows test-retest correlations ≥ .70 over a year or longer | Key consideration is appropriate time interval; many constructs would not be stable for years at a time |
*Repeatability | Bland-Altman plots (Bland & Altman, 1986) show small bias, and/or weak trends; coefficient of repeatability is tolerable compared to clinical benchmarks (Vaz, Falkmer, Passmore, Parsons, & Andreou, 2013) | Bland-Altman plots and corresponding regressions show no significant bias, and no significant trends; coefficient of repeatability is tolerable | Bland-Altman plots and corresponding regressions show no significant bias, and no significant trends across multiple studies; coefficient of repeatability is small enough that it is not clinically concerning | Not a concern |
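
As an illustration of the internal consistency row above, the following is a minimal sketch of how Cronbach's alpha could be computed from raw item responses and compared with the rubric's cut-offs. The function names `cronbach_alpha` and `rate_alpha` are hypothetical, and mapping a single alpha to a band is a simplification: the rubric rates the bulk of the published evidence, and the "too good" judgment still requires looking at scale length and content coverage.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses the standard formula: alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)).
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def rate_alpha(alpha):
    """Map one alpha value to the rubric band (illustrative simplification)."""
    if alpha >= 0.90:
        return "excellent"
    if alpha >= 0.80:
        return "good"
    if alpha >= 0.70:
        return "adequate"
    return "below adequate"


# Made-up responses from four people on a three-item scale.
example = [[1, 2, 2], [2, 2, 3], [3, 4, 4], [4, 4, 5]]
print(rate_alpha(cronbach_alpha(example)))
```

In practice the rating entered in the table should summarize the alphas reported across studies, with references, rather than a value computed from one data set.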
Validity
Criterion | Adequate | Good | Excellent | *Too Excellent |
---|---|---|---|---|
Content validity | Test developers clearly defined domain and ensured representation of entire set of facets | As adequate, plus all elements (items, instructions) evaluated by judges (experts or pilot participants) | As good, plus multiple groups of judges and quantitative ratings | Not a problem; can point out that many measures do not cover all of the DSM criteria now |
Construct validity (e.g., predictive, concurrent, convergent, and discriminant validity) | Some independently replicated evidence of construct validity | Bulk of independently replicated evidence shows multiple aspects of construct validity | As good, plus evidence of incremental validity with respect to other clinical data | Not a problem |
*Discriminative validity | Statistically significant discrimination in multiple samples; Areas Under the Curve (AUCs) < .6 under clinically realistic conditions (i.e., not comparing treatment seeking and healthy youth) | AUCs of .60 to <.75 under clinically realistic conditions | AUCs of .75 to .90 under clinically realistic conditions | AUCs >.90 should trigger careful evaluation of research design and comparison group. More likely to be biased than accurate estimate of clinical performance. |
*Prescriptive validity | Statistically significant accuracy at identifying a diagnosis with a well-specified matching intervention, or statistically significant moderator of treatment | As “adequate,” with good kappa for diagnosis, or significant treatment moderation in more than one sample | As “good,” with good kappa for diagnosis in more than one sample, or moderate effect size for treatment moderation | Not a problem with the measure or finding, per se; but high predictive validity may obviate need for other assessment components. Compare on utility. |
Validity generalization | Some evidence supports use with either more than one specific demographic group or in more than one setting | Bulk of evidence supports use with either more than one specific demographic group or in multiple settings | Bulk of evidence supports use with both more than one specific demographic group and in multiple settings | Not a problem |
Treatment sensitivity | Some evidence of sensitivity to change over course of treatment | Independent replications show evidence of sensitivity to change over course of treatment | As good, plus sensitive to change across different types of treatments | Not a problem |
Clinical utility | After practical considerations (e.g., costs, ease of administration and scoring, duration, availability of relevant benchmark scores, patient acceptability), assessment data are likely to be clinically useful | As adequate, plus published evidence that using the assessment data confers clinical benefit (e.g., better outcome, lower attrition, greater satisfaction) | As good, plus independent replication | Not a problem |
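
To make the discriminative validity row above more concrete, here is a minimal sketch of how an Area Under the Curve (AUC) could be computed from two groups of scores and compared with the rubric's bands. The function names `auc` and `rate_auc` and the sample scores are hypothetical. The sketch uses the pairwise definition of the AUC (the probability that a randomly chosen case scores higher than a randomly chosen comparison participant, with ties counted as half), and it does not test statistical significance or check that the comparison group is clinically realistic, both of which the rubric also requires.

```python
def auc(case_scores, comparison_scores):
    """AUC as the share of case/comparison pairs in which the case scores higher.

    Ties count as half a win, which matches the Mann-Whitney formulation.
    """
    if not case_scores or not comparison_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for other in comparison_scores:
            if case > other:
                wins += 1.0
            elif case == other:
                wins += 0.5
    return wins / (len(case_scores) * len(comparison_scores))


def rate_auc(value):
    """Map one AUC to the rubric band (illustrative simplification)."""
    if value > 0.90:
        return "too good"  # rubric: re-check research design and comparison group
    if value >= 0.75:
        return "excellent"
    if value >= 0.60:
        return "good"
    return "adequate or below"  # adequate also needs significant discrimination


# Made-up scores: cases with the target diagnosis vs. other clinical referrals.
cases = [14, 18, 21, 25]
comparisons = [9, 12, 18, 20]
print(auc(cases, comparisons), rate_auc(auc(cases, comparisons)))
```

As with the reliability rubric, the rating in the table should reflect AUCs reported across studies under clinically realistic conditions, not a single computed value.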
Actual tables to fill in
Reliability
Reliability refers to whether the scores are reproducible.
Criterion | Rating (adequate, good, excellent, too good*) | Explanation with references |
---|---|---|
Norms | Adequate | Multiple convenience samples and research studies, including both clinical and nonclinical samples[citation needed] |
Internal consistency (Cronbach’s alpha, split half, etc.) | Excellent; too good for some contexts | Alphas routinely over .94 for both scales, suggesting that scales could be shortened for many uses[citation needed] |
Inter-rater reliability | Not applicable | Designed originally as a self-report scale; parent and youth report correlate about the same as cross-informant scores correlate in general[1] |
Test-retest reliability (stability) | Good | r = .73 over 15 weeks. Evaluated in initial studies,[2] with data also showing high stability in clinical trials[citation needed] |
Repeatability | Not published | No published studies formally checking repeatability |
Validity
Validity describes the evidence that an assessment tool measures what it was supposed to measure. There are many different ways of checking validity. For screening measures, diagnostic accuracy and discriminative validity are probably the most useful ways of looking at validity.
Criterion | Rating (adequate, good, excellent, too good*) | Explanation with references |
---|---|---|
Content validity | Excellent | Covers both DSM diagnostic symptoms and a range of associated features[2] |
Construct validity (e.g., predictive, concurrent, convergent, and discriminant validity) | Excellent | Shows convergent validity with other symptom scales, longitudinal prediction of development of mood disorders,[3][4][5] criterion validity via metabolic markers[2][6] and associations with family history of mood disorder.[7] Factor structure complicated;[2][8] the inclusion of “biphasic” or “mixed” mood items creates a lot of cross-loading |
Discriminative validity | Excellent | Multiple studies show that GBI scores discriminate cases with unipolar and bipolar mood disorders from other clinical disorders;[2][9][10] effect sizes are among the largest of existing scales[11] |
Validity generalization | Good | Used both as self-report and caregiver report; used in college student[8][12] as well as outpatient[9][13][14] and inpatient clinical samples; translated into multiple languages with good reliability |
Treatment sensitivity | Good | Multiple studies show sensitivity to treatment effects comparable to using interviews by trained raters, including placebo-controlled, masked assignment trials.[15][16] Short forms appear to retain sensitivity to treatment effects while substantially reducing burden[16][17] |
Clinical utility | Good | Free (public domain), strong psychometrics, extensive research base. Biggest concerns are length and reading level. Short forms have less research, but are appealing based on reduced burden and promising data |
Development and history
- Why was this instrument developed? Why was there a need to do so? What need did it meet?
- What was the theoretical background behind this assessment? (e.g. addresses importance of 'negative cognitions', such as intrusions, inaccurate, sustained thoughts)
- How was the scale developed? What was the theoretical background behind it?
- How are these questions reflected in applications to theories, such as cognitive behavioral therapy (CBT)?
- If there were previous versions, when were they published?
- Discuss the theoretical ideas behind the changes
Impact
- What was the impact of this assessment? How did it affect assessment in psychiatry, psychology and other health care professions?
- What can the assessment be used for in clinical settings? Can it be used to measure symptoms longitudinally? Developmentally?
Use in other populations
- How widely has it been used? Has it been translated into different languages? Which languages?
Research
- Is there any recent research that is pertinent?
Limitations
- If self-report, what are the usual limitations of self-report?
- State the status of this assessment (is it copyrighted? If free, link to it).
See also
Here, it would be good to link to any related articles on Wikipedia. As we create more assessment pages, this should grow.
For instance: