Abstract
Ensuring internal validity in quantitative research requires, among other conditions, reliable instrumentation. Unfortunately, however, second language (L2) researchers often fail to report and even more often fail to interpret reliability estimates beyond generic benchmarks for acceptability. As a means to guide interpretations of such estimates, this article meta-analyzes reliability coefficients (internal consistency, interrater, and intrarater) as reported in published L2 research. We recorded 2,244 reliability estimates in 537 individual articles along with study (e.g., sample size) and instrument features (e.g., item formats) proposed to influence reliability. We also coded for the indices employed (e.g., alpha, KR20). The coefficients were then aggregated (i.e., meta-analyzed). The three types of reliability varied, with internal consistency as the lowest: median = .82. Interrater and intrarater estimates were substantially higher (.92 and .95, respectively). Overall estimates were also found to vary according to study and instrument features such as proficiency (low = .79, intermediate = .84, advanced = .89) and target skill (e.g., writing = .88 vs. listening = .77). We use our results to inform and encourage interpretations of reliability estimates relative to the larger field as well as to the substantive and methodological features particular to individual studies and subdomains.
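As an illustration of the indices named in the abstract, the following is a minimal sketch of how Cronbach's alpha and KR-20 are conventionally computed, and of how a set of coefficients might be summarized by a median. It is not the authors' code: the function names, the toy response matrix, and the three coefficient values are hypothetical, and the abstract does not state whether the aggregation involved anything beyond medians.

```python
from statistics import median, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Variance of each item across respondents (population variance,
    # used consistently for items and totals).
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """KR-20 for dichotomous (0/1) items; item variance is p * (1 - p)."""
    n = len(scores)
    k = len(scores[0])
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    pq_sum = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - pq_sum / total_var)


# Hypothetical toy data: 4 respondents answering 3 right/wrong items.
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

alpha = cronbach_alpha(toy_scores)
kr = kr20(toy_scores)

# Hypothetical coefficients standing in for estimates coded from several studies.
illustrative_coefficients = [0.70, 0.82, 0.91]
summary = median(illustrative_coefficients)

print(f"alpha = {alpha:.2f}, KR-20 = {kr:.2f}, median = {summary:.2f}")
```

For dichotomous items the two indices coincide, which is why the sketch returns the same value from both functions on the toy data.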
Original language | English (US) |
---|---|
Pages (from-to) | 538-553 |
Number of pages | 16 |
Journal | Modern Language Journal |
Volume | 100 |
Issue number | 2 |
DOIs | 10.1111/modl.12335 |
State | Published - Jun 1 2016 |
Externally published | Yes |
Keywords
- instrumentation
- meta-analysis
- psychometrics
- reliability
- second language
- testing
ASJC Scopus subject areas
- Linguistics and Language
- Language and Linguistics
Cite this
A Meta-Analysis of Reliability Coefficients in Second Language Research. / Plonsky, Luke D; Derrick, Deirdre J.
In: Modern Language Journal, Vol. 100, No. 2, 01.06.2016, p. 538-553.
Research output: Contribution to journal › Article
TY - JOUR
T1 - A Meta-Analysis of Reliability Coefficients in Second Language Research
AU - Plonsky, Luke D
AU - Derrick, Deirdre J.
PY - 2016/6/1
Y1 - 2016/6/1
KW - instrumentation
KW - meta-analysis
KW - psychometrics
KW - reliability
KW - second language
KW - testing
UR - http://www.scopus.com/inward/record.url?scp=84977476165&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84977476165&partnerID=8YFLogxK
U2 - 10.1111/modl.12335
DO - 10.1111/modl.12335
M3 - Article
AN - SCOPUS:84977476165
VL - 100
SP - 538
EP - 553
JO - Modern Language Journal
JF - Modern Language Journal
SN - 0026-7902
IS - 2
ER -