Equating in small-scale language testing programs

Geoffrey T. LaFlair, Daniel Isbell, L. D. Nicolas May, Maria Nelly Gutierrez Arvizu, Joan M. Jamieson

Research output: Contribution to journal › Article

Abstract

Language programs need multiple test forms for secure administrations and effective placement decisions, but can they have confidence that scores on alternate test forms have the same meaning? In large-scale testing programs, various equating methods are available to ensure the comparability of forms, and the choice among them is informed by estimates of quality, namely random error, systematic error, and total error: the preferred method is the one that introduces the least error. This study compared seven equating methods (mean, linear Levine, linear Tucker, chained equipercentile, circle-arc, nominal weights mean, and synthetic) to no equating. A non-equivalent groups anchor test (NEAT) design was used to compare two listening and reading test forms based on small samples (one with 173 test takers, the other with 88) at a university’s English for Academic Purposes (EAP) program. The equating methods were evaluated based on the amount of error they introduced and their practical effects on placement decisions. Two types of error (systematic and total) could not be reliably computed owing to the lack of an adequate criterion; consequently, only random error was compared. Among the seven methods, the circle-arc method introduced the least random error as estimated by the standard error of equating (SEE). Classification decisions made using the seven methods differed from those made with no equating; every method indicated that fewer students were ready for university placement. Although no equating method could be identified as best overall, circle-arc equating reduced the random error in scores, had reportedly low bias in other studies, accounted for form and person differences, and was relatively easy to compute. It was therefore chosen as the method to pilot in an operational setting.
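To make the study's preferred method concrete, here is a minimal sketch of simplified circle-arc equating (after Livingston & Kim, 2009) together with a bootstrapped standard error of equating. It is illustrative only: the data are synthetic, the 0-40 score scale and all names in the code are assumptions, and for simplicity the arc's midpoint comes from plain mean equating of two independent groups rather than the anchor-adjusted means an operational NEAT design would use.

import numpy as np

rng = np.random.default_rng(42)

# Synthetic raw scores on two alternate forms; sample sizes echo the study's
# groups (173 and 88 test takers). Real data would come from the program.
x_scores = rng.binomial(40, 0.60, size=173)  # new form X
y_scores = rng.binomial(40, 0.55, size=88)   # reference form Y
LOW, HIGH = 0, 40                            # possible score range on both forms

def circle_arc(x, xs, ys, low=LOW, high=HIGH):
    """Equate score(s) x on form X to the form Y scale with a circle arc."""
    x = np.asarray(x, dtype=float)
    # End points: the function is forced through (low, low) and (high, high);
    # the middle point sends mean(X) to mean(Y), as in mean equating.
    x1, x3 = float(low), float(high)
    x2, y2 = xs.mean(), ys.mean()
    line = x.copy()            # line through the end points (identity here)
    y2s = y2 - x2              # midpoint's vertical distance from that line
    if np.isclose(y2s, 0.0):
        return line            # the arc degenerates to the line itself
    # Center (xc, yc) and radius r of the circle through the three points
    # (x1, 0), (x2, y2s), (x3, 0) in the de-trended coordinates.
    xc = (x1 + x3) / 2.0
    yc = (x2**2 - x1**2 - 2.0 * xc * (x2 - x1) + y2s**2) / (2.0 * y2s)
    r = np.hypot(x1 - xc, yc)
    # Equated score = linear component + height of the arc above/below it.
    return line + yc + np.sign(y2s) * np.sqrt(r**2 - (x - xc) ** 2)

def bootstrap_see(points, n_boot=1000):
    """Random equating error at each score point: SD over bootstrap resamples."""
    eq = np.empty((n_boot, points.size))
    for b in range(n_boot):
        xb = rng.choice(x_scores, size=x_scores.size, replace=True)
        yb = rng.choice(y_scores, size=y_scores.size, replace=True)
        eq[b] = circle_arc(points, xb, yb)
    return eq.std(axis=0, ddof=1)

points = np.arange(LOW, HIGH + 1)
print(np.round(circle_arc(points, x_scores, y_scores), 2))  # equated scores
print(np.round(bootstrap_see(points), 3))                   # SEE per score point

In practice, the midpoint means would first be adjusted toward a synthetic population using the anchor-test scores (as the Tucker and chained methods do in their own ways), and smaller SEE values would be read as less random error, which is how the methods were compared in the study.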

Original language: English (US)
Pages (from-to): 127-144
Number of pages: 18
Journal: Language Testing
ISSN: 0265-5322
Volume: 34
Issue number: 1
DOIs: https://doi.org/10.1177/0265532215620825
State: Published - Jan 1 2017

Keywords

  • English for academic purposes
  • equating
  • listening
  • placement
  • reading
  • sample size

ASJC Scopus subject areas

  • Language and Linguistics
  • Social Sciences (miscellaneous)
  • Linguistics and Language

Cite this

LaFlair, G. T., Isbell, D., May, L. D. N., Gutierrez Arvizu, M. N., & Jamieson, J. M. (2017). Equating in small-scale language testing programs. Language Testing, 34(1), 127-144. https://doi.org/10.1177/0265532215620825
