
ID: http://hdl.handle.net/2142/98159

Using response times in CAT

Abstract

Many areas of psychology and education place a high premium on measurement, using psychometric theory to measure constructs such as cognitive ability, personality, and attitudes. Some of the better-known measurement theories are classical test theory (CTT), structural equation modeling (SEM), and item response theory (IRT). For the practical test construction needs of psychology and education, IRT is the most heavily used, and has been ever since Lord and Novick (1968) published their book, Statistical Theories of Mental Test Scores. One of the biggest advances in IRT has been the advent of computerized adaptive testing (CAT). First introduced as tailored tests by Lord (1980), CATs have steadily gained in popularity as the cost of computation has gone down. As the term "tailored tests" suggests, every person who takes an adaptive test receives a test form unique to that person. The test is constructed item by item by matching items' difficulty levels to the ability level of that particular person. The promise of CAT is that, by constructing a test in this way, items that do not contribute much to the overall effectiveness of the measurement are left out, which can shorten the test substantially while still maintaining a high level of measurement accuracy.

The efficiency of CAT has not gone unnoticed. The Armed Services Vocational Aptitude Battery (ASVAB), which measures vocationally relevant abilities, was originally introduced as a paper-and-pencil test in 1968 and became operational as a CAT in 1996; the ASVAB was the first large-scale, high-stakes operational CAT. Numerous adaptive tests have gone into operational use for selection, including the Graduate Management Admission Test (GMAT), the Adjustable Competence Evaluation (ACE), the Business Language Testing Service (BULATS) Computer Test, and the IBM Selection Tests, among others. Additionally, many licensure exams currently in use, including the Uniform CPA Examination (for certified public accountants) and the National Council Licensure Examinations (for nurses), are adaptive. Furthermore, the recently signed Every Student Succeeds Act has recommended greater use of adaptive testing in the American educational system, allowing states to develop and administer CATs.

Computer-based tests, such as CATs, allow for easy collection of response times. With the abundance of these essentially free data, methods and applications for using response times have come into vogue, though they are still in their infancy. As such, no large-scale assessments currently use response times as an active part of the test. Because the data are essentially free, it is reasonable to believe that their use is simply the next step in the evolution of computer-based tests. Indeed, it seems natural for CATs to be modified to take advantage of response time information, especially since it is well known that response accuracy and response time are related (Sternberg, 1999). Some applications include cheating detection (van der Linden, 2009a), shortening the time needed to take a test (Choe & Kern, 2014; Fan, Wang, Chang, & Douglas, 2012), and item selection (van der Linden, 2008). The goal of this dissertation is to introduce CAT and some of the current issues surrounding its use, to introduce response times in measurement, and to present several new methods for using response times in adaptive testing.
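To make the item-by-item tailoring concrete, the following is a minimal sketch, not drawn from the dissertation, of one common CAT item selection step: under an assumed two-parameter logistic (2PL) model, the next item is the unadministered item with the greatest Fisher information at the examinee's current ability estimate. All item parameters and the pool itself are hypothetical.

import numpy as np

def p_correct(theta, a, b):
    # Two-parameter logistic (2PL) probability of a correct response
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    # Fisher information of a 2PL item at ability theta
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, administered):
    # Maximum-information rule: among items not yet administered,
    # pick the item that is most informative at the current ability estimate.
    info = item_information(theta_hat, a, b)
    info[list(administered)] = -np.inf
    return int(np.argmax(info))

# Hypothetical five-item pool: discriminations a and difficulties b
a = np.array([1.2, 0.8, 1.5, 1.0, 2.0])
b = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
print(select_next_item(theta_hat=0.3, a=a, b=b, administered={0}))

In an operational CAT this selection step alternates with re-estimation of ability after each response, which is what allows the test to be shortened without sacrificing accuracy.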
In Chapter 1, I give a quick review of IRT, including its historical roots, its assumptions, and some examples of commonly used IRT models. This is followed by a brief overview of the basic components of CAT, including item selection, ability (or trait) estimation, and item constraints. I then discuss response times in measurement and their current role in CAT. In Chapter 2, I describe an already-completed study on jointly estimating person ability and speededness. In Chapter 3, I investigate the efficacy of using the MAP estimator developed in Chapter 2 when selecting items with a generalized time-weighted maximum information criterion (GMICT). In Chapter 4, I introduce a new item selection technique, based on the ideas of Bayesian item selection, that incorporates the response time model directly; a modified version of this criterion using the ideas from the GMICT is also investigated. In Chapter 5, I introduce a time-weighted Kullback-Leibler information technique and investigate its effectiveness. Finally, in Chapter 6, I conclude with some remarks about how these techniques fit into the current literature on response times, scoring, and adaptive testing.
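As a rough illustration of the general idea behind time-weighted selection, the sketch below divides each candidate item's Fisher information by its expected response time under a lognormal response time model, in the spirit of the information-per-time-unit idea in Fan, Wang, Chang, and Douglas (2012). It is not the GMICT, the Bayesian criterion, or the Kullback-Leibler criterion developed in Chapters 3 through 5, and all parameter values are hypothetical.

import numpy as np

def item_information(theta, a, b):
    # Fisher information of a 2PL item at ability theta
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def expected_time(tau, beta, sigma):
    # Under a lognormal response time model, log T_j ~ Normal(beta_j - tau, sigma_j^2),
    # so the expected response time is exp(beta_j - tau + sigma_j^2 / 2).
    return np.exp(beta - tau + sigma ** 2 / 2.0)

def select_time_weighted(theta_hat, tau_hat, a, b, beta, sigma, administered):
    # Information-per-expected-second rule: weight each candidate item's
    # Fisher information by the reciprocal of its expected response time.
    crit = item_information(theta_hat, a, b) / expected_time(tau_hat, beta, sigma)
    crit[list(administered)] = -np.inf
    return int(np.argmax(crit))

# Hypothetical four-item pool: IRT parameters (a, b) and response time parameters (beta, sigma)
a = np.array([1.0, 1.5, 0.9, 2.0])
b = np.array([-0.5, 0.0, 0.8, 1.2])
beta = np.array([3.8, 4.2, 3.5, 4.6])
sigma = np.array([0.4, 0.5, 0.3, 0.6])
print(select_time_weighted(theta_hat=0.2, tau_hat=0.1, a=a, b=b,
                           beta=beta, sigma=sigma, administered={1}))

In a real adaptive test, the provisional estimates theta_hat (ability) and tau_hat (speed) would be updated after each response before the next item is chosen.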
