Latest Paper Picks | Assessment Literacy, Automated Speaking Scoring, and Gaokao Paper Analysis



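This issue presents selected papers published from July to August 2020, drawn from journals including China Examinations, Educational Measurement and Evaluation, Language Assessment Quarterly, and Language Testing. The recommended papers cover validation of automated scoring for speaking tests, foreign language teachers' assessment literacy, score report design, analysis of the Gaokao English papers, applications of the Rasch model in assessment, and a review of at-home online testing.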

▶ 01 A Preliminary Study of the Validity of Automated Scoring in the CET Band 4 Spoken English Test


▶ 02 Revisiting Foreign Language Teachers' Language Assessment Literacy: Evidence from Interviews with Language Testing Experts


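▶ 03 Implementing the Evaluation System to Promote All-Round Development, Assessing Key Competencies to Highlight the Direction of Reform: A Commentary on the 2020 Gaokao English National Papers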


▶ 04 The Design of the TOEFL Junior Score Report and Its Implications

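Abstract: The TOEFL Junior test has been administered in China for nearly ten years and emerged in response to the trend of students going abroad to study at ever younger ages. Its score report provides not only a total score and an overall performance description, but also section scores and ability descriptors for Listening Comprehension, Language Form and Meaning, and Reading Comprehension. It also reports the test taker's level on the Common European Framework of Reference for Languages (CEFR), obtained by linking the scores to that framework, and a Lexile measure obtained by linking the test taker's Reading Comprehension performance to the Lexile reading framework. The TOEFL Junior score report reflects test takers' English learning fairly comprehensively and accurately; it is standardized, authoritative, linkable to external frameworks, and practical, and it gives schools, students, and parents an objective international benchmark for English assessment. Drawing on this design experience, score reports for China's large-scale examinations should go beyond reporting scores: they should describe proficiency levels in each subject domain in detail, provide reference levels for students' abilities, and offer learning suggestions that guide and support students' after-class learning. Their wording should also improve, for example by favoring encouraging, constructive, and positively framed descriptions.
Keywords: TOEFL Junior test; score report; international language assessment instrument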

▶ 05 Exploring the Construct Validity of the ECCE: Latent Structure of a CEFR-Based High-Intermediate Level English Language Proficiency Test

Authors: Minkyung Kim; Scott A. Crossley
Journal: Language Assessment Quarterly
Published online: July 4, 2020
Abstract: This paper focuses on the Examination for the Certificate of Competency in English (ECCE), which is based on the CEFR and assesses high intermediate level English proficiency. Specifically, the study explores the latent structure of the ECCE and its generalizability across groups (i.e., gender, age, and first language [L1]) to examine its construct validity, dimensionality of language proficiency, and commensurability with the CEFR macro-functions. The results indicated that test-takers’ performance on the ECCE could be best represented by a correlated three-factor model (i.e., reading/listening/lexico-grammar, writing, and speaking abilities). The correlated three-factor model also held irrespective of gender, age, and L1 (with the exception of vocabulary scores). Overall, the findings indicate that the correlated three-factor model is consistent with the constructs that the ECCE proposes to measure, is in line with the current multi-componential view of language proficiency, and is partly commensurate with the CEFR macro-functions.

▶ 06 A comprehensive review of Rasch measurement in language assessment: Recommendations and guidelines for research

Authors: Vahid Aryadoust; Li Ying Ng; Hiroki Sayama
Journal: Language Testing
First published: July 8, 2020
Abstract: In the present study, we reviewed and coded 215 papers using Rasch measurement published in 21 applied linguistics journals for multiple features. We found that seven Rasch models and 23 software packages were adopted in these papers, with many-facet Rasch measurement (n=100) and Facets (n=113) being the most frequently used Rasch model and software, respectively. Significant differences were detected between the number of papers that applied Rasch measurement to different language skills and components, with writing (n=63) and grammar (n=12) being the most and least frequently investigated, respectively. In addition, significant differences were found between the number of papers reporting person separation (n=73, not reported: n=142) and item separation (n=59, not reported: n=156) and those that did not. An alarming finding was how few papers reported unidimensionality check (n=57 vs 158) and local independence (n=19 vs 196). Finally, a multilayer network analysis revealed that research involving Rasch measurement has created two major discrete communities of practice (clusters), which can be characterized by features such as language skills, the Rasch models used, and the reporting of item reliability/separation vs person reliability/separation. Guidelines and recommendations for analyzing unidimensionality, local independence, data-to-model fit, and reliability in Rasch model analysis are proposed.
Keywords: Fit; language assessment; local independence; network analysis; modularity maximization method; Rasch measurement; reliability and separation; unidimensionality
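The review's points about person/item separation and reliability may be easier to follow with a small worked sketch. The Python below is a minimal illustration, not the authors' procedure or the computation performed by Facets: it gives the dichotomous Rasch probability of success, P = exp(θ − b) / (1 + exp(θ − b)), and computes the separation index G and reliability R from estimated measures and their standard errors, using the common definitions G = true SD / RMSE and R = G² / (1 + G²). The function names and toy measures are hypothetical, and using divide-by-n variances is an assumption.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model: probability that a person with ability `theta`
    answers an item of difficulty `b` correctly (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def separation_and_reliability(measures, standard_errors):
    """Return (separation G, reliability R) for person or item measures.

    observed variance: variance of the estimated measures (divide by n; assumption)
    error variance:    mean squared standard error (RMSE squared); SEs must be > 0
    true variance:     observed minus error, floored at zero
    G = true SD / RMSE, and R = true / observed = G**2 / (1 + G**2)
    """
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((m - mean) ** 2 for m in measures) / n
    error_var = sum(se ** 2 for se in standard_errors) / n
    true_var = max(observed_var - error_var, 0.0)
    separation = math.sqrt(true_var / error_var)
    reliability = true_var / observed_var if observed_var > 0 else 0.0
    return separation, reliability


# Hypothetical person measures (logits), each with a standard error of 0.5.
measures = [-1.0, 0.0, 1.0, 2.0]
ses = [0.5, 0.5, 0.5, 0.5]
G, R = separation_and_reliability(measures, ses)
print(f"P(theta=1.0, b=0.0) = {rasch_probability(1.0, 0.0):.3f}")
print(f"separation G = {G:.2f}, reliability R = {R:.2f}")
```

With these illustrative measures the observed variance is 1.25 and the error variance 0.25, so the true variance is 1.0, giving G = 2.0 and R = 0.8. Reporting both figures, as the review recommends, shows how far the spread of measures exceeds measurement error.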

▶ 07 Test review: Current options in at-home language proficiency tests for making high-stakes decisions

Authors: Daniel R. Isbell; Benjamin Kremmel
Journal: Language Testing
First published: July 16, 2020
Abstract: The switch to accepting at-home proficiency tests for high-stakes decisions raises many concerns for stakeholders, such as technological demands, exam security, and validity of score use. Along these lines, this thematic review addresses such concerns and features brief reviews of seven options in at-home proficiency testing: ACTFL Assessments, Duolingo English Test, IELTS Indicator, LanguageCert, TEF Express, TOEFL iBT Special Home Edition, and Versant. Considering at-home testing more broadly, we discuss key considerations for selecting an at-home test. We close with speculation on how at-home tests may shape language testing going forward: Beyond adapting to the current pandemic, at-home testing might address longstanding issues in access to language testing services and the representation of real-world communication practices in language tests.
Keywords: ACTFL Assessments; at-home test; Duolingo English Test; IELTS Indicator; LanguageCert; security; TEF Express; TOEFL iBT Special Home Edition; Versant

▶ 08 Predicting communicative effectiveness in the international workplace: Support for TOEIC Speaking test scores from linguistic laypersons

Authors: Jonathan Schmidgall; Donald E. Powers
Journal: Language Testing
First published: July 21, 2020
Abstract: In this study we examined the extent to which TOEIC® Speaking test scores relate to evaluations by professionals in the international workplace, the target language use domain of TOEIC tests. Linguistic laypersons in 10 countries were invited to participate in an online research survey. The survey incorporated a stratified sample of test-taker (N = 99) responses to three representative tasks from the TOEIC Speaking test (reading a text aloud, responding to questions, expressing an opinion) that were cast as workplace role-play tasks. After completing each role-play task, participants used brief, descriptive six-point rating scales to rate the communicative effectiveness (comprehensibility, task fulfillment, elaboration, and coherence) of each of several speakers. Communicative effectiveness ratings from linguistic laypersons were strongly correlated with TOEIC Speaking test scaled scores (r = 0.84). In addition, regression analysis was used to plot the relationship between layperson and test-based evaluations of speaking proficiency. Results suggested that test takers’ performances can be expected to be perceived as effective at score ranges typically associated with important decisions. The results are discussed in terms of their implications for claims about the generalizability of TOEIC Speaking test score interpretations in relation to the evaluations of linguistic laypersons in the international workplace.
Keywords: Communicative effectiveness; comprehensibility; English as a lingua franca; speaking; validation; workplace
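The abstract reports a correlation (r = 0.84) and a regression linking layperson ratings to TOEIC Speaking scaled scores. The Python sketch below roughly illustrates how such a fit can be read as "the score at which a response is expected to be rated effective". It computes Pearson's r and an ordinary least-squares line by hand, then inverts the line to find the scaled score whose predicted rating reaches a chosen threshold. The data, the threshold, the assumed 1-6 rating range, and the function names are all hypothetical. Regressing ratings on scores (rather than the reverse) is also an assumption, so the sketch does not reproduce the authors' analysis.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(x, y):
    """Least-squares slope and intercept for predicting y (ratings) from x (scores)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def score_for_rating(target_rating, slope, intercept):
    """Invert the fitted line: the scaled score whose predicted rating equals the target."""
    return (target_rating - intercept) / slope


# Hypothetical data: scaled scores and mean layperson ratings (assumed 1-6 scale).
scores = [100, 120, 140, 160, 180]
ratings = [2.1, 3.0, 3.6, 4.4, 5.0]

r = pearson_r(scores, ratings)
slope, intercept = ols_fit(scores, ratings)
print(f"r = {r:.3f}; rating = {slope:.4f} * score + {intercept:.3f}")
print(f"score predicted to reach a rating of 4: {score_for_rating(4, slope, intercept):.1f}")
```

Inverting the fitted line in this way is one simple means of relating a rating threshold to a score range. A real validation study would also account for rater variability and prediction uncertainty, which this sketch ignores.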