Scientific Development Research (科学发展研究)

Scientific Development Research. 2025; 5(7). DOI: 10.12208/j.sdr.20250263

Comparison of Scoring Reliability and Feedback Content between iWrite and DeepSeek

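Author: Hu Jieyu (胡婕妤) *

School of Foreign Languages, Wuhan Textile University, Wuhan, Hubei

*Corresponding author: Hu Jieyu, School of Foreign Languages, Wuhan Textile University, Wuhan, Hubei.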

Cite this article: Hu Jieyu. Comparison of scoring reliability and feedback content between iWrite and DeepSeek [J]. Scientific Development Research, 2025, 5(7): 25-30.
Published: 2025/11/20 10:40:41

Abstract

As large language models (LLMs) continue to improve, teachers are increasingly using them to score students' essays and provide feedback. How their scoring reliability and feedback quality compare with those of a dedicated automated essay scoring (AES) system such as iWrite remains unclear, as does whether they can serve as trustworthy scoring and feedback tools. To address this question, this study compares the scoring reliability and feedback content of iWrite and DeepSeek on a sample of 46 IELTS essays written by sophomore art majors from two classes in a Chinese-foreign cooperative program at a university in China. The study aims to offer teachers and researchers some guidance in choosing automated scoring and feedback tools.

Keywords: iWrite; DeepSeek; English writing; scoring reliability; feedback
