This study examined the robustness and efficiency of four large language models (LLMs), GPT-4, GPT-3.5, iFLYTEK, and Baidu Cloud, in assessing writing accuracy in Chinese. Writing samples were collected from students in an online high school Chinese language learning program in the US. The official APIs of the LLMs were used to conduct analyses at both the T-unit and sentence levels, and performance metrics were employed to evaluate each model. The LLM results were compared with human ratings, and content analysis was conducted to categorize error types and highlight discrepancies between human and LLM ratings. Additionally, the efficiency of each model was evaluated. The results indicate that the GPT models and iFLYTEK achieved similar accuracy scores, with GPT-4 excelling in precision. These findings provide insights into the potential of LLMs to support the assessment of writing accuracy for language learners.
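To make the evaluation pipeline concrete, the sketch below shows how a sentence-level accuracy judgment could be collected through the OpenAI Python client and scored against human ratings. This is a minimal illustration, not the study's procedure: the abstract does not give the actual prompt, rating rubric, or data format, so the prompt wording, the binary 1/0 accuracy coding, the `judge_sentence` helper, and the sample sentences are all assumptions.

```python
# Minimal sketch (assumptions): the prompt wording, binary 1/0 coding,
# and toy sentences below are illustrative, not the study's actual setup.
from openai import OpenAI
from sklearn.metrics import accuracy_score, precision_score

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge_sentence(sentence: str, model: str = "gpt-4") -> int:
    """Ask an LLM whether a Chinese sentence is grammatically accurate;
    return 1 for accurate, 0 for inaccurate (assumed binary coding)."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": ("You are a Chinese writing rater. Reply with 1 if "
                         "the sentence is grammatically accurate, else 0.")},
            {"role": "user", "content": sentence},
        ],
    )
    return 1 if resp.choices[0].message.content.strip().startswith("1") else 0

# Compare LLM judgments against human ratings (both coded 1 = accurate).
sentences = ["我昨天去了图书馆。", "他喜欢吃苹果很多。"]  # toy examples
human = [1, 0]
llm = [judge_sentence(s) for s in sentences]
print("accuracy: ", accuracy_score(human, llm))
print("precision:", precision_score(human, llm, zero_division=0))
```

A T-unit-level analysis would follow the same pattern, differing only in how the writing samples are segmented before being passed to the model.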
With the development of web-based science inquiry learning, behavioral engagement in such learning contexts has received increasing attention. Aligned with three specific science inquiry stages (comparative experiment design, implementation with computer simulation, and reflection on results), the current study derived a series of features from log data to conceptualize students' behavioral engagement. The features fell into three categories: general engagement features, including time, gaming the system, submission frequency, and revisiting behavior; learning-content-related features, including context consistency, comparative experimental design, and experiment design consistency; and instruction-related features, consisting of revision behavior and revision improvement. A total of 220 sixth graders from four classes in China participated in the study. Correlation and regression analyses were used to examine the relationship between the engagement features and learning performance...
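The sketch below illustrates the general shape of such a feature-based analysis: aggregating per-student engagement features from raw log rows, then correlating and regressing them against performance. It is a hedged illustration only; the abstract does not define the feature formulas, so the column names, aggregation rules, and toy data are assumptions rather than the study's actual operationalizations.

```python
# Minimal sketch (assumptions): the abstract does not define how each
# engagement feature is computed from the logs, so the column names,
# aggregation rules, and toy data below are illustrative only.
import pandas as pd
import statsmodels.api as sm

# Toy click-stream: one row per logged student action (fabricated data).
logs = pd.DataFrame({
    "student":       [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5],
    "seconds":       [30, 45, 20, 60, 15, 25, 40, 50, 10, 35, 55, 20],
    "is_submission": [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0],
    "is_revisit":    [0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
})

# Aggregate general engagement features per student (assumed definitions:
# total time on task, submission frequency, revisiting behavior).
features = logs.groupby("student").agg(
    total_time=("seconds", "sum"),
    submissions=("is_submission", "sum"),
    revisits=("is_revisit", "sum"),
)

# Toy learning-performance scores, one per student (not study data).
performance = pd.Series([0.80, 0.55, 0.75, 0.60, 0.85],
                        index=features.index, name="score")

# Pearson correlation of each engagement feature with performance.
print(features.corrwith(performance))

# OLS regression of performance on the engagement features.
fit = sm.OLS(performance, sm.add_constant(features)).fit()
print(fit.params)
```

Content-related and instruction-related features (e.g., experiment design consistency or revision improvement) would enter the same correlation and regression pipeline as additional columns, once defined from the inquiry stages.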