Computerized evaluation of students' free-text answers is already used in complex, high-stakes testing, such as some Advanced Placement examinations. These automated scoring systems work best when student answers contain 200 or more words and when the evaluation system has many examples of both good and poor answers. We developed and assessed an algorithm that evaluates short student answers of as few as 10 words, using an answer key that contains only one correct answer. We sought such an algorithm so that individual instructors could ask students to produce short open-ended text responses to questions. Our algorithm automates the scoring of these free-text answers, enabling instructors to embed such questions in online courses and to provide nearly immediate scoring and feedback on student responses. The algorithm is based on the semantic relatedness of the words in the student answer to those in the single correct answer provided by the instructor. Computing semantic relatedness requires a dedicated, domain-specific index, or corpus, of topic-focused documents, which is created by an automated crawl mechanism that collects documents matching descriptive domain keywords.
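The abstract does not specify the relatedness measure itself, so the following is only an illustrative sketch, not the authors' actual algorithm: it stands in for the crawled corpus with a tiny hypothetical text collection, represents each word by its sentence-level co-occurrence counts, and scores a student answer as the mean, over its words, of the best cosine relatedness to any word in the instructor's single correct answer.

```python
from collections import Counter
from math import sqrt

# Toy stand-in for the crawled, topic-focused corpus (hypothetical sentences;
# the paper's corpus is built by an automated keyword-driven crawl).
CORPUS = [
    "dutch elm disease is a fungal disease spread by bark beetles",
    "the fungus blocks water transport in the elm tree",
    "infected elm trees wilt because the fungus clogs the xylem",
    "beetles carry fungal spores from diseased elms to healthy trees",
]

def cooccurrence_vectors(corpus):
    """Represent each word by the other words it co-occurs with (one sentence = one context)."""
    vectors = {}
    for sentence in corpus:
        words = sentence.split()
        for w in words:
            vectors.setdefault(w, Counter()).update(x for x in words if x != w)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def relatedness_score(student_answer, key_answer, vectors):
    """Mean, over student words, of the best relatedness to any key-answer word."""
    student = [w for w in student_answer.lower().split() if w in vectors]
    key = [w for w in key_answer.lower().split() if w in vectors]
    if not student or not key:
        return 0.0
    return sum(max(cosine(vectors[s], vectors[k]) for k in key)
               for s in student) / len(student)

vectors = cooccurrence_vectors(CORPUS)
key = "the fungus blocks water movement so the tree wilts"
good = relatedness_score("the fungus clogs the xylem and the elm wilts", key, vectors)
poor = relatedness_score("beetles eat leaves", key, vectors)
```

Under this sketch, an on-topic answer scores higher than an off-topic one because its words lie closer, in co-occurrence space, to the words of the model answer; unknown words (absent from the corpus) simply contribute nothing.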
We assessed the accuracy of this algorithm by collecting student answers to two text questions about botany. The material was not complex, but it required students to understand what they read about Dutch elm disease. Sixty-three students read the material and submitted answers to the two questions. Across both questions, students' answers averaged 9.5 words; the model answers from the instructor averaged 20 words. Overall, the algorithm's scores correlated .76 with the scores from a panel of four human raters. For the same answers, each individual human rater's ratings correlated .88 with those of the other three raters. While the algorithm was not as accurate as individual human raters, we believe it shows enough promise to warrant applied field tests with real instructors and students.
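The accuracy figures above are Pearson correlations between algorithm scores and human ratings. A minimal sketch of that evaluation step, using hypothetical scores for six answers (not the study's data), might look like this:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: algorithm scores in [0, 1] and the human panel's
# mean rating (1-5 scale) for the same six answers.
algorithm_scores = [0.9, 0.4, 0.7, 0.2, 0.8, 0.5]
human_mean_ratings = [4.5, 2.0, 3.5, 1.5, 4.0, 3.0]
r = pearson(algorithm_scores, human_mean_ratings)
```

A correlation near 1 would indicate that the algorithm ranks answers much as the human panel does, which is the sense in which the reported .76 is compared against the .88 inter-rater figure.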