Turnitin: Of 38M Submissions Since April 4, 3.5% Had At Least 80% AI-Written Text

Analysis of 6 Weeks' Worth of Educator Submissions Leads to Updates for Turnitin's AI Detection Platform

In the first six weeks of educators using Turnitin’s new AI writing detection feature, the platform processed 38.5 million submissions, and the results — as well as plenty of feedback from educators and administrators — led Turnitin to tweak the detector and to further explain the meaning and accuracy rates of the detection scores. 

Of the submissions run through the AI detector, Turnitin said 3.5% contained more than 80% AI-written text, and just under one-tenth of submissions contained at least 20% AI-written text.

In a new blog post, Turnitin Chief Product Officer Annie Chechitelli explains the findings and details a few tweaks to the platform’s AI detection feature, in response to feedback from educators using it since its launch in early April.


Updates to the AI detection feature include:

  • Asterisk Added to Scores Under 20%: An asterisk will now appear next to the indicator “score” — or the percentage of a submission considered to be AI-written text — when the score is less than 20%, since the analysis of submissions thus far shows that false positives are higher when the detector finds less than 20% of a document is AI-written. The asterisk indicates that the score is less reliable, according to the blog post. 

  • Minimum Word Count Raised: The minimum number of words required for the AI detector to work has been raised from 150 to 300, because the detector is more accurate the longer a submission is, Chechitelli said. “Results show that our accuracy increases with a little more text, and our goal is to focus on long-form writing. We may adjust this minimum word requirement over time based on the continuous evaluation of our model.”

  • Changes to Detector Analysis of Opening and Closing Sentences: “We also observed a higher incidence of false positives in the first few or last few sentences of a document,” Chechitelli said. “Many times, these sentences are the introduction or conclusion in a document. As a result, we have changed how we aggregate these specific sentences for detection to reduce false positives.”
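The first two updates amount to simple display rules, which can be sketched in a few lines of Python. This is an illustration only, not Turnitin's code: the function name `format_ai_score` and both constants are hypothetical, and the only values taken from the article are the 300-word minimum and the 20% cutoff. Returning nothing for a too-short submission is also an assumption about how the minimum is enforced.

```python
from typing import Optional

# Values reported in the article; the names themselves are hypothetical.
MIN_WORDS = 300        # minimum word count, raised from 150
LOW_SCORE_CUTOFF = 20  # scores below this percentage are marked with an asterisk


def format_ai_score(word_count: int, ai_percent: int) -> Optional[str]:
    """Return the score label to display, or None if the text is too short.

    Assumption: submissions under the minimum word count get no score at all.
    """
    if word_count < MIN_WORDS:
        return None
    label = f"{ai_percent}%"
    if ai_percent < LOW_SCORE_CUTOFF:
        # The asterisk signals a less reliable score (higher false-positive rate).
        label += "*"
    return label


if __name__ == "__main__":
    for words, score in [(250, 50), (300, 12), (500, 20), (500, 85)]:
        print(words, score, format_ai_score(words, score))
```

Under these assumptions, a 300-word submission scored at 12% would be shown with an asterisk, while a 250-word submission would not be scored.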

In their feedback, instructors and administrators cited false positives as their main concern, both for “AI writing detection in general and in specific cases within our writing detection,” according to the blog post. Since the release of the detection feature, Turnitin has seen that “real-world use is yielding different results” from lab tests performed during development, Chechitelli said.

The findings follow Turnitin’s investigation of cases in which educators flagged a submission for additional scrutiny because of questionable detection results, as well as an additional study in which 800,000 academic writing samples, all written before the release of ChatGPT, were run through Turnitin’s AI detector.

The detector’s first six weeks in use by educators also revealed confusion about how to interpret Turnitin’s scores, or AI writing metrics, Chechitelli said.

She explained that the detector calculates two different statistics: the AI writing metric at the document level and at the sentence level.

