An instructor has accused students taking his agriculture science class at Texas A&M University-Commerce of cheating by using AI software to write their essays.
As detailed in a now-viral Reddit thread, Jared Mumm, a coordinator at the American university's department of agricultural sciences and natural resources, informed students he had used ChatGPT to assess whether their submitted assignments were human-written or machine-generated.
We're told OpenAI's bot labeled at least some of the submitted work as machine crafted, leading to grades being withheld pending an investigation. Students caught up in the row have hit back, saying their essays were indeed written by them. As a result of this probe, diplomas are being temporarily withheld for those graduating. It's understood about half the class have had their diplomas put on hold.
Specifically, Mumm said he ran his seniors' final three essays through ChatGPT twice, and if the bot claimed on both runs that it had written a given piece, he would flunk that paper.
"I will be giving everyone in this course an X," he reportedly told his class, and apparently told several students: "I'm not grading AI s***."
Texas A&M University-Commerce confirmed the X grade means incomplete, and is a temporary measure while the affair is being investigated. Several students have now been cleared of any cheating, we note, while some others are submitting a fresh essay to be graded. At least one pupil so far has admitted to using ChatGPT to complete assignments.
"A&M-Commerce confirms that no students failed the class or were barred from graduating because of this issue," the institution said in a statement. "University officials are investigating the incident and developing policies to address the use or misuse of AI technology in the classroom.
"They are also working to adopt AI detection tools and other resources to manage the intersection of AI technology and higher education. The use of AI in coursework is a rapidly changing issue that confronts all learning institutions," it continued.
A representative from the university declined to comment further. The Register has asked Mumm for comment.
One person familiar with the brouhaha at the uni told us: "So far it seems the situation is mostly resolved: the school admitted to students that the grades should not have been withheld in the first place. It was completely out of protocol and an inappropriate use of ChatGPT. They haven’t addressed the foul language in accusations yet."
The kerfuffle highlights the question of whether educators should use software to detect AI-produced content in submitted coursework. ChatGPT is a poor choice for classifying machine-generated text: it was not built for the task and cannot accurately determine whether students used the system to write their essays. In short, it shouldn't be used this way.
Other types of software specifically built to detect text generated by AI models are often not reliable, either, as is becoming increasingly apparent.
A pre-publication study suggests it will become impossible to discern AI-written text as models improve. Vinu Sankar Sadasivan, a PhD student at the University of Maryland and the first author of the paper, told us the chances of detecting AI-generated text using the best detectors are no better than flipping a coin.
"Generative AI text models are trained using human text data with the objective of making their output resemble that of humans," Sadasivan told us.
"Some of these AI models even memorize human text and output them in some instances without citing the actual text source. As these large language models improve over time to mimic humans, the best possible detector would achieve only an accuracy of nearly 50 percent.
"This is because the probability distribution of text output from human and AI models can nearly be the same for a sufficiently advanced [large language model], making detection hard. Hence, we theoretically show that the task of reliable text detection is impossible in practice."
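The intuition behind that 50 percent figure can be seen with a toy calculation. The sketch below is not the paper's code; the token alphabet and probabilities are made up for illustration. It assumes only the standard result that, on a balanced human-versus-AI test set, the best possible classifier's accuracy is 1/2 plus half the total variation distance between the two text distributions, so as an AI model's output distribution approaches the human one, that accuracy slides toward a coin flip.

```python
# Toy illustration (hypothetical numbers, not from the paper): as an AI
# model's token distribution approaches the human one, the best possible
# detector's accuracy approaches 50 percent.

human = {"a": 0.30, "b": 0.30, "c": 0.20, "d": 0.20}  # "human" token distribution
ai_v1 = {"a": 0.10, "b": 0.20, "c": 0.30, "d": 0.40}  # early model: quite different
ai_v2 = {"a": 0.29, "b": 0.31, "c": 0.20, "d": 0.20}  # improved model: near-identical

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(p[t] - q.get(t, 0.0)) for t in p)

def best_detector_accuracy(p, q):
    """Bayes-optimal accuracy on a 50/50 mix of human and AI samples."""
    return 0.5 + 0.5 * tv_distance(p, q)

print(best_detector_accuracy(human, ai_v1))  # 0.65  -- detectable
print(best_detector_accuracy(human, ai_v2))  # 0.505 -- barely beats a coin flip
```

No real detector can do better than this Bayes-optimal bound, which is why, in the limit where the distributions coincide, even a perfect detector is reduced to guessing.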
The paper also showed that such software can be easily tricked into classifying AI text as human if users make a few quick edits to paraphrase the output of a large language model. Sadasivan says universities and schools should not be using these detectors to check for plagiarism, since they are unreliable.
"We should not use these detectors to make the final verdict. Borrowing words from my advisor, Prof Soheil Feizi: 'I think we need to learn to live with the fact that we may never be able to reliably say if a text is written by a human or an AI'," he said. ®
Source: The Register