Can we really detect plagiarism by GPT?
Recap of Tech Talk for Teachers #5 (19 Dec 2022)
There is good reason to give ChatGPT a shot now, because it definitely won't be free forever. It takes significant computational power to run, which is also why many people walk away finding its performance phenomenal.
The aim of today is simply to try playing police and thief for ChatGPT - can we really detect whether it is used?
We will be using (1) the GPT-2 output detector model, based on the 🤗/Transformers implementation of RoBERTa (you might have seen this in a popular post); and (2) originality.ai.
The short answer is that it can actually be pretty difficult to detect.
Before your students bypass it, I will briefly go over the basics behind GPT again, which hopefully helps explain how detection can be (easily) bypassed. In a super watered-down explanation, language models work by predicting what should naturally come next.
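To make "predicting what should naturally come next" concrete, here is a toy sketch in Python. It is not how GPT or the detectors above actually work: it is a tiny bigram counter trained on three made-up sentences, and every name in it (`train_bigram`, `predict_next`, `avg_log_prob`, the corpus itself) is an assumption for illustration only. It only shows that a stray forward slash creates word-to-word transitions the model has never seen, so the text looks less "natural" to it.

```python
import math
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count which word follows which across a list of sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def avg_log_prob(counts, text, vocab_size, alpha=1.0):
    """Average add-alpha smoothed log-probability per word transition.

    Higher (closer to zero) means the text looks more 'natural' to
    this toy model. Assumes the text has at least two words.
    """
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    total = 0.0
    for prev, nxt in pairs:
        followers = counts.get(prev, Counter())
        seen = sum(followers.values())
        p = (followers[nxt] + alpha) / (seen + alpha * vocab_size)
        total += math.log(p)
    return total / len(pairs)

# Made-up training sentences (an assumption, not data from the session).
corpus = [
    "keep your data in a secure location",
    "store the data in a secure location",
    "keep the backup in a secure place",
]
counts = train_bigram(corpus)
vocab_size = len({w for s in corpus for w in s.split()})

natural = "keep your data in a secure location"
perturbed = "keep your data in a / secure location"

natural_score = avg_log_prob(counts, natural, vocab_size)
perturbed_score = avg_log_prob(counts, perturbed, vocab_size)

print("most likely word after 'secure':", predict_next(counts, "secure"))
print("natural text scores higher:", natural_score > perturbed_score)
```

A real language model replaces the counting with a neural network trained on far more text, but the intuition carries over: some continuations are expected, and odd insertions are not.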
The bottom line is: the more unnatural the placement, the more likely it is for the detector to fail. During the live session, I demonstrated how randomly inserting forward slashes gets my piece of writing classified as 'real' (or not plagiarised):
The first image shows that the detector is perfectly capable of detecting a plagiarised text if it is copied and pasted wholesale: 99.97% fake.
But with 1 forward slash, that makes it 31.19% real:
With 2 forward slashes, it is now 97.73% real:
During our live session, we tried randomly inserting different punctuation marks and that yielded the same result - the detector fails.
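For anyone who wants to repeat this experiment systematically, the sketch below inserts a punctuation mark at random word gaps and scores each variant. Everything here is an assumption for illustration: `insert_marks` and `sweep` are hypothetical helpers, the mark is added as a standalone token (the live demo may have placed it differently), and `detector` is whatever scoring function you supply. The stand-in used at the bottom only counts slashes so that the sketch runs; it is not a real detector.

```python
import random

def insert_marks(text, n_marks=1, mark="/", seed=None):
    """Insert `mark` as a standalone token at `n_marks` random word gaps."""
    words = text.split()
    if len(words) < 2:
        return text  # no interior gap to insert into
    rng = random.Random(seed)
    for _ in range(n_marks):
        # Choose a gap strictly inside the text, never at the very start or end.
        position = rng.randint(1, len(words) - 1)
        words.insert(position, mark)
    return " ".join(words)

def sweep(text, detector, max_marks=3, mark="/", seed=0):
    """Score the text with 0..max_marks inserted marks.

    `detector` is any caller-supplied function mapping a string to a score.
    Returns a list of (number_of_marks, variant_text, score) tuples.
    """
    results = []
    for n in range(max_marks + 1):
        variant = insert_marks(text, n, mark=mark, seed=seed)
        results.append((n, variant, detector(variant)))
    return results

def stand_in_detector(text):
    """Placeholder only: counts slashes. Swap in a real detector call."""
    return text.count("/")

sample = "Store your data in a secure location"
for n, variant, score in sweep(sample, stand_in_detector):
    print(n, repr(variant), score)
```

To test a real detector, replace `stand_in_detector` with a function that sends the text to that detector and returns its 'real' probability, then compare the scores across rows.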
Another quick way to bypass detection is simple paraphrasing - by changing the phrase "data in a secure location" to "encrypted stuff", the text becomes 62.83% real.
As shown in the live demo, even originality.ai (which is paid software, with no free plan as of 25 Dec 2022) was unable to fare much better when dealing with odd insertions such as forward slashes or mild paraphrases.
Bypassing plagiarism detectors is not new - this is something that we will constantly face as educators. Even with Turnitin (the plagiarism detector that most college students are used to), there were similar tips and tricks: wrapping blatantly copied text in quotation marks so that it is interpreted as a citation, or shrinking those quotation marks to an extremely small font size and colouring them white so that they go unnoticed, among other tricks.
Essentially, the thief will always try to outsmart the police. Our aim as educators perhaps isn't to outsmart or detect them all, but to make use of such technology and learn to live with it.
A notable Q&A during the session:
SI: Should all teachers be aware of this (ChatGPT) - or should we actively withhold information from students and teachers?
Thoughts from the author: Yes, all teachers and students should be aware of the potential and perils of AI. Students will definitely be exposed to it through TikTok and other informal sources of information. Educators are in an important position to guide proper use of such technology, so we need to know about it first.
The participants in the live session concurred with the insight that we ought to learn to be facilitators in the classroom.
Dilys, who used to teach Math in Assumption English School and is currently on PDL doing her Master's in Instructional Technology and Media at Columbia, shared that students in the States are increasingly socialised to use AI as a learning aid. Instead of being used to "cheat" or copy answers wholesale, it could help with the brainstorming process or act as a learning assistant.
In the podcast suggested in the follow-up, the speaker also notes how language models like ChatGPT democratise personalised learning assistants. Students can ask ChatGPT to explain difficult concepts they learnt in class if they struggle to comprehend that concept initially.
Beyond playing a role in recommending discretion in the use of AI, the String team has also explored use of AI by teachers to build more effective tools for teaching and learning. In particular, we have prototyped a public build for semantic search such that students can look up concepts that they are unsure of and be given a page reference to their lecture notes and/or a timestamp of a lecture video.
tldr: don't pay for originality.ai. It is more worthwhile to explore what guidelines or specific recommendations to give both colleagues and students on responsible and effective use of AI.
Suggested follow-up - both from Joyce, who has been a super active participant; the String team is honestly very thankful for her energy and enthusiasm