Reconciling the AI Ethics and Product Requirements Document (PRD) of Remarks Co-Pilot
Landing page of Remarks Co-Pilot
1 Objective/Goal: Reduce time spent drafting the base copy of termly, qualitative student remarks
2 Features:
File/CSV upload - enable users to upload their own data
Goal - help define structured data about the student so there is teacher input to work with
Prompt storage and customization - have a generic prompt for less experienced users while having the flexibility to accommodate school nuances regarding termly remarks
Goal - reduce writer and corresponding vetter effort to accommodate specific diction nuances
This was initially out of scope
Output generated by the LLM in a preferred file format - combining the teacher's input with the prompt to produce draft remarks
Goal - reduce time preparing a draft for further revision
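The feature list above amounts to a small pipeline: parse the uploaded CSV of structured teacher input, merge each row into the stored prompt, and collect the drafts for download. A minimal sketch of the prompt-building half, assuming hypothetical column names `strengths` and `afi` (the real CSV schema would follow the school's template):

```python
import csv
import io

# Hypothetical template; the product stores a generic prompt that schools can customise.
PROMPT_TEMPLATE = (
    "Write a termly remark for a student with these strengths: {strengths}; "
    "and these areas for improvement: {afi}. Tone: {tone}."
)

def build_prompt(row: dict, tone: str = "warm and professional") -> str:
    """Turn one row of structured teacher input into an LLM prompt."""
    return PROMPT_TEMPLATE.format(
        strengths=row["strengths"], afi=row["afi"], tone=tone
    )

def prompts_from_csv(csv_text: str) -> list[str]:
    """Parse an uploaded CSV and produce one prompt per student row."""
    return [build_prompt(row) for row in csv.DictReader(io.StringIO(csv_text))]

# Each prompt would then be sent to the LLM, e.g. (not executed here):
#   client = openai.OpenAI()
#   client.chat.completions.create(
#       model="gpt-3.5-turbo",
#       messages=[{"role": "user", "content": prompt}])
```

The drafts returned per row would be written back out as a new CSV column for the teacher to download and edit.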
3 UX Flow & Design Notes (Screens made by our designer, Natalie)
Authenticate > amend prompt if needed > upload CSV of structured data > download CSV with LLM output > amend as necessary to accurately represent the student
Login flow using email one-time password (OTP) just like other Open Government Products
Update profile page
User Dashboard with the main user action “generate remarks” displayed in the top right corner
4 System & Environment Requirements: like many String prototypes, Remarks Co-Pilot is a web application that does not require much computing power
5 Assumptions, Constraints & Dependencies:
[Validated] Teachers find termly remarks tedious
[Contestable] Generative AI reduces the load to come up with the first draft of remarks without misrepresenting the student
AI policy and the corresponding sensitivities around student names being fed into a large language model
An accessible large language model. We chose the OpenAI API with GPT-3.5 ("ChatGPT"), but are exploring how to use an Azure-hosted model instead
A backend to store information in order to get a sense of usage and user demographics. We used CockroachDB out of convenience, as it comes paired with OGP's starter kit out of the box
Quality input. A familiar lesson from econometrics applies - garbage in, garbage out. The teacher needs to select descriptors and Areas For Improvement (AFI) in genuine good faith; otherwise the output will be generic and inaccurate.
Backend: CockroachDB; app deployed using the Open Government Products (OGP) Starter Kit, which settled auth/email OTP out of the box
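The garbage-in, garbage-out point can be made concrete with a simple input-quality check before any prompt is built. A sketch, assuming hypothetical fields `descriptors` (semicolon-separated) and `afi`, with an illustrative blocklist of generic words:

```python
# Illustrative list of descriptors too generic to produce a useful draft.
GENERIC = {"good", "nice", "ok", "fine"}

def flag_low_quality(row: dict) -> list[str]:
    """Warn when teacher input is too sparse or generic for a meaningful remark."""
    warnings = []
    descriptors = [
        d.strip().lower()
        for d in row.get("descriptors", "").split(";")
        if d.strip()
    ]
    if not descriptors:
        warnings.append("no descriptors selected")
    if descriptors and all(d in GENERIC for d in descriptors):
        warnings.append("descriptors are all generic")
    if not row.get("afi", "").strip():
        warnings.append("no Areas For Improvement given")
    return warnings
```

Rows that trigger warnings could be surfaced to the teacher before generation, rather than silently producing a generic remark.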
AI Ethics of Remarks Co-Pilot
Why is using Remarks Co-Pilot ‘bad’?
Misrepresenting students, inappropriate tone or diction are all possible drawbacks of an unchecked output from a large language model when it comes to writing qualitative remarks.
Does it make teachers lazy?
Largely due to variants and combinations of the above two reasons, there are some reservations about using Remarks Co-Pilot. I understand these concerns.
Vetter pains. Making this product allowed me to interact with remarks vetters from schools, and their pain points are surprisingly structured - they involve substitutions that can be made rule-based and fulfilled by good prompt engineering.
Bad-faith actors will continue to act in bad faith - this tool does not eliminate poor judgment. Regrettably, writing remarks today already involves copying and pasting and selecting from an adjective bank, a workflow markedly similar to the one proposed here.
The people who potentially benefit the most from Remarks Co-Pilot are those who are less strong in written expression or in English.
Human-in-the-loop review is important. The team behind Remarks Co-Pilot strongly advises against submitting the output of an LLM verbatim, without editing.
Teachers learn to process tasks programmatically by considering student remarks as a batch task
Eases vetter woes and integrates school-based preferences as part of the generative process
Non-English input can be used to produce an output in English, for those more comfortable in a different language.
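The vetter substitutions mentioned above can be sketched as a rule-based pass over a draft remark. These rules are purely illustrative - a real school would maintain its own list, or fold the preferences into the prompt itself:

```python
import re

# Illustrative substitution rules of the kind vetters apply by hand;
# each pair is (regex pattern, preferred diction).
VETTER_RULES = [
    (r"\bkid\b", "student"),
    (r"\bOK\b", "satisfactory"),
    (r"\bvery very\b", "very"),
]

def apply_vetter_rules(remark: str) -> str:
    """Apply rule-based diction substitutions to a draft remark."""
    for pattern, replacement in VETTER_RULES:
        remark = re.sub(pattern, replacement, remark)
    return remark
```

Because the rules are data, school-specific preferences can be swapped in without changing the code - the same idea behind storing a customisable prompt.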
I remember learning to identify sonnets in school and how to tell the different types apart.
Whether it is a haiku, a sonnet or a palindrome (a word or sentence spelled the same way forwards or backwards), these can be produced or checked programmatically. As creative and expressive as they seem, there is method to the madness.
There are some things that can be done programmatically and others that should not.
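The palindrome is the simplest illustration of "checked programmatically" - a few lines suffice:

```python
def is_palindrome(text: str) -> bool:
    """Check whether a word or sentence reads the same forwards and
    backwards, ignoring case, spaces and punctuation."""
    letters = [c.lower() for c in text if c.isalnum()]
    return letters == letters[::-1]
```

A sonnet's rhyme scheme or a haiku's syllable count can be verified in the same mechanical spirit, even though composing one still feels creative.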
If we think of the anatomy of a student remark as a piece of structured information with clear rules, could the first draft be offloaded ethically to generative AI?
Come share your thoughts at Meta on 20 Nov, 9am at String’s next meetup (: