Reconciling AI ethics of Remarks Co-Pilot and Product Requirements Document (PRD)

String Team & Kahhow Lee
October 16, 2023

A product requirements document (PRD) for Remarks Co-Pilot
Those with an @schools or @moe email address can try it here

Landing page of Remarks Co-Pilot

1 Objective/Goal: Reduce time spent drafting the base copy of termly, qualitative student remarks
2 Features:
- File/ CSV upload - enable users to upload their own data
  - Goal - help define structured data about the student so there is teacher input to work with
- Prompt storage and customization - have a generic prompt for less experienced users while having the flexibility to accommodate school nuances regarding termly remarks
  - Goal - reduce writer and corresponding vetter effort to accommodate specific diction nuances
  - This was initially out of scope
- LLM output in a preferred file format - the LLM processes the teacher input according to the prompt
  - Goal - reduce time preparing a draft for further revision
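
To make the "generic prompt plus school nuances" idea concrete, here is a minimal sketch. It is not the prompt used in Remarks Co-Pilot: the template wording, the `build_prompt` helper, and the `descriptors` and `afi` column names are all assumptions made for illustration.

```python
from string import Template

# Hypothetical generic prompt; the wording is illustrative only.
DEFAULT_PROMPT = Template(
    "Write a termly remark for a student in about $word_limit words. "
    "Descriptors: $descriptors. Areas for improvement: $afi. "
    "$school_notes"
)


def build_prompt(row: dict, school_notes: str = "", word_limit: int = 80) -> str:
    """Fill the generic template with one student's structured data.

    `school_notes` carries school-specific preferences (assumed free text).
    """
    return DEFAULT_PROMPT.substitute(
        word_limit=word_limit,
        descriptors=row["descriptors"],
        afi=row["afi"],
        school_notes=school_notes,
    ).strip()
```

A less experienced user would call `build_prompt(row)` and get the generic version, while a school with its own conventions could pass something like `school_notes="Use British spelling."`.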
3 UX Flow & Design Notes (Screens made by our designer, Natalie)
- Authenticate > amend prompt if needed > upload csv of structured data > download csv with LLM output > amend as necessary to accurately represent the user

Update profile page

User Dashboard with the main user action “generate remarks” displayed in the top right corner
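
The upload-then-download part of the flow can be pictured roughly as follows. This is a sketch under stated assumptions, not the app's actual code: `generate_remarks`, the `draft_remark` column, and `fake_llm` (a stand-in so the example runs without any API call) are hypothetical names.

```python
import csv
import io
from typing import Callable


def generate_remarks(csv_text: str, prompt: str, llm: Callable[[str], str]) -> str:
    """Read teacher input as CSV, add a 'draft_remark' column, return CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["draft_remark"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # One request per student: the stored prompt plus that row's structured data.
        details = "; ".join(f"{key}: {value}" for key, value in row.items())
        row["draft_remark"] = llm(f"{prompt}\n{details}")
        writer.writerow(row)
    return out.getvalue()


def fake_llm(request: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "DRAFT based on: " + request.splitlines()[-1]
```

Passing the model call in as a function keeps the CSV handling separate from the choice of provider, so the same loop would work whichever model sits behind it. The downloaded CSV is still only a draft for the teacher to amend.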

4 System & Environment Requirements: like many String prototypes, Remarks Co-Pilot is a web application that does not require much computing power
5 Assumptions, Constraints & Dependencies:
5.1 Assumptions:
- [Validated] Teachers find termly remarks tedious
- [Contestable] Generative AI reduces the effort of coming up with the first draft of remarks without misrepresenting the student
5.2 Constraints:
- AI policy and the corresponding sensitivity and concerns about student names being fed into a large language model
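
One common way to work within a constraint like this is to swap names for placeholders before any text leaves the app, then restore them afterwards. The sketch below is an assumption about how that could look, not a description of what Remarks Co-Pilot does; `mask_names`, `unmask`, the `name` column, and the `STUDENT_001` token format are hypothetical, and the sample names in the usage note are made up.

```python
import re


def mask_names(rows: list[dict], name_field: str = "name") -> tuple[list[dict], dict[str, str]]:
    """Replace each student name with a placeholder; return masked rows and the lookup."""
    lookup: dict[str, str] = {}
    masked = []
    for i, row in enumerate(rows, start=1):
        token = f"STUDENT_{i:03d}"
        lookup[token] = row[name_field]
        # Copy the row so the caller's original data is left untouched.
        masked.append({**row, name_field: token})
    return masked, lookup


def unmask(text: str, lookup: dict[str, str]) -> str:
    """Put the real names back into text returned by the model."""
    return re.sub(r"STUDENT_\d+", lambda m: lookup.get(m.group(0), m.group(0)), text)
```

For example, a row for "Aisha Tan" would be sent as `STUDENT_001`, and `unmask` would restore the name in the returned draft. This only covers the name column: names typed into free-text fields would still reach the model.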
5.3 Dependencies:
- An accessible large language model. We chose the OpenAI API with the GPT-3.5 (“ChatGPT”) model, but are exploring how to use an Azure model instead
- Some backend to store information in order to get a sense of usage and user demographics. We used CockroachDB out of convenience, since it was paired with OGP’s starter kit out of the box
- Quality input. A similar lesson from econometrics applies: garbage in, garbage out. The teacher needs to select descriptors and Areas For Improvement (AFI) in genuine good faith, otherwise the output will be generic and inaccurate.
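
The "garbage in, garbage out" dependency suggests a cheap pre-check before anything is sent to the model. The following is only a sketch: `find_weak_rows`, the `descriptors` and `afi` column names, and the three-character threshold are assumptions, and a length check cannot tell whether the input was chosen in good faith, only whether it is there at all.

```python
def find_weak_rows(
    rows: list[dict],
    required: tuple[str, ...] = ("descriptors", "afi"),
    min_chars: int = 3,
) -> list[tuple[int, str]]:
    """Return (row_number, field) pairs where a required field is missing or too short."""
    problems = []
    for n, row in enumerate(rows, start=1):
        for field in required:
            # Treat a missing key, None, and whitespace-only text the same way.
            value = (row.get(field) or "").strip()
            if len(value) < min_chars:
                problems.append((n, field))
    return problems
```

The returned pairs could be shown to the teacher before generation, so that thin rows are filled in rather than padded out by the model.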

Backend: CockroachDB; app deployed using the Open Government Products (OGP) Starter Kit, which helped settle auth/email OTP out of the box

AI Ethics of Remarks Co-Pilot

Why is using Remarks Co-Pilot ‘bad’?

Misrepresenting students, inappropriate tone or diction are all possible drawbacks of an unchecked output from a large language model when it comes to writing qualitative remarks.
It makes teachers lazy?

Largely due to variants or combinations of the above two reasons, there are some reservations towards using Remarks Co-Pilot. I understand these concerns.

Other considerations:

Vetter pains. Making this product allowed me to interact with remarks vetters from schools, and their pain points are surprisingly structured: they involve substitutions that can be rule-based and fulfilled by good prompt engineering.
Bad-faith actors will continue to act in bad faith. This tool does not eliminate poor judgment; regrettably, writing remarks already involves copying and pasting and selecting from an adjective bank, which could be markedly similar to the workflow proposed here.
The people who potentially benefit the most from Remarks Co-Pilot are those who are not strong in written expression or in English.
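
To illustrate what "rule-based substitutions" for vetters could mean, here is a small sketch. The rules are invented examples of house style (expanding contractions, British spelling), not rules collected from any school, and `RULES` and `apply_rules` are hypothetical names. The same preferences could instead be written into the prompt; this version applies them to the text directly.

```python
import re

# Hypothetical house-style rules; a real school would supply its own list.
RULES = [
    (r"\bcan't\b", "cannot"),
    (r"\bdoesn't\b", "does not"),
    (r"\borganiz(e|ed|es|ing)\b", r"organis\1"),
]


def apply_rules(text: str, rules: list[tuple[str, str]] = RULES) -> tuple[str, int]:
    """Apply each rule in order; return the new text and how many substitutions were made."""
    total = 0
    for pattern, replacement in rules:
        text, count = re.subn(pattern, replacement, text)
        total += count
    return text, total
```

The patterns are case-sensitive to keep the sketch short, so a sentence starting with "Can't" would be left alone. The substitution count gives a vetter a quick sense of how much a draft deviated from the school's preferences.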

Risk mitigation

A human in the loop and a review process are important. The Remarks Co-Pilot team strongly advises against using the output from an LLM without editing it.

Benefits

Teachers learn to process tasks programmatically by considering student remarks as a batch task
Eases vetter woes and integrates school-based preferences as part of the generative process
Non-English input can be used to generate output in English, for those more comfortable in a different language.

I remember learning to identify sonnets in school and how to tell the different types apart.

Whether it is a haiku, a sonnet or a palindrome (a word or sentence spelled the same way forwards and backwards), these can be done or checked programmatically. As creative and expressive as they seem, there is method to the madness.

Practising JavaScript Algorithms and Data Structures - Palindrome Checker and thinking about parallels with Remarks Co-Pilot
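
As a small illustration of "checked programmatically", here is a palindrome check and a deliberately crude sonnet check that only counts lines. It is written in Python rather than the JavaScript of the exercise, and the function names are my own. The palindrome check ignores case, spaces and punctuation, which is one common convention rather than the only one.

```python
def is_palindrome(text: str) -> bool:
    """True if text reads the same both ways, ignoring case, spaces and punctuation."""
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]


def has_sonnet_length(poem: str) -> bool:
    """True if the poem has exactly 14 non-empty lines (form only, not rhyme or metre)."""
    lines = [line for line in poem.splitlines() if line.strip()]
    return len(lines) == 14
```

Both checks test form, not quality: a text can pass them and still be a poor poem, which is the same gap a teacher's review fills for a generated remark.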

There are some things that can be done programmatically and others that should not.

If we think of the anatomy of a student remark as a piece of structured information with clear rules, could the first draft be offloaded ethically to generative AI?

Come share your thoughts at Meta on 20 Nov, 9am at String’s next meetup (: