
Reconciling AI ethics of Remarks Co-Pilot and a Product Requirements Document (PRD)

A product requirements document (PRD) for Remarks Co-Pilot
For those with @schools/ @moe email, you can try it here

Landing page of Remarks Co-Pilot

  • 1 Objective/Goal: Reduce time spent drafting the base copy of termly, qualitative student remarks

  • 2 Features: 

    • File/ CSV upload - enable users to upload their own data

      • Goal - help define structured data about the student so there is teacher input to work with

    • Prompt storage and customization - have a generic prompt for less experienced users while having the flexibility to accommodate school nuances regarding termly remarks

      • Goal - reduce writer and corresponding vetter effort to accommodate specific diction nuances

      • This was initially out of scope

    • Output parsed by the LLM in a preferred file format - the prompt is applied to the teacher's input to produce draft remarks

      • Goal - reduce time preparing a draft for further revision

  • 3 UX Flow & Design Notes (Screens made by our designer, Natalie)

    • Authenticate > amend prompt if needed > upload csv of structured data > download csv with LLM output > amend as necessary to accurately represent the user

Login flow using email one-time password (OTP) just like other Open Government Products

Update profile page

User Dashboard with the main user action “generate remarks” displayed in the top right corner

  • 4 System & Environment Requirements: like many String prototypes, Remarks Co-Pilot is a web application that does not require much computing power

  • 5 Assumptions, Constraints & Dependencies: 

  • 5.1 Assumptions:

    • [Validated] Teachers find termly remarks tedious

    • [Contestable] Generative AI reduces the load to come up with the first draft of remarks without misrepresenting the student

  • 5.2 Constraints:

    • AI policy and the corresponding sensitivity/concerns about student names being fed into a large language model

  • 5.3 Dependencies:

    • An accessible large language model. We chose the OpenAI API with GPT-3.5 ("ChatGPT") but are exploring how to use an Azure-hosted model instead

    • Some backend to store information in order to get a sense of usage and user demographics. We used CockroachDB out of convenience, since it comes paired with OGP's starter kit out of the box

    • Quality input. The same lesson applies as in econometrics - garbage in, garbage out. The teacher needs to select descriptors and Areas For Improvement (AFI) in genuine good faith, otherwise the output will be generic and inaccurate.

Backend: CockroachDB; app deployed using the Open Government Products (OGP) Starter Kit, which settled auth/email OTP out of the box
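To make the pipeline above concrete, here is a minimal sketch of how one CSV row of teacher input might become an LLM prompt. All names (`RemarkRow`, `buildPrompt`, `restoreName`) are illustrative, not from the actual codebase, and the constraint noted above - student names reaching the model - is handled here with a placeholder swap as one possible mitigation:

```typescript
// Sketch: turn one row of teacher-supplied structured data into an LLM prompt.
// Hypothetical names; not the actual Remarks Co-Pilot implementation.

interface RemarkRow {
  name: string;           // student name - kept out of the prompt, see below
  descriptors: string[];  // teacher-selected strengths, e.g. "diligent"
  afi: string[];          // Areas For Improvement
}

// Replace the real name with a placeholder so it never reaches the model;
// the app substitutes it back into the draft afterwards.
const PLACEHOLDER = "[STUDENT]";

function buildPrompt(row: RemarkRow): string {
  return [
    `Write a short, encouraging termly remark about ${PLACEHOLDER}.`,
    `Strengths: ${row.descriptors.join(", ")}.`,
    `Areas for improvement: ${row.afi.join(", ")}.`,
    `Do not invent facts beyond the descriptors given.`,
  ].join("\n");
}

function restoreName(draft: string, row: RemarkRow): string {
  return draft.split(PLACEHOLDER).join(row.name);
}

const row: RemarkRow = {
  name: "Alex",
  descriptors: ["diligent", "curious"],
  afi: ["time management"],
};
const prompt = buildPrompt(row);
console.log(prompt.includes("Alex")); // false: the name never enters the prompt
```

The prompt string would then be sent to the chosen LLM API, and the returned draft passed through `restoreName` before being written back into the output CSV for the teacher to edit.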

AI Ethics of Remarks Co-Pilot

Why is using Remarks Co-Pilot ‘bad’?

  1. Misrepresenting students and inappropriate tone or diction are possible drawbacks of unchecked output from a large language model when it comes to writing qualitative remarks.

  2. It makes teachers lazy?

Largely due to variants/ combinations of the above two reasons, there are some reservations towards using Remarks Co-Pilot. I understand these concerns.

Other considerations:

  • Vetter pains. Making this product allowed me to interact with remarks vetters from schools, and their pain points are surprisingly structured - they involve substitutions that can be rule-based and handled by good prompt engineering.

  • Bad faith actors will always continue to act in bad faith - this tool does not eliminate acting in bad judgment. Regrettably, writing remarks already involves copying and pasting and selecting from an adjective bank, a workflow markedly similar to the one proposed here.

  • The people who potentially benefit the most from Remarks Co-Pilot are those who are not strong in written expression or in English.

Risk mitigation

  • Human-in-the-loop and a review process are important. The team behind Remarks Co-Pilot strongly advises against using the output from an LLM without editing.


  • Teachers learn to process tasks programmatically by considering student remarks as a batch task

  • Eases vetter woes and integrates school-based preferences as part of the generative process

  • Non-English input can be used to produce an output in English for those more comfortable in a different language.
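Treating remarks as a batch task also makes the "garbage in, garbage out" dependency enforceable in code: rows with no genuine teacher input can be flagged before any generation happens. A minimal sketch, with illustrative names not taken from the actual product:

```typescript
// Sketch of a pre-generation check: rows with no descriptors or AFIs
// selected would only yield generic remarks, so flag them instead.
// Hypothetical names; not the actual Remarks Co-Pilot implementation.

interface RemarkRow {
  name: string;
  descriptors: string[];
  afi: string[];
}

function validateBatch(rows: RemarkRow[]): { ok: RemarkRow[]; flagged: RemarkRow[] } {
  const ok: RemarkRow[] = [];
  const flagged: RemarkRow[] = [];
  for (const row of rows) {
    // No teacher input means no signal for the model to work with.
    if (row.descriptors.length === 0 || row.afi.length === 0) {
      flagged.push(row);
    } else {
      ok.push(row);
    }
  }
  return { ok, flagged };
}

const batch: RemarkRow[] = [
  { name: "A", descriptors: ["diligent"], afi: ["handwriting"] },
  { name: "B", descriptors: [], afi: [] }, // no input: flag, don't generate
];
const { ok, flagged } = validateBatch(batch);
console.log(ok.length, flagged.length); // 1 1
```

A check like this keeps the human-in-the-loop principle upstream of the model as well as downstream of it.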

I remember learning to identify sonnets in school and how to tell the different types apart.

Whether it is a haiku, a sonnet or a palindrome (a word or sentence spelled the same way forwards and backwards), these can be checked programmatically. As creative and expressive as they seem, there is method to the madness.

Practising JavaScript Algorithms and Data Structures - Palindrome Checker - and thinking about parallels with Remarks Co-Pilot
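In the spirit of that exercise, a palindrome checker really is just a few lines: strip everything that is not alphanumeric, lowercase, and compare against the reverse.

```typescript
// Palindrome checker: normalise the string, then compare with its reverse.
function isPalindrome(input: string): boolean {
  const cleaned = input.toLowerCase().replace(/[^a-z0-9]/g, "");
  return cleaned === cleaned.split("").reverse().join("");
}

console.log(isPalindrome("A man, a plan, a canal: Panama")); // true
console.log(isPalindrome("Remarks Co-Pilot")); // false
```

The rule is fully mechanical - which is exactly the contrast the next paragraph draws with tasks that should stay human.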

There are some things that can be done programmatically and others that should not.

If we think of the anatomy of a student remark as a piece of structured information with clear rules, could the first draft be offloaded ethically to generative AI?