A package for generating AI feedback on student work
ai-autograding-feedback
Overview
This program is part of an exploratory project to evaluate the quality of LLM-generated feedback in assisting with assignment grading and enhancing student learning. This program processes either the code sections, text sections, or images of a student's submission to programming assignments, based on the provided arguments. It generates output into a markdown file or standard output.
The large language models used and implementation logic vary depending on whether the selected scope is 'image', 'code' or 'text'.
For the code scope, the program takes three files:
- An assignment's solution file
- A student's submission file
- A test file
For the text scope, the program takes two files:
- An assignment's solution file
- A student's submission file
For the image scope, the program takes up to two files, depending on the prompt used:
- A student's submission file
- (Optional) An assignment's solution file
Features
- Handles image, text and code scopes.
- Reads pre-defined prompts specified in JSON files.
- Uses an argument parser for structured command-line input.
- Supports various Large Language Models to evaluate student assignment submissions.
- Saves response output in Markdown format with a predefined template or prints to stdout.
Argument Details
| Argument | Description | Required |
|---|---|---|
| --submission_type | Type of submission (from arg_options.FileType) | ❌ |
| --prompt | Pre-defined prompt name or file path to custom prompt file | ❌ ** |
| --prompt_text | String prompt | ❌ ** |
| --scope | Processing scope (image or code or text) | ✅ |
| --submission | Submission file path | ✅ |
| --question | Specific question to evaluate | ❌ |
| --model | Model type (from arg_options.Models) | ✅ |
| --output | File path for where to record the output | ❌ |
| --solution | File path for the solution file | ❌ |
| --test_output | File path for the file containing the results from tests | ❌ |
| --submission_image | File path for the submission image file | ❌ |
| --solution_image | File path for the solution image file | ❌ |
| --system_prompt | Pre-defined system prompt name or file path to custom system prompt | ❌ |
| --llama_mode | How to invoke deepSeek-v3 (choices in arg_options.LlamaMode) | ❌ |
| --output_template | Output template file (from arg_options.OutputTemplate) | ❌ |
| --json_schema | File path to json file for schema for structured output | ❌ |
| --marking_instructions | File path to marking instructions/rubric | ❌ |
| --model_options | Comma-separated key-value pairs of model options and their values | ❌ |

** One of either --prompt or --prompt_text must be selected. If both are provided, --prompt_text will be appended to the contents of the file specified by --prompt.
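To make the required/optional split concrete, here is a minimal sketch of how a subset of these arguments could be declared with argparse. It is not the package's actual parser: the function names `build_parser` and `parse_cli` are hypothetical, only a few flags are shown, and the real choices for `--model` and `--submission_type` live in arg_options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the documented arguments."""
    parser = argparse.ArgumentParser(prog="ai_feedback")
    # Required arguments, per the table above.
    parser.add_argument("--scope", required=True, choices=["image", "code", "text"])
    parser.add_argument("--submission", required=True)
    parser.add_argument("--model", required=True)
    # Optional arguments (only a few of them are sketched here).
    parser.add_argument("--prompt", default=None)
    parser.add_argument("--prompt_text", default=None)
    parser.add_argument("--solution", default=None)
    parser.add_argument("--question", default=None)
    parser.add_argument("--output", default=None)
    return parser


def parse_cli(argv: list[str]) -> argparse.Namespace:
    """Parse argv and enforce that a prompt source was supplied."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Footnote rule: at least one of --prompt / --prompt_text is needed.
    if args.prompt is None and args.prompt_text is None:
        parser.error("one of --prompt or --prompt_text must be provided")
    return args
```

Calling `parse_cli(["--scope", "code", "--submission", "s.py", "--model", "openai"])` exits with an argparse usage error, because neither prompt option was given.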
Scope
The program supports three scopes: code, text, and image. Depending on which is selected, the program supports different models and prompts tailored to that scope.
If the "code" scope is selected, the program identifies student errors in the code sections of the assignment by comparing them to the solution code. With --scope code, the --question option can also be specified to analyze the code for a particular question rather than the entire file. Currently, you can specify a question number if the file type is Jupyter notebook. To use the --question option, the question code in both the solution and submission files must be delimited by '## Task {#}'. See the File Formatting Assumptions section.
If the "text" scope is selected, the program identifies student errors in the written responses of the assignment by comparing them to the solution's rubric for written responses. The "text" scope requires 'pdf' as the submission type.
If the "image" scope is selected, the program will identify issues in submission images, optionally comparing them to reference solutions. Question numbers can be specified by adding the tag markus_question_name: <question name> to the metadata for the code cell that generates the submission image. The previous cell's markdown content will be used as the question's context.
Submission Type
The program automatically detects submission type based on file extensions in the assignment directory:
- Files ending with _submission.ipynb → jupyter notebook
- Files ending with _submission.py → python file
- Files ending with _submission.pdf → PDF document
The user can also explicitly specify the submission type using the --submission_type argument if auto-detection is not suitable.
Currently, jupyter notebook, pdf, and python assignments are supported.
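The suffix-based detection described above can be sketched as follows. This is an illustration only: the function name `detect_submission_type` and the returned labels are assumptions (the real values are defined in arg_options.FileType), and the package's own detection logic may differ.

```python
from pathlib import Path
from typing import Optional

# Suffixes come from the list above; the labels on the right are
# hypothetical stand-ins for the values in arg_options.FileType.
SUFFIX_TO_TYPE = {
    "_submission.ipynb": "jupyter",
    "_submission.py": "python",
    "_submission.pdf": "pdf",
}


def detect_submission_type(path: str) -> Optional[str]:
    """Return a type label based on the file name suffix, or None if unknown."""
    name = Path(path).name
    for suffix, file_type in SUFFIX_TO_TYPE.items():
        if name.endswith(suffix):
            return file_type
    return None
```

When this kind of lookup returns `None`, passing --submission_type explicitly is the fallback.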
Prompts
The --prompt argument accepts either pre-defined prompt names or custom file paths:
Pre-defined Prompts
To use pre-defined prompts, specify the prompt name (without extension). Pre-defined prompts are stored as markdown (.md) files in the ai_feedback/data/prompts/user/ directory.
Custom Prompt Files
To use custom prompt files, specify the file path to your custom prompt. The file should be a markdown (.md) file.
Prompt files can contain template placeholders with the following structure:
Consider this question:
{context}
{submission_image}
Do the graphs in the attached image solve the problem? Do not include an example solution.
Prompt files are now stored as markdown (.md) files in the ai_feedback/data/prompts/user/ directory. Each prompt can contain template placeholders that will be automatically replaced with relevant content.
Prompt Naming Conventions:
- Prompts to be used when --scope code is selected are prefixed with code_{}.md
- Prompts to be used when --scope image is selected are prefixed with image_{}.md
- Prompts to be used when --scope text is selected are prefixed with text_{}.md
Scope validation (prefix matching) only applies to pre-defined prompts. Custom prompt files can be used with any scope.
All prompts are treated as templates that can contain special placeholder blocks. The following template placeholders are automatically replaced:
- {context} - Question context
- {file_references} - List of files being analyzed with descriptions
- {file_contents} - Full contents of files with line numbers
- {submission_image} - Student submission image
- {solution_image} - Reference solution image
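As a rough illustration of placeholder replacement, the sketch below substitutes only the placeholder names listed above and leaves every other brace untouched, so prompts that contain JSON snippets are not disturbed. The function name `render_prompt` is hypothetical, and treating image placeholders as plain strings is a simplification; the package may attach images differently.

```python
import re

# Only the documented placeholder names are recognised.
PLACEHOLDER = re.compile(
    r"\{(context|file_references|file_contents|submission_image|solution_image)\}"
)


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace known placeholders; leave a placeholder as-is if no value is given."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)
```

For example, `render_prompt("Consider this question:\n{context}", {"context": "Plot X"})` yields the template with `{context}` replaced by `Plot X`.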
Code Scope Prompts
| Prompt Name | Description |
|---|---|
| code_explanation.md | Outputs paragraph explanation of errors. |
| code_hint.md | Outputs short hints on what errors are. |
| code_lines.md | Outputs only code lines where errors are caused. |
| code_table.md | Outputs a table which shows the question requirement, the student’s attempt, and potential issue. |
| code_template.md | Outputs a template format specified to include error type, description, solution. |
| code_annotation.md | Outputs a json object of a list of annotation objects to display student errors on MarkUs. This is intended for markus integration usage. |
Image Scope Prompts
| Prompt Name | Description |
|---|---|
| image_analyze.md | Outputs whether the submission image answers the question provided by the context. |
| image_analyze_annotations.md | Outputs whether the submission image answers the question provided by the context as a list of JSON objects, each with a description of the issue and a location on the image. Intended for MarkUs integration usage. |
| image_compare.md | Outputs table comparing style elements between submission and solution graphs. |
| image_style.md | Outputs table checking the style elements in a submission graph. |
| image_style_annotations.md | Outputs evaluations of style elements in a submission graph as a list of JSON objects, each with a description of the issue and a location on the image. Intended for MarkUs integration usage. |
Text Scope Prompts
| Prompt Name | Description |
|---|---|
| text_pdf_analyze.md | Outputs whether the submission written response matches all the criteria specified in the solution. |
Prompt_text
Additionally, the user can pass in a string through the --prompt_text argument. The string is appended to the prompt if --prompt is used, or used as the only prompt if --prompt is not used.
System Prompts
The --system_prompt argument accepts either pre-defined system prompt names or custom file paths:
Pre-defined System Prompts
To use pre-defined system prompts, specify the system prompt name (without extension). Pre-defined system prompts are stored as markdown (.md) files in the ai_feedback/data/prompts/system/ directory.
Custom System Prompt Files
To use custom system prompt files, specify the file path to your custom system prompt. The file should be a markdown (.md) file.
System prompts define the AI model's behavior, tone, and approach to providing feedback. They are used to set the context and personality of the AI assistant.
Marking Instructions
The --marking_instructions argument accepts a file path to a text file containing rubric or marking instructions. If the prompt template contains a {marking_instructions} placeholder, the contents of the file will be inserted at that location in the prompt.
Models
The models used can be seen under the ai_feedback/models folder.
OpenAI Vector Store
- Model Name: gpt-4-turbo
- System Prompt: Behaviour of model is set with INSTRUCTIONS prompt from helpers/constants.py.
- Features:
- Assistant: Uses the OpenAI Assistant Beta Feature, allowing customized model for specific tasks.
- Vector Store: The model creates and manages a vector store for data retrieval.
- Tools Used: Supports file_search for retrieving information from uploaded files.
- Cleanup: Uploaded files and models are deleted after processing, in order to manage API resources.
- OpenAI Assistants Documentation
OpenAI
- Uses the same model as above but doesn't use the vector store functionality. Uploads files as part of the prompt.
Note: If you wish to use OpenAI models, you must specify your API key in an .env file. Create a .env file in your project directory and add your API key:
OPENAI_API_KEY=your_api_key_here
Claude
- Model Name: claude-3.7-sonnet
- System Prompt: Behaviour of model is set with INSTRUCTIONS prompt from helpers/constants.py.
- Claude Documentation
Note: If you wish to use the Claude model, you must specify your API key in an .env file. Create a .env file in your project directory and add your API key:
CLAUDE_API_KEY=your_api_key_here
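For readers who want to check their setup before running the tool, here is a stdlib-only sketch that reads KEY=value lines from a .env file and fails early if a key is missing. The helper names `load_env_file` and `require_key` are hypothetical, and this is not necessarily how the package itself loads the file.

```python
import os


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read KEY=value lines, skipping blanks and comments, into os.environ."""
    loaded: dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Strip optional surrounding quotes from the value.
            loaded[key.strip()] = value.strip().strip('"').strip("'")
    for key, value in loaded.items():
        # Values already present in the environment take precedence.
        os.environ.setdefault(key, value)
    return loaded


def require_key(name: str) -> str:
    """Return the named variable or raise a descriptive error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

A quick check would be `load_env_file()` followed by `require_key("OPENAI_API_KEY")` or `require_key("CLAUDE_API_KEY")`, depending on the model you plan to use.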
Ollama
Various models were also tested and run locally on the Teach CS Bigmouth server by using Ollama. Listed below are the models that were used to test out the project:
Code Scope
Models:
- deepSeek-R1:70B Documentation
- codellama:latest Documentation
Image Scope
- llama3.2-vision:90b Documentation
  - This model only supports at most one image attachment.
- llava:34b Documentation
Output Structure
- When --output filepath is given, the script will:
  - Load the template for the output based on the --output_template (Options defined in ai_feedback/helpers/arg_options.OutputTemplate)
  - Format it with the provided arguments and processing results.
  - Save it under filepath
- When the --output argument is not given, the prompt used and generated response will be sent to stdout in the format selected by --output_template.
- When the --output_template argument is not given, it will default to response_only, which is only the response from the model.
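The output behaviour above can be sketched roughly as follows. The template names response_only and verbose appear in this README, but the template strings below are invented for illustration, and the function name `emit` is hypothetical; the real templates ship with the package.

```python
import sys
from typing import Optional, TextIO

# Hypothetical template bodies; only the names come from this README.
TEMPLATES = {
    "response_only": "{response}",
    "verbose": "## Prompt\n{prompt}\n\n## Response\n{response}",
}


def emit(
    prompt: str,
    response: str,
    output_path: Optional[str] = None,
    template: str = "response_only",
    stream: Optional[TextIO] = None,
) -> str:
    """Format the result and write it to a file, or print it if no path is given."""
    text = TEMPLATES[template].format(prompt=prompt, response=response)
    if output_path is not None:
        with open(output_path, "w", encoding="utf-8") as handle:
            handle.write(text)
    else:
        print(text, file=stream if stream is not None else sys.stdout)
    return text
```

With no `output_path` and the default template, only the model response is printed, which mirrors the documented default.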
Question Extraction
Use --question "<section name>" to extract a specific section from a submission. Behavior depends on file type.
Supported inputs
- PDF: looks up section names in the PDF’s Table of Contents (TOC).
- Text / Markdown / Code (.txt, .md, .ipynb, .qmd): looks for Markdown-style headings (#, ##, ###, …). For code, write the heading in a comment line (e.g., ### Question 1 in Python). The extractor returns all content that belongs to that heading (up until the next heading at the same or higher level).
Matching is case-insensitive and normalizes smart quotes, dashes, and extra whitespace.
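The heading-based extraction and normalization rules can be illustrated with a small sketch for plain Markdown text. The names `normalize` and `extract_section` are hypothetical and this is not the package's extractor; PDF Table of Contents lookup is not covered.

```python
import re
from typing import Optional

# Map smart quotes and dashes to their plain ASCII equivalents.
_PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "-",
})
HEADING = re.compile(r"^\s*(#{1,6})\s+(.*\S)\s*$")


def normalize(title: str) -> str:
    """Lower-case, collapse whitespace, and replace smart punctuation."""
    return " ".join(title.translate(_PUNCTUATION).split()).casefold()


def extract_section(text: str, wanted: str) -> Optional[str]:
    """Return the heading line and body up to the next same-or-higher heading."""
    lines = text.splitlines()
    target = normalize(wanted)
    start = level = None
    for index, line in enumerate(lines):
        match = HEADING.match(line)
        if not match:
            continue
        depth = len(match.group(1))
        if start is None:
            if normalize(match.group(2)) == target:
                start, level = index, depth
        elif depth <= level:
            # A heading at the same or a higher level closes the section.
            return "\n".join(lines[start:index])
    return None if start is None else "\n".join(lines[start:])
```

One caveat of this simplified version: in a Python source file, an ordinary `# comment` line looks like a level-1 heading and would end the section early, so it is best suited to Markdown-style input.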
Test Files
- Any subdirectory of /test_submissions can be run locally. More examples can be added to this directory in a similar fashion.
GGR274 Test File Assumptions
Code Scope
To test the program using the GGR274 files, we assume that the test assignment files follow a specific directory structure. Currently, this program has been tested using Homework 5 of the GGR274 class at the University of Toronto.
Directory Structure
Within the test_submissions/ggr274_homework5 directory, mock submissions are contained in separate subdirectories named test_submissions/ggr274_homework5/test#. The following naming convention is used for the files:
- Homework_5_solution.ipynb – Instructor-provided solution file
- student_submission.ipynb – Student's submission file
- test#_error_output.txt – Error trace file for the corresponding test case
Each test folder contains variations of student_submission.ipynb with different errors.
File Formatting Assumptions
To ensure proper extraction and evaluation of student responses, the following format is assumed for Homework_5_solution.ipynb and student_submission.ipynb:
- Each task must be clearly delimited using markdown headers in the format '## Task {#}'. This allows the program to isolate specific questions when using the --question argument, ensuring the model only evaluates errors related to the specified question.
- Each file must start with '## Introduction'. This section serves as the general assignment instructions and is not included in error evaluation.
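To show why the '## Task {#}' delimiters matter, the sketch below selects the notebook cells that belong to one task, working directly on the notebook's JSON structure (a dict with a "cells" list). The function names are hypothetical and the package's own isolation logic may differ.

```python
import re

# A level-2 markdown header starts a new section; "Task N" headers start a task.
SECTION_HEADER = re.compile(r"^##\s+(.+?)\s*$", re.MULTILINE)
TASK_TITLE = re.compile(r"Task\s+(\d+)\b")


def cell_source(cell: dict) -> str:
    """Notebook cells store source as a string or as a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def cells_for_task(notebook: dict, task_number: int) -> list[dict]:
    """Collect the header cell and all following cells of the given task."""
    selected = []
    in_task = False
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "markdown":
            header = SECTION_HEADER.search(cell_source(cell))
            if header:
                title = TASK_TITLE.match(header.group(1))
                in_task = bool(title) and int(title.group(1)) == task_number
        if in_task:
            selected.append(cell)
    return selected
```

Because '## Introduction' is a level-2 header that is not a task, cells under it are never selected, which matches the note that the introduction is excluded from error evaluation.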
Image Scope
Test Files
Mock student submissions are stored in ggr274_homework5/image_test#. The following naming convention is used for the files:
- solution.ipynb – Instructor-provided solution file
- student_submission.ipynb – Student's submission file
Notebook Preprocessing
To grade a specific question using the --question argument, add the tag markus_question_name: <question name> to the metadata for the code cell that generates an image to be graded. The previous cell's markdown content will be used as the question's context.
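A minimal sketch of the lookup described above: find the code cell tagged with the question name and take the preceding markdown cell as context. It assumes the tag is stored as a top-level `markus_question_name` key in the cell's metadata, which is an assumption about the notebook layout; the function names are hypothetical.

```python
from typing import Optional


def source_text(cell: dict) -> str:
    """Notebook cells store source as a string or as a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def find_question_context(notebook: dict, question_name: str) -> Optional[str]:
    """Return the markdown preceding the tagged code cell, or None if not tagged."""
    cells = notebook.get("cells", [])
    for index, cell in enumerate(cells):
        if cell.get("cell_type") != "code":
            continue
        # Assumed layout: {"metadata": {"markus_question_name": "<question name>"}}
        if cell.get("metadata", {}).get("markus_question_name") != question_name:
            continue
        if index > 0 and cells[index - 1].get("cell_type") == "markdown":
            return source_text(cells[index - 1])
        return ""  # Tagged cell found, but no markdown cell precedes it.
    return None
```

A notebook loaded with `json.load` can be passed in directly, for example `find_question_context(notebook, "Question 5b")`.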
Package Usage
In order to run this package locally:
Ensure you have the environment variables set up (see Models section above).
When you are in a terminal in the repo, run:
pip install -e .
Run the program:
python -m ai_feedback \
--submission_type <file_type> \
--prompt <prompt_name> \
--scope <image|code|text> \
--submission <submission_file_path> \
--solution <solution_file_path> \
--test_output <test_output_path> \
--submission_image <image_file_path> \
--solution_image <image_file_path> \
--question <question_number> \
--model <model_name> \
--output <file_path_to> \
--output_template <file_name> \
--system_prompt <prompt_file_path> \
--llama_mode <server|cli>
- See the Argument Details section for the different command line argument options, or run this command to see help messages and available choices:
python -m ai_feedback -h
Example Commands
Evaluate cnn_example test using openAI model
python -m ai_feedback --prompt code_lines --scope code --submission test_submissions/cnn_example/cnn_submission --solution test_submissions/cnn_example/cnn_solution.py --model openai
Evaluate cnn_example test using openAI model and custom prompt
python -m ai_feedback --prompt_text "Evaluate the student's code readability." --scope code --submission test_submissions/cnn_example/cnn_submission.py --model openai
Evaluate pdf_example test using openAI model
python -m ai_feedback --prompt text_pdf_analyze --scope text --submission test_submissions/pdf_example/student_pdf_submission.pdf --model openai
Evaluate question1 of test1 of ggr274 homework using DeepSeek model
python -m ai_feedback --prompt code_table \
--scope code --submission test_submissions/ggr274_homework5/test1/student_submission.ipynb --question 1 --model deepSeek-R1:70B
Evaluate the image for question 5b of ggr274 homework with Llama3.2-vision
python -m ai_feedback --prompt image_analyze --scope image --submission ./test_submissions/ggr274_homework5/image_test2/student_submission.ipynb --submission_image test_submissions/ggr274_homework5/image_test2/student_submission.png --question "Question 5b" --model llama3.2-vision:90b
Evaluate the bfs example with remote model to test_file using the verbose template
python -m ai_feedback --prompt code_lines --scope code --solution ./test_submissions/bfs_example/bfs_solution.py --submission test_submissions/bfs_example/bfs_submission.py --model remote --output test_file --output_template verbose
Evaluate the Jupyter notebook of test1 of ggr274 using DeepSeek-v3 via llama.cpp server
python3 -m ai_feedback --prompt code_table --scope code \
--submission test_submissions/ggr274_homework5/test1/student_submission.ipynb \
--solution test_submissions/ggr274_homework5/test1/Homework_5_solution.ipynb \
--model deepSeek-v3 --llama_mode server
Evaluate the Jupyter notebook of test1 of ggr274 using DeepSeek-v3 via llama.cpp cli
python3 -m ai_feedback --prompt code_table --scope code \
--submission test_submissions/ggr274_homework5/test1/student_submission.ipynb \
--solution test_submissions/ggr274_homework5/test1/Homework_5_solution.ipynb \
--model deepSeek-v3 --llama_mode cli
Get annotations for cnn_example test using openAI model
python -m ai_feedback --prompt code_annotations --scope code --submission test_submissions/cnn_example/cnn_submission --solution test_submissions/cnn_example/cnn_solution.py --model openai --json_schema ai_feedback/data/schema/code_annotation_schema.json
Evaluate using custom prompt file path
python -m ai_feedback --prompt ai_feedback/data/prompts/user/code_overall.md --scope code --submission test_submissions/csc108/correct_submission/correct_submission.py --solution test_submissions/csc108/solution.py --model codellama:latest
Evaluate using custom model_options
python3 -m ai_feedback --prompt code_table --scope code --submission ../ai-autograding-feedback-eval/test_submissions/108/hard_coding_submission.py --model openai-vector --submission_type python --model_options "max_tokens=1200,temperature=0.4,top_p=0.92"
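The --model_options string in the last example is a comma-separated list of key=value pairs. The sketch below shows one way such a string could be turned into a dictionary; the function names are hypothetical, and converting numeric-looking values to int or float is an assumption rather than documented behaviour.

```python
def coerce(value: str) -> object:
    """Convert numeric-looking strings to int or float; keep others as text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_model_options(raw: str) -> dict[str, object]:
    """Split 'k1=v1,k2=v2' into a dict, rejecting malformed pairs."""
    options: dict[str, object] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, separator, value = pair.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key.strip()] = coerce(value.strip())
    return options
```

Applied to the example above, `parse_model_options("max_tokens=1200,temperature=0.4,top_p=0.92")` gives a dictionary with an integer `max_tokens` and float `temperature` and `top_p`.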
Using Ollama
In order to run this project on Bigmouth:
- SSH into teach.cs
ssh username@teach.cs.utoronto.ca
- SSH into bigmouth (access permission required)
ssh bigmouth
- Ensure you're in the project directory
- Start Ollama
ollama start
- Ensure models specified in repo are downloaded
ollama list
- Run the script according to the Package Usage section above.
Markus Integration
This Python package can be used as a dependency in the MarkUs Autotester to display LLM-generated feedback as overall comments and test outputs, and as annotations on the submission file. After the Autotester is set up following the instructions below, these comments and annotations should appear automatically in the MarkUs UI once 'Run Tests' is pressed.
Markus Test Scripts
- /markus_test_scripts contains scripts which can be uploaded to the autotester in order to generate LLM Feedback
- Currently, only openAI and Claude models are supported.
- Within these llm script files, the models and prompts used can be changed by editing the command line arguments, through the run_llm() function.
Files:
- python_tester_llm_code.py: Runs LLM on any code assignment (solution file, submission file) uploaded to the autotester. First, creates general feedback and displays as overall comments and test output (can use any prompt and model). Second, feeds in the output of the first LLM response into the model again, asking it to create annotations for the student's mistakes. (Ensure to change submission file import name.)
- llm_helpers.py: contains helper functions needed to run llm scripts.
- python_tester_llm_pdf.py: Runs LLM on any pdf assignment (solution file and submission file) uploaded to the autotester. Creates general feedback about whether the student's written responses match the instructor's feedback. Displayed in test outputs and overall comments.
- custom_tester_llm_code.sh: Runs LLM on assignments (solution file, submission file, test output file) uploaded to the custom autotester. Currently, supports jupyter notebook files uploaded. Can specify prompt and model used in the script. Displays in overall comments and in test outputs. Can optionally uncomment the annotations section to display annotations, however the annotations will display on the .txt version of the file uploaded by the student, not the .ipynb file.
Python AutoTester Usage
Code Scope
- Ensure the student has submitted a submission file.
- Ensure the instructor has submitted a solution file, llm_helpers.py (located in /markus_test_scripts), and python_tester_llm_code.py (located in /markus_test_scripts). Instructor can also upload another pytest file which can be run as its own test group.
- Ensure the submission import statement in python_tester_llm_code.py matches the name of the student's submission file name.
- Create a Python Autotester Test Group to run the LLM File.
- In the Package Requirements section of the Test Group Settings for the LLM file, put:
git+https://github.com/MarkUsProject/ai-autograding-feedback.git#egg=ai_feedback
Along with any other packages that the submission or solution file uses.
- Ensure the Timeout is set to 120 seconds or longer.
- Ensure Markus Autotester docker container has the API Keys in an .env file and specified in the docker compose file.
Text Scope
- Do the same as the code scope, but ensure that the student submission and instructor solution are .pdf files with the same naming assumption. Also, ensure that python_tester_llm_pdf.py is uploaded as the test script.
AI Tester Usage
- In the Autotest settings of the assignment, click Add Tester and select the ai option.
- Fill in all required arguments for the AI tester.
- Upload any related files (e.g., JSON schema files, custom prompts, or configuration files).
- Ensure the MarkUs Autotester Docker container has the API keys defined in an .env file and that these variables are specified in the docker-compose.yml file.
- Ensure the Timeout is set to 120 seconds or longer.
Running Python Autotester Examples
CNN Example
- Look at the /test_submissions/cnn_example directory for the following files
- Instructor uploads: cnn_solution.py, cnn_test.py, llm_helpers.py, python_tester_llm_code.py files
- Separate test groups for cnn_test.py and python_tester_llm_code.py
- cnn_test.py Autotester package requirements: torch numpy
- python_tester_llm_code.py Autotester package requirements: git+https://github.com/MarkUsProject/ai-autograding-feedback.git#egg=ai_feedback numpy torch
- Student uploads: cnn_submission.pdf
BFS Example
- Look at the /test_submissions/bfs_example directory for the following files
- Instructor uploads: bfs_solution.py, test_bfs.py, llm_helpers.py, python_tester_llm_code.py files
- Separate test groups for test_bfs.py and python_tester_llm_code.py
- python_tester_llm_code.py Autotester package requirements: git+https://github.com/MarkUsProject/ai-autograding-feedback.git#egg=ai_feedback
- Student uploads: bfs_submission.pdf
PDF Example
- Look at the /test_submissions/pdf_example directory for the following files
- Instructor uploads: instructor_pdf_solution.pdf, llm_helpers.py, python_tester_llm_pdf.py files
- Autotester package requirements: git+https://github.com/MarkUsProject/ai-autograding-feedback.git#egg=ai_feedback
- Student uploads: student_pdf_submission.pdf
Custom Tester Usage
- Ensure the student has submitted a submission file.
- Ensure the instructor has submitted a solution file and custom_tester_llm_code.sh (located in /markus_test_scripts). Instructor can also upload another script used to run its own test group. (See below for GGR274 Example.)
- In the Markus Autotesting terminal:
docker exec -it -u 0 markus-autotesting-server-1 /bin/bash
Then as the root user, install the package:
/home/docker/.autotesting/scripts/defaultvenv/bin/pip install git+https://github.com/MarkUsProject/ai-autograding-feedback.git#egg=ai_feedback
Also pip install other packages that the submission or solution file uses.
- Create a Custom Autotester Test Group to run the LLM script file.
- Ensure the Timeout is set to 120 seconds or longer.
- Ensure Markus Autotester docker container has the API Keys in an .env file and specified in the docker compose file.
GGR274 Test1 Example
- Look at the /test_submissions/ggr274_hw5_custom_tester directory for the following files
- Instructor uploads: Homework_5_solution.ipynb, test_hw5.py, test_output.txt, custom_tester_llm_code.sh, run_hw5_test.sh
- Two separate test groups: one for run_hw5_test.sh, and one for custom_tester_llm_code.sh
- Student uploads: test1_submission.ipynb, test1_submission.txt
NOTE: if the LLM Test Group appears to be blank/does not turn green, try increasing the timeout.
Custom Tester
- custom_tester_llm_code.sh: Runs LLM on any assignment (solution file, submission file, test output file) uploaded to the autotester. Can specify prompt and model used in the script. Displays in overall comments and in test outputs.
Developers
To install project dependencies, including development dependencies:
$ pip install -e .[dev]
To install pre-commit hooks:
$ pre-commit install
To run the test suite:
$ pytest