CodEval

CodEval evaluates programming assignments. Currently it has four main components:

1. Test Simple I/O Programming Assignments on Canvas

codeval.ini contents

[SERVER]
url=<canvas API>
token=<canvas token>
[RUN]
precommand=
command=

Refer to a sample codeval.ini file here
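A filled-in sketch (all values are placeholders; the Canvas API base URL follows the usual https://&lt;host&gt;/api/v1 form, and precommand/command are site-specific, so check the sample file for real values):

[SERVER]
url=https://canvas.example.edu/api/v1
token=<your Canvas access token>
[RUN]
precommand=
command=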

Command to run:

python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]

Example: if the course name on Canvas is CS 149 - Operating Systems, the command can be:
python3 codeval.py grade-submissions CS\ 149
or
python3 codeval.py grade-submissions "Operating Systems"
Use a part of the course name that uniquely identifies the course on Canvas.

Flags

  • --dry-run/--no-dry-run (Optional)
    • Default: --dry-run
    • Do not update the results on Canvas. Print the results to the terminal instead.
  • --verbose/--no-verbose (Optional)
    • Default: --no-verbose
    • Show detailed logs
  • --force/--no-force (Optional)
    • Default: --no-force
    • Grade submissions even if already graded
  • --copytmpdir/--no-copytmpdir (Optional)
    • Default: --no-copytmpdir
    • Copy temporary directory content to current directory for debugging
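For example, to post grades to Canvas with detailed logs while regrading already-graded submissions (the course name here is hypothetical):

python3 codeval.py grade-submissions "CS 149" --no-dry-run --verbose --force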

Codeval File Matching

When looking up the codeval file for an assignment, CodEval first tries to match by filename (case-insensitive). If no filename match is found, it falls back to checking the CRT_HW START title inside each .codeval file in the codeval directory. This allows the codeval filename to differ from the assignment name on Canvas as long as the CRT_HW START title matches.
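For example (filename hypothetical): a spec saved as codeval/hw1.codeval can still be matched to a Canvas assignment named Bag Of Strings, despite the differing filename, as long as its description block opens with:

CRT_HW START Bag Of Strings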

Specification Tags

Tags used in a spec file (<course name>.codeval)

| Tag | Meaning | Function |
| ----- | ----- | ----- |
| C | Compile Code | Specifies the command to compile the submission code |
| CTO | Compile Timeout | Timeout in seconds for the compile command to run |
| RUN | Run Script | Specifies the script used to evaluate the specification file. Defaults to evaluate.sh |
| Z | Download Zip | Followed by zip files to download from Canvas for use when running the test cases |
| CF | Check Function | CF <function_name> [filename]: checks that the function is used. The filename is optional, and omitting it (e.g. CF strtol) is the preferred usage; source files are inferred from the most recent C tag and the check uses compiled-artifact inspection only (objdump/javap/ast). Providing a filename (e.g. CF strtol mycalc.c) is supported for backwards compatibility and additionally enables a regex fallback when no compiled artifact is found |
| NCF | Check Not Function | NCF <function_name> [filename]: checks that the function is not used. The filename is optional (same behaviour as CF) |
| CC | Check Container | Followed by a container or function name and a list of files to check; ensures that the container is used by one of those files. Primarily supports C++ containers such as std::vector |
| CO | Check Object | Followed by an object or function name and a list of files to check; ensures that the object is used by one of those files. Primarily supports C++ stream operations |
| PRINT | Print Label | Prints a label/message to stdout. A cleaner alternative to CMD echo "..." for section labels |
| CMD/TCMD | Run Command | Followed by a command to run. TCMD causes the evaluation to fail if the command exits with an error |
| CMP | Compare | Followed by two files to compare |
| T/HT | Test Case | Followed by the command to run to test the submission |
| I/IB/IF | Supply Input | Specifies the input for a test case. I adds a newline, IB does not add a newline, IF reads from a file |
| O/OB/OF | Check Output | Specifies the expected output for a test case. O adds a newline, OB does not add a newline, OF reads from a file |
| E/EB | Check Error | Specifies the expected error output for a test case. E adds a newline, EB does not |
| TO | Timeout | Specifies the time limit in seconds for a test case to run. Defaults to 20 seconds |
| X | Exit Code | Specifies the expected exit code for a test case. Defaults to zero |
| SS | Start Server | Followed by a startup timeout (how long to wait for the server to start), a kill timeout (how long to wait before killing the server), and the command that starts the server |
| TEMP | Temp File | Registers a file to be deleted before the next T, HT, or TCMD test runs (clean state) and again after it completes (cleanup). Only applies to the immediately following test; use a new TEMP tag for each test that needs it |

Refer to a sample spec file here
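A minimal sketch of a spec body built from the tags above (the compile line is borrowed from the example later on this page; the program behavior, input, and output are hypothetical, and the exact grouping of I/O lines with their T line should be verified against the sample file):

C cc -o bigbag --std=gnu11 bigbag.c
TO 10
T ./bigbag
I hello
O HELLO
X 0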

2. Test Distributed Programming Assignments

(or complex non-I/O programs)

codeval.ini contents

[SERVER]
url=<canvas API>
token=<canvas token>
[RUN]
precommand=
command=
dist_command=
host_ip=
[MONGO]
url=
db=

Refer to a sample codeval.ini file here
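A filled-in sketch with placeholder values (the Canvas URL, host IP, and Mongo settings are all hypothetical; dist_command and the other [RUN] entries are site-specific):

[SERVER]
url=https://canvas.example.edu/api/v1
token=<canvas token>
[RUN]
precommand=
command=
dist_command=
host_ip=192.168.1.50
[MONGO]
url=mongodb://localhost:27017
db=codeval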

Command to run

The command is the same as in #1:
python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]

Distributed Specification Tags

| Tag | Meaning | Function |
| ----- | ----- | ----- |
| --DT-- | Distributed Tests Begin | Marks the beginning of distributed tests; used to determine whether the spec file has distributed tests |
| GTO | Global Timeout | A total timeout for all distributed tests, applied separately to homogeneous and heterogeneous tests. Homogeneous tests = GTO value; heterogeneous tests = 2 * GTO value |
| PORTS | Exposed Ports Count | Maximum number of ports to expose per docker container |
| ECMD/ECMDT SYNC/ASYNC | External Command | Command that runs in a controller container, emulating a host machine. ECMDT: the evaluation fails if the command returns an error. SYNC: CodEval waits for the command to execute or fail. ASYNC: CodEval doesn't wait; failure is checked only if ECMDT |
| DTC $int [HOM] [HET] | Distributed Test Config Group | Signifies the start of a new group of distributed tests. Replace $int with the number of containers to start for the test group. HOM denotes homogeneous tests, i.e., the user's own submission runs in all the containers. HET denotes heterogeneous tests, i.e., a combination of $int - 1 other users' submissions and the current user's submission runs in the containers. Either HOM or HET or both may be given |
| ICMD/ICMDT SYNC/ASYNC */n1,n2,n3... | Internal Command | Command that runs in each of the containers. ICMDT: the evaluation fails if the command returns an error. SYNC: wait for the command to execute or fail. ASYNC: don't wait; failure is checked only if ICMDT. *: run the command in all containers. n1,n2,...,nx: run the command only in the containers indexed n1,n2,...,nx (zero-based indexing) |
| TESTCMD | Test Command | Command run on the host machine to validate the submission(s) |
| --DTCLEAN-- | Cleanup Commands | Commands to execute after the tests have completed or failed. May contain only ECMD or ECMDT |

Special placeholders in commands

| Placeholder | Usage |
| ----- | ----- |
| TEMP_DIR | Used in ECMD/ECMDT; replaced by the temporary directory generated by CodEval during execution |
| HOST_IP | Used in ECMD/ECMDT/ICMD/ICMDT; replaced by the host IP specified in codeval.ini |
| USERNAME | Used in ICMD/ICMDT; replaced by the username of the user whose submission is being evaluated |
| PORT_$int | Used in ICMD/ICMDT; replaced by a port number assigned to the running docker container. $int must be less than the PORTS value in the specification |

Refer to a sample spec file here
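Purely as a sketch of how the tags and placeholders combine (every command, container count, and timeout below is hypothetical; the sample file is authoritative):

--DT--
GTO 120
PORTS 1
DTC 3 HOM HET
ICMD ASYNC * ./node --ip HOST_IP --port PORT_0 --user USERNAME
TESTCMD ./check_cluster.sh
--DTCLEAN--
ECMD SYNC docker system prune -f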

Notes

  • The config file codeval.ini needs to contain the extra entries only if the tag --DT-- exists in the specification file
  • Distributed tests need a running MongoDB service to persist the progress of students running heterogeneous tests

3. Test SQL Assignments

codeval.ini contents

[SERVER]
url=<canvas API>
token=<canvas token>
[RUN]
precommand=
command=
dist_command=
host_ip=
sql_command=

Refer to a sample codeval.ini file here

Command to run

The command is the same as in #1:
python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]

SQL Specification Tags

| Tag | Meaning | Function |
| ----- | ----- | ----- |
| --SQL-- | SQL Tests Begin | Marks the beginning of SQL tests; used to determine whether the spec file has SQL-based tests |
| INSERT | Insert Rows in DB | Inserts rows into the SQL database using files or individual insert queries |
| CONDITIONPRESENT | Check Condition in File | Validates that a required condition is present in the submission files |
| SCHEMACHECK | Schema Check | Validates submission files for database-related checks such as constraints |
| TSQL | SQL Test | Marks an SQL test; takes a file or an individual query as input and runs it against the submission files |

Refer to a sample spec file here
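A rough sketch only (the argument forms below are assumptions; see the sample file for the real syntax):

--SQL--
INSERT seed_rows.sql
CONDITIONPRESENT FOREIGN KEY
SCHEMACHECK constraints.sql
TSQL query1.sql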

Notes

  • The config file codeval.ini needs to contain the extra entries only if the tag --SQL-- exists in the specification file
  • SQL tests run in MySQL and need a separate container image.

Create an assignment on Canvas

Command to create the assignment:

Syntax: python3 codeval.py create-assignment <course_name> <specification_file> [ --dry-run/--no-dry-run ] [ --verbose/--no-verbose ] [ --group_name <group_name> ]
Example: python3 codeval.py create-assignment "Practice1" 'a_big_bag_of_strings.txt' --no-dry-run --verbose --group_name "exam 2"

Command to grade the assignment:

Syntax: python3 codeval.py grade-submissions <course_name> [ --dry-run/--no-dry-run ] [ --verbose/--no-verbose ] [ --force/--no-force ] [ --copytmpdir/--no-copytmpdir ]
Example: python3 codeval.py grade-submissions "Practice1" --no-dry-run --force --verbose

Assignment description tags

  • CRT_HW START <Assignment_name> - usually at the beginning of the file. The lines that follow this tag are the assignment description in markdown.

  • CRT_HW END - ends the assignment description

Assignment description macros

  • DISCSN_URL - this macro will be substituted with the URL of the discussion that was created for this assignment

  • EXMPLS <no_of_test_cases> - this macro will be replaced with the specified number of test cases formatted for display

  • FILE[file_name] - this macro will be replaced by a link to the specified file

  • COMPILE - this macro will be replaced with the compile command from the C tag in the specification file
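For instance, a description line combining the FILE and COMPILE macros might read (filename hypothetical):

Download the starter code: [starter.zip](FILE[starter.zip])
Compile it with: COMPILE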

MODIFICATIONS REQUIRED IN THE SPECIFICATION FILE.

  1. Start the specification file with the tag CRT_HW START, followed by a space and the name of the assignment. For ex: CRT_HW START Hello World

  2. The lines after the first line contain the description of the assignment in Markdown format.

  3. The description ends with the last line containing just the tag CRT_HW END. For ex: CRT_HW END

  4. After this tag, the content for grading the submission begins.

    Addition of the Discussion Topic in the assignment description.

    1. Insert the tag DISCSN_URL wherever you want the corresponding discussion topic's link to appear. For ex: To access the discussion topic for this assignment, go here: DISCSN_URL

    Addition of sample examples in the assignment description.

    1. Insert the tag EXMPLS followed by a single space and a value, where the value is the number of test cases to display as sample examples. At most, all the non-hidden test cases will be printed. For ex: EXMPLS 5

    Addition of the links to the files uploaded in the Codeval folder in the assignment description.

    1. To add a hyperlink to a file, use the standard markdown link format [file_name_to_be_displayed](url); in the parentheses where the URL is required, insert the tag FILE[name of file]. For ex: [file_name_to_be_displayed](FILE[file_name.extension]). If the file is not already in the Codeval folder, it will be extracted from a zip file in the CodEval spec and uploaded automatically.

UPLOAD THE REQUIRED FILES IN CODEVAL FOLDER IN FILES SECTION.

  1. Create a folder called assignmentFiles which should contain all the necessary files including the specification file.

EXAMPLE OF THE SPECIFICATION FILE.

CRT_HW START Bag Of Strings
# Description
## Problem Statement
- This is an example of the description of the assignment in Markdown.
- To download the file: [Hello_World](URL_OF_HW "helloworld.txt")

## Sample Examples
EXMPLS 3

## Discussion Topic
Here is the link to the discussion topic: DISCSN_URL

### Rubric
| Cases | Points |
| ----- | ----- |
| Base Points | 50 |

CRT_HW END  

C cc -o bigbag --std=gnu11 bigbag.c
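The grading content after CRT_HW END (here, just the C compile line) would typically continue with test cases; a hypothetical continuation using file-based input and output (filenames are placeholders):

T ./bigbag
IF input1.txt
OF expected1.txt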

4. Test Assignments with AI Models

Test programming assignments against multiple AI models (Claude, GPT, Gemini) to benchmark their performance.

Installation

Install the AI provider packages you want to use:

# Install all AI providers
pip install assignment-codeval[ai]

# Or install specific providers
pip install anthropic        # For Claude models
pip install openai           # For GPT models
pip install google-generativeai  # For Gemini models

codeval.ini contents (optional)

[AI]
anthropic_key=sk-ant-...
openai_key=sk-...
google_key=...

API keys can also be provided via:

  • Environment variables: ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY
  • Command line options: --anthropic-key, --openai-key, --google-key
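For example, keys can be exported in the shell before running (values are placeholders):

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...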

Command to run

assignment-codeval test-with-ai <codeval_file> [OPTIONS]

Options

| Option | Description |
| ----- | ----- |
| -o, --output-dir | Directory to store solutions and results (default: ai_test_results) |
| -n, --attempts | Number of attempts per model (default: 1) |
| -m, --models | Specific models to test (can be used multiple times) |
| -p, --providers | Only test models from specific providers: anthropic, openai, google |
| --anthropic-key | Anthropic API key |
| --openai-key | OpenAI API key |
| --google-key | Google API key |

Examples

# Test with all Anthropic models
assignment-codeval test-with-ai my_assignment.codeval -p anthropic

# Test with specific model, 3 attempts each
assignment-codeval test-with-ai my_assignment.codeval -m "Claude Sonnet 4" -n 3

# Test with all providers (requires all API keys)
assignment-codeval test-with-ai my_assignment.codeval -n 2

# Pass API key directly
assignment-codeval test-with-ai my_assignment.codeval --anthropic-key sk-ant-xxx -p anthropic

Supported Models

| Provider | Models |
| ----- | ----- |
| Anthropic | Claude Sonnet 4, Claude Opus 4 |
| OpenAI | GPT-4o, GPT-4o Mini, o1, o3-mini |
| Google | Gemini 2.0 Flash, Gemini 1.5 Pro |

Note: You can add additional models using -m "model-id". Check each provider's documentation for available model IDs.

Output Structure

ai_test_results/
├── prompt.txt                    # The prompt sent to AI models
├── results.json                  # Summary of all results
├── Claude_Sonnet_4/
│   └── attempt_1/
│       ├── raw_response.txt      # Raw AI response
│       ├── solution.c            # Extracted code
│       └── <codeval files>       # Copied for evaluation
├── GPT-4o/
│   └── attempt_1/
│       └── ...
└── ...

Notes

  • The command extracts the assignment description from the codeval file (between CRT_HW START and CRT_HW END tags)
  • Support files from support_files/ directory are automatically copied for evaluation
  • Results include pass/fail status, response time, and any errors
  • Use multiple attempts (-n) to account for AI response variability
