LLM-powered Jupyter notebook grading tool with nbgrader compatibility

sglnbgrader: LLM-Assisted Jupyter Notebook Grader

This project provides an automated grading system for Jupyter notebooks that uses Large Language Models (LLMs) to assess student answers against instructor-provided reference solutions. It's compatible with the nbgrader metadata format.

Features

  • Grade notebooks with nbgrader metadata
  • Compare student answers to reference solutions using LLMs
  • Generate detailed feedback and add it directly to notebook cells
  • Provide comprehensive scoring and analysis
  • Support for both single notebook and batch grading
  • Analyze consistency and fairness across multiple submissions
  • Export grading results to JSON or HTML-enhanced notebooks
  • Customizable LLM prompts and grading criteria

Installation

Prerequisites

  • Python 3.12 or higher
  • An OpenAI API key for access to GPT-4 models (or other models via LiteLLM)

Installation Options

Option 1: Install as a standalone tool with uv (recommended)

The fastest and easiest way to install is with uv tool install, which makes the sglnbgrader command available globally without activating any environment:

# Install directly as a standalone tool
uv tool install sglnbgrader

Option 2: Install from PyPI

# Using pip
pip install sglnbgrader

# Using uv pip
uv pip install sglnbgrader

Option 3: Install from source

  1. Clone the repository:

    git clone https://github.com/yourusername/sglnbgrader.git
    cd sglnbgrader
    
  2. Install the package:

    # Using pip
    pip install -e .
    
    # Using uv
    uv pip install -e .
    

API Key Setup

Set up your OpenAI API key as an environment variable:

# Linux/macOS
export OPENAI_API_KEY=your_api_key_here

# Windows (Command Prompt)
set OPENAI_API_KEY=your_api_key_here

# Linux/macOS: add to your .bashrc or .zshrc for persistence
echo 'export OPENAI_API_KEY=your_api_key_here' >> ~/.bashrc
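
If you drive the grader from your own Python script, it can help to check for the key up front so a missing variable fails with a clear message instead of an error deep inside an LLM call. The sketch below is illustrative only: require_api_key is a hypothetical helper, not part of sglnbgrader.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the grader.")
    return key
```

Calling require_api_key() at the top of a grading script stops early when the variable is missing or empty.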

Usage

Command-line Interface

The grader provides a command-line interface with two main commands, single and batch:

Grade a Single Notebook

sglnbgrader single --answer path/to/answer_notebook.ipynb --student path/to/student_notebook.ipynb --output results.json --verbose

Or using the Python module:

python -m sglnbgrader single --answer path/to/answer_notebook.ipynb --student path/to/student_notebook.ipynb --output results.json --verbose

Options:

  • --answer: Path to instructor's answer notebook (required)
  • --student: Path to student notebook (required)
  • --model: LLM model to use for grading (default: gpt-4.1-nano)
  • --output: Path to save grading results as JSON (optional)
  • --verbose, -v: Show detailed grading information

Grade Multiple Notebooks

sglnbgrader batch --answer path/to/answer_notebook.ipynb --submissions path/to/submissions_dir --output path/to/results_dir --verbose

Or using the Python module:

python -m sglnbgrader batch --answer path/to/answer_notebook.ipynb --submissions path/to/submissions_dir --output path/to/results_dir --verbose

Options:

  • --answer: Path to instructor's answer notebook (required)
  • --submissions: Directory containing student submissions (required)
  • --model: LLM model to use for grading (default: gpt-4.1-nano)
  • --output: Directory to save grading results (optional)
  • --verbose, -v: Show detailed grading information
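
To grade several assignments in one go, you may want to assemble the batch command from a script. The following sketch builds the argument list from the options documented above; build_batch_command is a hypothetical helper name, and the paths are placeholders.

```python
import shlex


def build_batch_command(answer, submissions, output=None, model=None, verbose=False):
    """Build the argument list for `sglnbgrader batch` from the documented options."""
    cmd = ["sglnbgrader", "batch", "--answer", str(answer), "--submissions", str(submissions)]
    if model:
        cmd += ["--model", model]
    if output:
        cmd += ["--output", str(output)]
    if verbose:
        cmd.append("--verbose")
    return cmd


# Placeholder paths for illustration
cmd = build_batch_command("answer.ipynb", "submissions", output="results", verbose=True)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with paths that contain spaces.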

API Usage

from sglnbgrader import Grader

# Initialize the grader with the instructor's answer notebook
grader = Grader("path/to/answer_notebook.ipynb", model="gpt-4.1-nano")

# Grade a single student notebook
results = grader.grade_user_notebook("path/to/student_notebook.ipynb")

# Print the results
print(f"Total score: {results['total_score']}/{results['max_score']} ({results['percentage']}%)")

# Access individual question results
for result in results["results"]:
    print(f"Question {result['grade_id']}: {result['score']}/{result['max_score']}")
    print(f"Feedback: {result['feedback']}")
    
# Generate feedback in the notebook
feedback_notebook_path = grader.write_feedback_to_notebook(
    "path/to/student_notebook.ipynb", results
)
print(f"Feedback notebook created at: {feedback_notebook_path}")

# Compare multiple submissions
submission_paths = [
    "path/to/student1_notebook.ipynb",
    "path/to/student2_notebook.ipynb",
    "path/to/student3_notebook.ipynb",
]
comparison_results = grader.compare_student_submissions(submission_paths)

# Run benchmarks on the grading system
benchmark_results = grader.run_benchmarks(results, submission_paths)
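
The results dictionary used above can be flattened for a gradebook. This is a minimal sketch that assumes only the keys shown in the example (total_score, max_score, percentage, and a results list with grade_id, score, max_score and feedback); results_to_csv and the sample data are hypothetical and not part of the package.

```python
import csv
import io


def results_to_csv(results):
    """Render one row per graded question, plus a final TOTAL row, as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["grade_id", "score", "max_score", "feedback"])
    for item in results["results"]:
        writer.writerow([item["grade_id"], item["score"], item["max_score"], item["feedback"]])
    writer.writerow(
        ["TOTAL", results["total_score"], results["max_score"], f"{results['percentage']}%"]
    )
    return buffer.getvalue()


# Made-up data shaped like the results used in the example above
sample_results = {
    "total_score": 8,
    "max_score": 10,
    "percentage": 80.0,
    "results": [
        {"grade_id": "question-1", "score": 8, "max_score": 10, "feedback": "Mostly correct"},
    ],
}
print(results_to_csv(sample_results))
```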

Notebook Format Requirements

This grader works with notebooks that use the nbgrader metadata format:

  • Cells that should be graded must have nbgrader metadata
  • Required metadata fields: grade_id, points, and grade set to true

Example cell metadata:

{
  "metadata": {
    "nbgrader": {
      "grade": true,
      "grade_id": "question-1",
      "points": 10,
      "solution": true
    }
  }
}
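
Before grading, it can be useful to confirm which cells in a notebook carry the required metadata. Since .ipynb files are JSON, a rough check needs only the standard library. The helper names below are hypothetical, and the check reflects the three required fields listed above rather than the package's actual validation logic.

```python
import json


def find_graded_cells(notebook):
    """Return index, grade_id and points for every cell marked as graded."""
    graded = []
    for index, cell in enumerate(notebook.get("cells", [])):
        meta = cell.get("metadata", {}).get("nbgrader", {})
        # A cell counts as graded only if all three required fields are present
        if meta.get("grade") is True and "grade_id" in meta and "points" in meta:
            graded.append({"index": index, "grade_id": meta["grade_id"], "points": meta["points"]})
    return graded


def load_notebook(path):
    """Read a notebook file from disk as a plain dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# In-memory notebook for illustration; only the second cell is graded
sample_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Intro"]},
        {
            "cell_type": "code",
            "metadata": {
                "nbgrader": {"grade": True, "grade_id": "question-1", "points": 10, "solution": True}
            },
            "source": [],
        },
        {
            "cell_type": "code",
            "metadata": {"nbgrader": {"grade": False, "grade_id": "helper", "solution": True}},
            "source": [],
        },
    ]
}
print(find_graded_cells(sample_notebook))
```

For a file on disk, find_graded_cells(load_notebook("path/to/answer_notebook.ipynb")) gives the same kind of listing.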

Advanced Features

Writing Feedback to Notebooks

The write_feedback_to_notebook method adds HTML-formatted feedback directly into the notebook cell outputs:

feedback_notebook_path = grader.write_feedback_to_notebook(
    "path/to/student_notebook.ipynb", results
)

This creates a new notebook with:

  • HTML feedback boxes in each graded cell
  • A summary cell at the end with total score and breakdown
  • All original content preserved

Comparing Student Submissions

The compare_student_submissions method analyzes results across multiple submissions:

comparison_results = grader.compare_student_submissions([
    "path/to/student1_notebook.ipynb",
    "path/to/student2_notebook.ipynb",
])

This provides:

  • Statistics for each question (mean, median, standard deviation)
  • Overall class performance metrics
  • Consistency measures between different submissions
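
To make the per-question statistics concrete, here is a small sketch that computes mean, median and standard deviation from a list of result dictionaries. It assumes each entry has the shape returned by grade_user_notebook in the API example; per_question_stats is a hypothetical helper, the scores are made up, and the use of the population standard deviation is a choice made for this sketch, not a statement about what compare_student_submissions computes internally.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def per_question_stats(all_results):
    """Group scores by grade_id and summarize each question."""
    scores = defaultdict(list)
    for results in all_results:
        for item in results["results"]:
            scores[item["grade_id"]].append(item["score"])
    return {
        grade_id: {"mean": mean(values), "median": median(values), "stdev": pstdev(values)}
        for grade_id, values in scores.items()
    }


# Made-up scores for three submissions of one question
sample_runs = [
    {"results": [{"grade_id": "question-1", "score": 8}]},
    {"results": [{"grade_id": "question-1", "score": 10}]},
    {"results": [{"grade_id": "question-1", "score": 6}]},
]
stats = per_question_stats(sample_runs)
print(stats["question-1"])
```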

Benchmarking Grading Quality

The run_benchmarks method validates the grading system's consistency and fairness:

benchmark_results = grader.run_benchmarks(reference_results, submission_paths)

This analyzes:

  • Consistency relative to reference results
  • Fairness of scoring across different questions
  • Performance metrics for the grading system
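
One simple way to think about consistency relative to reference results is the per-question gap between a reference grading and a new grading of the same notebook. The sketch below illustrates that idea only; score_deviation is a hypothetical helper, the data is made up, and run_benchmarks may well use different metrics.

```python
def score_deviation(reference, candidate):
    """Return per-question absolute score differences and their mean."""
    ref = {item["grade_id"]: item["score"] for item in reference["results"]}
    cand = {item["grade_id"]: item["score"] for item in candidate["results"]}
    # Compare only questions present in both gradings
    shared = sorted(ref.keys() & cand.keys())
    diffs = {grade_id: abs(ref[grade_id] - cand[grade_id]) for grade_id in shared}
    mean_abs = sum(diffs.values()) / len(diffs) if diffs else 0.0
    return diffs, mean_abs


# Made-up reference and regrade results for two questions
reference_run = {"results": [{"grade_id": "q1", "score": 8}, {"grade_id": "q2", "score": 5}]}
regrade_run = {"results": [{"grade_id": "q1", "score": 7}, {"grade_id": "q2", "score": 5}]}
diffs, mean_abs = score_deviation(reference_run, regrade_run)
print(diffs, mean_abs)
```

A mean difference near zero suggests the two gradings agree; larger values point to questions worth reviewing by hand.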

Configuration

You can select the LLM model with the model argument of Grader, and customize the prompt by subclassing Grader and overriding the prompt property:

from sglnbgrader import Grader


class CustomGrader(Grader):
    @property
    def prompt(self):
        # Return the prompt template used for grading
        return """
        Your custom prompt template here.
        Question: {question}
        Reference Answer: {reference_answer}
        Student Answer: {student_answer}
        Points: {points}
        """

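When writing a custom template, it helps to preview how the placeholders read once filled in. The sketch below renders a template with str.format; this is an assumption about how the placeholders are substituted, not confirmed behavior of sglnbgrader, and render_prompt and the sample values are hypothetical.

```python
PROMPT_TEMPLATE = """
Question: {question}
Reference Answer: {reference_answer}
Student Answer: {student_answer}
Points: {points}
"""


def render_prompt(template, question, reference_answer, student_answer, points):
    """Fill the four placeholders used in the template above."""
    return template.format(
        question=question,
        reference_answer=reference_answer,
        student_answer=student_answer,
        points=points,
    )


# Sample values for previewing the rendered prompt
rendered = render_prompt(
    PROMPT_TEMPLATE,
    question="What is 2 + 2?",
    reference_answer="4",
    student_answer="four",
    points=10,
)
print(rendered)
```

If substitution does work this way, any literal braces in a custom prompt (for example a JSON snippet) would need to be doubled as {{ and }}.
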
Development

Testing

Run the tests using pytest:

# Using pytest directly
pytest

# Using uv
uv run pytest

Project Structure

  • sglnbgrader/ - Main package
    • __init__.py - Package exports
    • grader.py - Core Grader class implementation
    • cli.py - Command-line interface
    • __main__.py - Entry point for running as module
  • tests/ - Test suite

License

MIT

Acknowledgements

  • Uses nbgrader metadata format for identifying graded cells
  • Powered by OpenAI and other LLM providers through LiteLLM
  • CLI built with Typer and Rich for beautiful console output
