


aita-core


Shared core package for AI Teaching Assistant (AITA) chatbots. Provides the Streamlit UI, RAG pipeline, database logging, admin panel, and document ingestion utilities — parameterized by a CourseConfig so the same codebase serves multiple courses.

How It Works

AITA is an AI chatbot that helps students learn course material. It uses Retrieval-Augmented Generation (RAG) — your course documents (slides, handouts, homework) are indexed into a vector database, and when a student asks a question, the system retrieves relevant content and generates a pedagogically appropriate response using an LLM.

Key design principle: the chatbot never gives direct answers. It guides students through Socratic questioning, hints, and conceptual explanations.
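The retrieval step can be pictured with a minimal sketch: toy vectors and a cosine-similarity ranking. (Illustrative only; the real pipeline uses OpenAI embeddings and a FAISS index, and the vectors below are made up.)

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

# Toy "embeddings"; in AITA these come from OpenAI's embedding model.
docs = {
    "handout_1": [0.9, 0.1, 0.0],
    "handout_2": [0.1, 0.9, 0.0],
    "syllabus":  [0.0, 0.2, 0.9],
}
print(retrieve([0.8, 0.2, 0.1], docs))  # handout_1 ranks first
```

The retrieved chunks are then prepended to the LLM prompt so the response is grounded in course material rather than the model's general knowledge.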

Step-by-Step Setup Guide

Prerequisites

  • Python 3.11+
  • Docker (for deployment)
  • An OpenAI API key
  • (Optional) Google Cloud OAuth credentials for UMN login

Step 1: Create Your Course Repository

Create a new directory for your course:

mkdir AITA_XXXX
cd AITA_XXXX
git init

Step 2: Install aita-core

pip install aita-core

Or add to requirements.txt:

aita-core>=0.1.0

Step 3: Set Up Environment Variables

Create a .env file (never commit this):

OPENAI_API_KEY=sk-your-openai-api-key
ADMIN_PASSWORD=your-admin-password
GOOGLE_COOKIE_KEY=your-random-secret-string
GOOGLE_REDIRECT_URI=http://your-server:8501
AITA_DATA_DIR=/app/data
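A missing key usually surfaces as a confusing runtime error, so it can help to fail fast at startup. A minimal sketch (the helper below is hypothetical, not part of aita-core; variable names match the .env above, and the OAuth variables are intentionally excluded because they are optional):

```python
import os

def missing_vars(required, environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]

# In practice, call missing_vars(REQUIRED) after load_dotenv() and raise
# SystemExit if the list is non-empty. Shown here with a sample environment.
REQUIRED = ["OPENAI_API_KEY", "ADMIN_PASSWORD"]   # OAuth variables are optional
sample = {"OPENAI_API_KEY": "sk-test"}            # ADMIN_PASSWORD missing
print(missing_vars(REQUIRED, sample))             # ['ADMIN_PASSWORD']
```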

Step 4: Add Your Course Materials

Create a course_materials/ directory and organize your files:

course_materials/
├── Handouts/
│   └── Handouts/
│       ├── 1 Topic Name.pdf
│       ├── 2 Topic Name.pdf
│       └── ...
├── Homework handouts/
│   └── Homework handouts/
│       ├── HW1.pdf
│       ├── HW2.pdf
│       └── ...
├── Slides/
│   └── Slides/
│       ├── 1 Topic Name/
│       │   ├── content.tex    (or Notes.pdf)
│       │   └── Handout.pdf
│       └── ...
└── syllabus/
    └── Syllabus.pdf (or Syllabus.tex)

Important: Do NOT include homework solutions — the chatbot could leak them to students.
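If you write custom collectors later (Step 7), you can add a defensive name-based filter as a second line of protection. A sketch, assuming a simple filename heuristic (this is not aita-core's actual filter):

```python
from pathlib import Path

def is_solution_file(path):
    """Heuristic: treat any file whose name mentions 'solution' as a solution set."""
    return "solution" in Path(path).name.lower()

def filter_out_solutions(paths):
    """Drop likely solution files before they reach the vector index."""
    return [p for p in paths if not is_solution_file(p)]

files = ["HW1.pdf", "HW1_solutions.pdf", "HW2 Solution Key.pdf", "HW2.pdf"]
print(filter_out_solutions(files))  # ['HW1.pdf', 'HW2.pdf']
```

A filename heuristic is cheap but not airtight; keeping solutions out of course_materials/ entirely remains the safest option.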

Step 5: Create config.py

This is where you define everything specific to your course. Copy the template below and fill in your course details:

import os
import sys
import glob
from dotenv import load_dotenv
from aita_core import CourseConfig

load_dotenv()

BASE_DIR = os.path.dirname(__file__)
_client_secret_matches = glob.glob(os.path.join(BASE_DIR, "client_secret*.json"))

# Google Auth requires all three: client_secret file + GOOGLE_COOKIE_KEY + GOOGLE_REDIRECT_URI
_google_cookie_key = os.getenv("GOOGLE_COOKIE_KEY")
_google_redirect_uri = os.getenv("GOOGLE_REDIRECT_URI")
if _client_secret_matches and _google_cookie_key and _google_redirect_uri:
    _google_client_secret = _client_secret_matches[0]
else:
    _google_client_secret = ""
    if _client_secret_matches:
        print("[WARN] Google OAuth: client_secret found but GOOGLE_COOKIE_KEY or "
              "GOOGLE_REDIRECT_URI not set. Falling back to student ID login.",
              file=sys.stderr)

SYSTEM_PROMPT = """\
You are an AI Teaching Assistant for COURSE_NAME \
at the University of Minnesota, taught by Prof. YOUR NAME.

YOUR CORE PRINCIPLE: You must NEVER give direct answers to homework or exam problems. \
Instead, you should:
- Ask Socratic questions to guide students toward understanding
- Provide hints and point students to relevant concepts or course materials
- Explain underlying principles without solving the specific problem
- Encourage students to attempt the problem first and share their reasoning
- When students share their work, help them identify errors conceptually

When responding:
- Cite source material when referencing course content
- Be encouraging, patient, and supportive
- Keep responses focused and concise
- If the question is not related to the course, politely redirect
- Use LaTeX for math: $inline$ and $$display$$
- IMPORTANT: Never use \\[ \\] or \\( \\) for LaTeX. Always use $...$ and $$...$$

You will be provided with relevant context from course materials to ground your responses.\
"""

CONFIG = CourseConfig(
    # --- Course identity ---
    course_id="XXXX",
    course_name="CEGE XXXX: AI Teaching Assistant",
    course_short_name="CEGE XXXX AITA",
    course_description=(
        "Welcome! This AI assistant helps you learn concepts for "
        "**CEGE XXXX: Your Course Title**."
    ),
    system_prompt=SYSTEM_PROMPT,

    # --- Week-to-topic mapping ---
    # What topics are covered each week? Used to prevent the chatbot
    # from discussing future topics before they're taught.
    week_topics={
        1:  ["Topic A"],
        2:  ["Topic B", "Topic C"],
        3:  ["Topic D"],
        # ... add all 15 weeks
        15: ["Final exam review"],
    },

    # --- Document-to-week mapping ---
    # Maps the number prefix in filenames (e.g., "3 Topic Name.pdf" -> topic 3)
    # to the week that topic is first covered.
    topic_num_to_week={
        1: 1, 2: 2, 3: 3,
        # ... one entry per handout/slide topic folder
    },

    # Maps homework number to the week it's assigned.
    hw_num_to_week={
        1: 2, 2: 3, 3: 4,
        # ... one entry per homework
    },

    # Maps lab number to week.
    lab_num_to_week={
        1: 1, 2: 2, 3: 3,
        # ... one entry per lab
    },

    # Maps study guide / quiz names to week. Leave empty if not applicable.
    study_guide_to_week={},

    # --- Example prompts ---
    # Shown as clickable buttons when chat is empty. 4 per week works well.
    example_prompts={
        1: [
            "What topics does this course cover?",
            "What are the prerequisites?",
            "How is grading structured?",
            "Help me with this week's homework",
        ],
        # ... add for each week
    },

    # --- Paths ---
    base_dir=BASE_DIR,
    course_materials_dir=os.path.join(BASE_DIR, "course_materials"),
    faiss_db_dir=os.path.join(BASE_DIR, "faiss_db"),
    docs_dir=os.path.join(BASE_DIR, "docs"),
    backup_dir=os.path.join(BASE_DIR, "backup"),
    data_dir=os.getenv("AITA_DATA_DIR", os.path.join(BASE_DIR, "data")),

    # --- Auth ---
    admin_password=os.getenv("ADMIN_PASSWORD", ""),
    cookie_name="aita_XXXX_auth",
    cookie_key=_google_cookie_key or "",
    redirect_uri=_google_redirect_uri or "http://localhost:8501",
    google_client_secret_file=_google_client_secret,
)
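The week mappings above are easy to let drift out of sync as the semester evolves. A small sanity check like this (a hypothetical helper, not part of aita-core) can catch entries that point at undefined weeks before you build the index:

```python
def check_week_mappings(week_topics, *mappings):
    """Return (mapping_name, key, week) triples that reference weeks
    absent from week_topics."""
    valid_weeks = set(week_topics)
    bad = []
    for name, mapping in mappings:
        for key, week in mapping.items():
            if week not in valid_weeks:
                bad.append((name, key, week))
    return bad

week_topics = {1: ["Topic A"], 2: ["Topic B"]}
hw_num_to_week = {1: 2, 2: 3}  # HW2 points at week 3, which is undefined
print(check_week_mappings(week_topics, ("hw", hw_num_to_week)))
# [('hw', 2, 3)]
```

Run it once against CONFIG's mappings (hw_num_to_week, topic_num_to_week, lab_num_to_week) whenever you edit config.py.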

Step 6: Create main.py

from config import CONFIG
from aita_core import run

run(CONFIG)

Step 7: Create add_document.py

If your course materials follow the standard directory layout (see Step 4), this is all you need:

from config import CONFIG
from aita_core.ingest import run_ingestion

if __name__ == "__main__":
    run_ingestion(CONFIG)

The default pipeline collects handouts, homework (skipping solutions), slides, and syllabus, then builds the FAISS vector index.

If your directory layout differs, you can pass custom collectors:

from config import CONFIG
from aita_core.ingest import run_ingestion, load_pdf, collect_syllabus

def my_collect_handouts(config):
    # Custom logic to find and load handout PDFs
    docs = []
    # ... your code here ...
    return docs

if __name__ == "__main__":
    run_ingestion(CONFIG, collectors=[
        ("handouts", my_collect_handouts),
        ("syllabus", collect_syllabus),
    ])

Step 8: Build the Vector Store

python add_document.py

This reads your course materials, generates embeddings via OpenAI, and saves a FAISS index to faiss_db/.

Step 9: Test Locally

streamlit run main.py

Open http://localhost:8501 in your browser and test with sample questions.

Step 10: Deploy with Docker

Create a Dockerfile:

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY config.py main.py ./
COPY client_secret*.json* ./
COPY faiss_db/ ./faiss_db/
COPY course_materials/ ./course_materials/

RUN mkdir -p /app/data
RUN mkdir -p /root/.streamlit
RUN echo '[server]\nheadless = true\nport = 8501\nenableCORS = false\nenableXsrfProtection = false\n\n[browser]\ngatherUsageStats = false' > /root/.streamlit/config.toml

EXPOSE 8501
ENTRYPOINT ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]

Create a docker-compose.yml:

services:
  aita:
    build: .
    ports:
      - "8501:8501"
    env_file:
      - .env
    volumes:
      - /path/to/persistent/data:/app/data
    restart: unless-stopped

Build and run:

docker compose build
docker compose up -d

Your chatbot is now live at http://your-server:8501.

Step 11: Set Up .gitignore

.env
.venv/
__pycache__/
*.py[cod]
*.egg-info/
faiss_db/
backup/
docs/
course_materials/
client_secret*.json
.DS_Store
.vscode/
.idea/
.streamlit/secrets.toml

Step 12: (Optional) Google OAuth

To restrict login to @umn.edu accounts:

  1. Create a project in Google Cloud Console
  2. Enable the People API (the older Google+ API is deprecated)
  3. Create OAuth 2.0 credentials (Web application)
  4. Add your redirect URI (e.g., http://your-server:8501)
  5. Download the client secret JSON and place it in your project root as client_secret_*.json
  6. Set GOOGLE_COOKIE_KEY and GOOGLE_REDIRECT_URI in your .env

If any of these are missing, the app automatically falls back to student ID login.

Features

  • Pedagogical guardrails — Never gives direct answers; uses Socratic questioning
  • Week-aware responses — Won't discuss topics not yet covered in class
  • Source citations — References specific handouts, slides, and homework
  • PDF downloads — Students can download referenced course materials
  • Admin dashboard — View interaction logs, student feedback, and feature requests
  • Google OAuth — Restrict access to @umn.edu accounts (optional)
  • Mobile-friendly — Responsive UI works on phones and tablets

Cost

Using GPT-4o-mini (default), estimated cost is under $20/semester for a class of 80 students with heavy usage. See OpenAI pricing for current rates.
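A back-of-envelope version of that estimate, where every number is an assumption you should replace with OpenAI's current rates and your own usage figures:

```python
# All numbers below are illustrative assumptions, not measurements.
students = 80
questions_per_student = 100          # heavy usage over a semester
input_tokens = 3000                  # prompt + retrieved context per question
output_tokens = 500                  # typical response length
price_in = 0.15 / 1_000_000          # assumed $/token, GPT-4o-mini input
price_out = 0.60 / 1_000_000         # assumed $/token, GPT-4o-mini output

cost = students * questions_per_student * (
    input_tokens * price_in + output_tokens * price_out
)
print(f"${cost:.2f}")  # roughly $6 under these assumptions
```

Even with generous margins for embedding costs and longer contexts, this stays comfortably under the $20/semester figure quoted above.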

Course Repo Structure

AITA_XXXX/
├── config.py              # CourseConfig with all course-specific data
├── main.py                # 3 lines: import config, import aita_core, run
├── add_document.py        # 3 lines for standard layout, or custom collectors
├── course_materials/      # PDFs, LaTeX source (not committed)
├── faiss_db/              # Built vector index (not committed)
├── .env                   # API keys (not committed)
├── .gitignore
├── docker-compose.yml
├── Dockerfile
└── requirements.txt       # just: aita-core>=0.1.0
