


aita-core


Shared core package for AI Teaching Assistant (AITA) chatbots. Provides the Streamlit UI, RAG pipeline, database logging, admin panel, and document ingestion utilities — parameterized by a CourseConfig so the same codebase serves multiple courses.

How It Works

AITA is an AI chatbot that helps students learn course material. It uses Retrieval-Augmented Generation (RAG) — your course documents (slides, handouts, homework) are indexed into a vector database, and when a student asks a question, the system retrieves relevant content and generates a pedagogically appropriate response using an LLM.

Key design principle: the chatbot never gives direct answers. It guides students through Socratic questioning, hints, and conceptual explanations.
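The retrieval step can be pictured with a minimal sketch: toy vectors and a cosine-similarity ranking. (Illustrative only; the real pipeline uses OpenAI embeddings and a FAISS index, and the vectors below are made up.)

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

# Toy "embeddings"; in AITA these come from OpenAI's embedding model.
docs = {
    "handout_1": [0.9, 0.1, 0.0],
    "handout_2": [0.1, 0.9, 0.0],
    "syllabus":  [0.0, 0.2, 0.9],
}
print(retrieve([0.8, 0.2, 0.1], docs))  # handout_1 ranks first
```

The retrieved chunks are then prepended to the LLM prompt so the response is grounded in course material rather than the model's general knowledge.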

Step-by-Step Setup Guide

Prerequisites

  • Python 3.11+
  • Docker (for deployment)
  • An OpenAI API key
  • (Optional) Google Cloud OAuth credentials for UMN login

Step 1: Create Your Course Repository

Create a new directory for your course:

mkdir AITA_XXXX
cd AITA_XXXX
git init

Step 2: Install aita-core

pip install aita-core

Or add to requirements.txt:

aita-core>=0.1.0

Step 3: Set Up Environment Variables

Create a .env file (never commit this):

OPENAI_API_KEY=sk-your-openai-api-key
ADMIN_PASSWORD=your-admin-password
GOOGLE_COOKIE_KEY=your-random-secret-string
GOOGLE_REDIRECT_URI=http://your-server:8501
AITA_DATA_DIR=/app/data
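A missing key usually surfaces as a confusing runtime error, so it can help to fail fast at startup. A minimal sketch (the helper below is hypothetical, not part of aita-core; variable names match the .env above, and the OAuth variables are intentionally excluded because they are optional):

```python
import os

def missing_vars(required, environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]

# In practice, call missing_vars(REQUIRED) after load_dotenv() and raise
# SystemExit if the list is non-empty. Shown here with a sample environment.
REQUIRED = ["OPENAI_API_KEY", "ADMIN_PASSWORD"]   # OAuth variables are optional
sample = {"OPENAI_API_KEY": "sk-test"}            # ADMIN_PASSWORD missing
print(missing_vars(REQUIRED, sample))             # ['ADMIN_PASSWORD']
```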

Step 4: Add Your Course Materials

Create a course_materials/ directory and organize your files:

course_materials/
├── Handouts/
│   └── Handouts/
│       ├── 1 Topic Name.pdf
│       ├── 2 Topic Name.pdf
│       └── ...
├── Homework handouts/
│   └── Homework handouts/
│       ├── HW1.pdf
│       ├── HW2.pdf
│       └── ...
├── Slides/
│   └── Slides/
│       ├── 1 Topic Name/
│       │   ├── content.tex    (or Notes.pdf)
│       │   └── Handout.pdf
│       └── ...
└── syllabus/
    └── Syllabus.pdf (or Syllabus.tex)

Important: Do NOT include homework solutions — the chatbot could leak them to students.
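If you write custom collectors later (Step 7), you can add a defensive name-based filter as a second line of protection. A sketch, assuming a simple filename heuristic (this is not aita-core's actual filter):

```python
from pathlib import Path

def is_solution_file(path):
    """Heuristic: treat any file whose name mentions 'solution' as a solution set."""
    return "solution" in Path(path).name.lower()

def filter_out_solutions(paths):
    """Drop likely solution files before they reach the vector index."""
    return [p for p in paths if not is_solution_file(p)]

files = ["HW1.pdf", "HW1_solutions.pdf", "HW2 Solution Key.pdf", "HW2.pdf"]
print(filter_out_solutions(files))  # ['HW1.pdf', 'HW2.pdf']
```

A filename heuristic is cheap but not airtight; keeping solutions out of course_materials/ entirely remains the safest option.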

Step 5: Create config.py

This is where you define everything specific to your course. Copy the template below and fill in your course details:

import os
import sys
import glob
from dotenv import load_dotenv
from aita_core import CourseConfig

load_dotenv()

BASE_DIR = os.path.dirname(__file__)
_client_secret_matches = glob.glob(os.path.join(BASE_DIR, "client_secret*.json"))

# Google Auth requires all three: client_secret file + GOOGLE_COOKIE_KEY + GOOGLE_REDIRECT_URI
_google_cookie_key = os.getenv("GOOGLE_COOKIE_KEY")
_google_redirect_uri = os.getenv("GOOGLE_REDIRECT_URI")
if _client_secret_matches and _google_cookie_key and _google_redirect_uri:
    _google_client_secret = _client_secret_matches[0]
else:
    _google_client_secret = ""
    if _client_secret_matches:
        print("[WARN] Google OAuth: client_secret found but GOOGLE_COOKIE_KEY or "
              "GOOGLE_REDIRECT_URI not set. Falling back to student ID login.",
              file=sys.stderr)

SYSTEM_PROMPT = """\
You are an AI Teaching Assistant for COURSE_NAME \
at the University of Minnesota, taught by Prof. YOUR NAME.

YOUR CORE PRINCIPLE: You must NEVER give direct answers to homework or exam problems. \
Instead, you should:
- Ask Socratic questions to guide students toward understanding
- Provide hints and point students to relevant concepts or course materials
- Explain underlying principles without solving the specific problem
- Encourage students to attempt the problem first and share their reasoning
- When students share their work, help them identify errors conceptually

When responding:
- Cite source material when referencing course content
- Be encouraging, patient, and supportive
- Keep responses focused and concise
- If the question is not related to the course, politely redirect
- Use LaTeX for math: $inline$ and $$display$$
- IMPORTANT: Never use \\[ \\] or \\( \\) for LaTeX. Always use $...$ and $$...$$

You will be provided with relevant context from course materials to ground your responses.\
"""

CONFIG = CourseConfig(
    # --- Course identity ---
    course_id="XXXX",
    course_name="CEGE XXXX: AI Teaching Assistant",
    course_short_name="CEGE XXXX AITA",
    course_description=(
        "Welcome! This AI assistant helps you learn concepts for "
        "**CEGE XXXX: Your Course Title**."
    ),
    system_prompt=SYSTEM_PROMPT,

    # --- Week-to-topic mapping ---
    # What topics are covered each week? Used to prevent the chatbot
    # from discussing future topics before they're taught.
    week_topics={
        1:  ["Topic A"],
        2:  ["Topic B", "Topic C"],
        3:  ["Topic D"],
        # ... add all 15 weeks
        15: ["Final exam review"],
    },

    # --- Document-to-week mapping ---
    # Maps the number prefix in filenames (e.g., "3 Topic Name.pdf" -> topic 3)
    # to the week that topic is first covered.
    topic_num_to_week={
        1: 1, 2: 2, 3: 3,
        # ... one entry per handout/slide topic folder
    },

    # Maps homework number to the week it's assigned.
    hw_num_to_week={
        1: 2, 2: 3, 3: 4,
        # ... one entry per homework
    },

    # Maps lab number to week.
    lab_num_to_week={
        1: 1, 2: 2, 3: 3,
        # ... one entry per lab
    },

    # Maps study guide / quiz names to week. Leave empty if not applicable.
    study_guide_to_week={},

    # --- Example prompts ---
    # Shown as clickable buttons when chat is empty. 4 per week works well.
    example_prompts={
        1: [
            "What topics does this course cover?",
            "What are the prerequisites?",
            "How is grading structured?",
            "Help me with this week's homework",
        ],
        # ... add for each week
    },

    # --- Paths ---
    base_dir=BASE_DIR,
    course_materials_dir=os.path.join(BASE_DIR, "course_materials"),
    faiss_db_dir=os.path.join(BASE_DIR, "faiss_db"),
    docs_dir=os.path.join(BASE_DIR, "docs"),
    backup_dir=os.path.join(BASE_DIR, "backup"),
    data_dir=os.getenv("AITA_DATA_DIR", os.path.join(BASE_DIR, "data")),

    # --- Auth ---
    admin_password=os.getenv("ADMIN_PASSWORD", ""),
    cookie_name="aita_XXXX_auth",
    cookie_key=_google_cookie_key or "",
    redirect_uri=_google_redirect_uri or "http://localhost:8501",
    google_client_secret_file=_google_client_secret,
)
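The week mappings above are easy to let drift out of sync as the semester evolves. A small sanity check like this (a hypothetical helper, not part of aita-core) can catch entries that point at undefined weeks before you build the index:

```python
def check_week_mappings(week_topics, *mappings):
    """Return (mapping_name, key, week) triples that reference weeks
    absent from week_topics."""
    valid_weeks = set(week_topics)
    bad = []
    for name, mapping in mappings:
        for key, week in mapping.items():
            if week not in valid_weeks:
                bad.append((name, key, week))
    return bad

week_topics = {1: ["Topic A"], 2: ["Topic B"]}
hw_num_to_week = {1: 2, 2: 3}  # HW2 points at week 3, which is undefined
print(check_week_mappings(week_topics, ("hw", hw_num_to_week)))
# [('hw', 2, 3)]
```

Run it once against CONFIG's mappings (hw_num_to_week, topic_num_to_week, lab_num_to_week) whenever you edit config.py.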

Step 6: Create main.py

from config import CONFIG
from aita_core import run

run(CONFIG)

Step 7: Create add_document.py

If your course materials follow the standard directory layout (see Step 4), this is all you need:

from config import CONFIG
from aita_core.ingest import run_ingestion

if __name__ == "__main__":
    run_ingestion(CONFIG)

The default pipeline collects handouts, homework (skipping solutions), slides, and syllabus, then builds the FAISS vector index.

If your directory layout differs, you can pass custom collectors:

from config import CONFIG
from aita_core.ingest import run_ingestion, load_pdf, collect_syllabus

def my_collect_handouts(config):
    # Custom logic to find and load handout PDFs
    docs = []
    # ... your code here ...
    return docs

if __name__ == "__main__":
    run_ingestion(CONFIG, collectors=[
        ("handouts", my_collect_handouts),
        ("syllabus", collect_syllabus),
    ])

Step 8: Build the Vector Store

python add_document.py

This reads your course materials, generates embeddings via OpenAI, and saves a FAISS index to faiss_db/.

Step 9: Test Locally

streamlit run main.py

Open http://localhost:8501 in your browser and test with sample questions.

Step 10: Deploy with Docker

Create a Dockerfile:

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY config.py main.py ./
COPY client_secret*.json* ./
COPY faiss_db/ ./faiss_db/
COPY course_materials/ ./course_materials/

RUN mkdir -p /app/data
RUN mkdir -p /root/.streamlit
RUN echo '[server]\nheadless = true\nport = 8501\nenableCORS = false\nenableXsrfProtection = false\n\n[browser]\ngatherUsageStats = false' > /root/.streamlit/config.toml

EXPOSE 8501
ENTRYPOINT ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]

Create a docker-compose.yml:

services:
  aita:
    build: .
    ports:
      - "8501:8501"
    env_file:
      - .env
    volumes:
      - /path/to/persistent/data:/app/data
    restart: unless-stopped

Build and run:

docker compose build
docker compose up -d

Your chatbot is now live at http://your-server:8501.

Step 11: Set Up .gitignore

.env
.venv/
__pycache__/
*.py[cod]
*.egg-info/
faiss_db/
backup/
docs/
course_materials/
client_secret*.json
.DS_Store
.vscode/
.idea/
.streamlit/secrets.toml

Step 12: (Optional) Google OAuth

To restrict login to @umn.edu accounts:

  1. Create a project in Google Cloud Console
  2. Enable the People API (the older Google+ API is deprecated)
  3. Create OAuth 2.0 credentials (Web application)
  4. Add your redirect URI (e.g., http://your-server:8501)
  5. Download the client secret JSON and place it in your project root as client_secret_*.json
  6. Set GOOGLE_COOKIE_KEY and GOOGLE_REDIRECT_URI in your .env

If any of these are missing, the app automatically falls back to student ID login.

Features

  • Pedagogical guardrails — Never gives direct answers; uses Socratic questioning
  • Week-aware responses — Won't discuss topics not yet covered in class
  • Source citations — References specific handouts, slides, and homework
  • PDF downloads — Students can download referenced course materials
  • Admin dashboard — View interaction logs, student feedback, and feature requests
  • Google OAuth — Restrict access to @umn.edu accounts (optional)
  • Mobile-friendly — Responsive UI works on phones and tablets

Cost

Using GPT-4o-mini (default), estimated cost is under $20/semester for a class of 80 students with heavy usage. See OpenAI pricing for current rates.
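A back-of-envelope version of that estimate, where every number is an assumption you should replace with OpenAI's current rates and your own usage figures:

```python
# All numbers below are illustrative assumptions, not measurements.
students = 80
questions_per_student = 100          # heavy usage over a semester
input_tokens = 3000                  # prompt + retrieved context per question
output_tokens = 500                  # typical response length
price_in = 0.15 / 1_000_000          # assumed $/token, GPT-4o-mini input
price_out = 0.60 / 1_000_000         # assumed $/token, GPT-4o-mini output

cost = students * questions_per_student * (
    input_tokens * price_in + output_tokens * price_out
)
print(f"${cost:.2f}")  # roughly $6 under these assumptions
```

Even with generous margins for embedding costs and longer contexts, this stays comfortably under the $20/semester figure quoted above.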

Course Repo Structure

AITA_XXXX/
├── config.py              # CourseConfig with all course-specific data
├── main.py                # 3 lines: import config, import aita_core, run
├── add_document.py        # 3 lines for standard layout, or custom collectors
├── course_materials/      # PDFs, LaTeX source (not committed)
├── faiss_db/              # Built vector index (not committed)
├── .env                   # API keys (not committed)
├── .gitignore
├── docker-compose.yml
├── Dockerfile
└── requirements.txt       # just: aita-core>=0.1.0
