# aita-core
Shared core package for AI Teaching Assistant (AITA) chatbots. Provides the Streamlit UI, RAG pipeline, database logging, admin panel, and document ingestion utilities — parameterized by a CourseConfig so the same codebase serves multiple courses.
## How It Works
AITA is an AI chatbot that helps students learn course material. It uses Retrieval-Augmented Generation (RAG) — your course documents (slides, handouts, homework) are indexed into a vector database, and when a student asks a question, the system retrieves relevant content and generates a pedagogically appropriate response using an LLM.
**Key design principle:** the chatbot never gives direct answers. It guides students through Socratic questioning, hints, and conceptual explanations.
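In sketch form, the retrieve-then-prompt loop looks like this. This is a toy illustration only: keyword-overlap scoring stands in for the real FAISS vector search, and none of the helper names below come from aita-core.

```python
# Toy RAG sketch: real retrieval uses embeddings + FAISS, not word overlap.

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the question; return the top k."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Ground the LLM call in the retrieved course material."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Context from course materials:\n{ctx}\n\nStudent question: {question}"

docs = [
    "Handout 3: shear stress in beams under transverse loading",
    "Handout 1: course syllabus and grading policy",
]
question = "How is grading structured?"
prompt = build_prompt(question, retrieve(question, docs, k=1))
print(prompt.splitlines()[1])  # -> - Handout 1: course syllabus and grading policy
```

The prompt built this way, together with the system prompt from `config.py`, is what the LLM actually sees.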
## Step-by-Step Setup Guide
### Prerequisites
- Python 3.11+
- Docker (for deployment)
- An OpenAI API key
- (Optional) Google Cloud OAuth credentials for UMN login
### Step 1: Create Your Course Repository
Create a new directory for your course:
```bash
mkdir AITA_XXXX
cd AITA_XXXX
git init
```
### Step 2: Install aita-core
```bash
pip install aita-core
```
Or add to `requirements.txt`:

```
aita-core>=0.1.0
```
### Step 3: Set Up Environment Variables
Create a `.env` file (never commit this):
```
OPENAI_API_KEY=sk-your-openai-api-key
ADMIN_PASSWORD=your-admin-password
GOOGLE_COOKIE_KEY=your-random-secret-string
GOOGLE_REDIRECT_URI=http://your-server:8501
AITA_DATA_DIR=/app/data
```
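A startup sanity check can catch a missing key before the app fails mid-request. The helper below is a sketch you could add to your own `config.py`; it is not part of aita-core (only the variable names come from the `.env` above):

```python
import os

REQUIRED = ["OPENAI_API_KEY", "ADMIN_PASSWORD"]  # the Google OAuth variables are optional

def check_env(env=os.environ) -> list[str]:
    """Return the names of required variables that are missing or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Deterministic demo against an explicit dict rather than the real environment:
print(check_env({"OPENAI_API_KEY": "sk-test"}))  # -> ['ADMIN_PASSWORD']
```

Calling `check_env()` with no argument inspects the real environment after `load_dotenv()` has run.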
### Step 4: Add Your Course Materials
Create a `course_materials/` directory and organize your files:
```
course_materials/
├── Handouts/
│   └── Handouts/
│       ├── 1 Topic Name.pdf
│       ├── 2 Topic Name.pdf
│       └── ...
├── Homework handouts/
│   └── Homework handouts/
│       ├── HW1.pdf
│       ├── HW2.pdf
│       └── ...
├── Slides/
│   └── Slides/
│       ├── 1 Topic Name/
│       │   ├── content.tex (or Notes.pdf)
│       │   └── Handout.pdf
│       └── ...
└── syllabus/
    └── Syllabus.pdf (or Syllabus.tex)
```
**Important:** Do NOT include homework solutions — the chatbot could leak them to students.
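A quick pre-ingestion scan can guard against accidentally shipping solutions. The helper and filename markers below are assumptions for illustration, not part of aita-core:

```python
from pathlib import Path

# Assumed naming patterns that suggest a file contains solutions:
SOLUTION_MARKERS = ("solution", "soln", "answer", "key")

def find_suspect_files(root: str) -> list[str]:
    """List PDF/TeX files under root whose names suggest they hold solutions."""
    suspects = []
    for path in Path(root).rglob("*"):
        if (path.suffix.lower() in (".pdf", ".tex")
                and any(m in path.name.lower() for m in SOLUTION_MARKERS)):
            suspects.append(str(path))
    return sorted(suspects)
```

Run it over `course_materials/` before Step 8 and delete anything it flags (expect a few false positives from the broad markers).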
### Step 5: Create `config.py`
This is where you define everything specific to your course. Copy the template below and fill in your course details:
```python
import os
import sys
import glob

from dotenv import load_dotenv

from aita_core import CourseConfig

load_dotenv()

BASE_DIR = os.path.dirname(__file__)

_client_secret_matches = glob.glob(os.path.join(BASE_DIR, "client_secret*.json"))

# Google Auth requires all three: client_secret file + GOOGLE_COOKIE_KEY + GOOGLE_REDIRECT_URI
_google_cookie_key = os.getenv("GOOGLE_COOKIE_KEY")
_google_redirect_uri = os.getenv("GOOGLE_REDIRECT_URI")
if _client_secret_matches and _google_cookie_key and _google_redirect_uri:
    _google_client_secret = _client_secret_matches[0]
else:
    _google_client_secret = ""
    if _client_secret_matches:
        print("[WARN] Google OAuth: client_secret found but GOOGLE_COOKIE_KEY or "
              "GOOGLE_REDIRECT_URI not set. Falling back to student ID login.",
              file=sys.stderr)

SYSTEM_PROMPT = """\
You are an AI Teaching Assistant for COURSE_NAME \
at the University of Minnesota, taught by Prof. YOUR NAME.

YOUR CORE PRINCIPLE: You must NEVER give direct answers to homework or exam problems. \
Instead, you should:
- Ask Socratic questions to guide students toward understanding
- Provide hints and point students to relevant concepts or course materials
- Explain underlying principles without solving the specific problem
- Encourage students to attempt the problem first and share their reasoning
- When students share their work, help them identify errors conceptually

When responding:
- Cite source material when referencing course content
- Be encouraging, patient, and supportive
- Keep responses focused and concise
- If the question is not related to the course, politely redirect
- Use LaTeX for math: $inline$ and $$display$$
- IMPORTANT: Never use \\[ \\] or \\( \\) for LaTeX. Always use $...$ and $$...$$

You will be provided with relevant context from course materials to ground your responses.\
"""

CONFIG = CourseConfig(
    # --- Course identity ---
    course_id="XXXX",
    course_name="CEGE XXXX: AI Teaching Assistant",
    course_short_name="CEGE XXXX AITA",
    course_description=(
        "Welcome! This AI assistant helps you learn concepts for "
        "**CEGE XXXX: Your Course Title**."
    ),
    system_prompt=SYSTEM_PROMPT,

    # --- Week-to-topic mapping ---
    # What topics are covered each week? Used to prevent the chatbot
    # from discussing future topics before they're taught.
    week_topics={
        1: ["Topic A"],
        2: ["Topic B", "Topic C"],
        3: ["Topic D"],
        # ... add all 15 weeks
        15: ["Final exam review"],
    },

    # --- Document-to-week mapping ---
    # Maps the number prefix in filenames (e.g., "3 Topic Name.pdf" -> topic 3)
    # to the week that topic is first covered.
    topic_num_to_week={
        1: 1, 2: 2, 3: 3,
        # ... one entry per handout/slide topic folder
    },

    # Maps homework number to the week it's assigned.
    hw_num_to_week={
        1: 2, 2: 3, 3: 4,
        # ... one entry per homework
    },

    # Maps lab number to week.
    lab_num_to_week={
        1: 1, 2: 2, 3: 3,
        # ... one entry per lab
    },

    # Maps study guide / quiz names to week. Leave empty if not applicable.
    study_guide_to_week={},

    # --- Example prompts ---
    # Shown as clickable buttons when chat is empty. 4 per week works well.
    example_prompts={
        1: [
            "What topics does this course cover?",
            "What are the prerequisites?",
            "How is grading structured?",
            "Help me with this week's homework",
        ],
        # ... add for each week
    },

    # --- Paths ---
    base_dir=BASE_DIR,
    course_materials_dir=os.path.join(BASE_DIR, "course_materials"),
    faiss_db_dir=os.path.join(BASE_DIR, "faiss_db"),
    docs_dir=os.path.join(BASE_DIR, "docs"),
    backup_dir=os.path.join(BASE_DIR, "backup"),
    data_dir=os.getenv("AITA_DATA_DIR", os.path.join(BASE_DIR, "data")),

    # --- Auth ---
    admin_password=os.getenv("ADMIN_PASSWORD", ""),
    cookie_name="aita_XXXX_auth",
    cookie_key=_google_cookie_key or "",
    redirect_uri=_google_redirect_uri or "http://localhost:8501",
    google_client_secret_file=_google_client_secret,
)
```
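The week mappings above are what drive week-aware responses. Conceptually, the current week is derived from the semester start date and used to filter topics; the sketch below is illustrative only (the function names and the start-date mechanism are assumptions, not aita-core's API):

```python
from datetime import date

# Mirrors the week_topics structure from CourseConfig:
WEEK_TOPICS = {1: ["Topic A"], 2: ["Topic B", "Topic C"], 3: ["Topic D"]}

def current_week(semester_start: date, today: date, max_week: int = 15) -> int:
    """1-based week number, clamped to the semester length."""
    week = (today - semester_start).days // 7 + 1
    return max(1, min(week, max_week))

def allowed_topics(week: int, week_topics=WEEK_TOPICS) -> list[str]:
    """Every topic first covered on or before the given week."""
    return [t for w, topics in sorted(week_topics.items()) if w <= week
            for t in topics]

print(current_week(date(2025, 1, 21), date(2025, 2, 5)))  # -> 3
print(allowed_topics(2))  # -> ['Topic A', 'Topic B', 'Topic C']
```

This is why `topic_num_to_week`, `hw_num_to_week`, and friends must be complete: a document with no week mapping can't be gated.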
### Step 6: Create `main.py`
```python
from config import CONFIG

from aita_core import run

run(CONFIG)
```
### Step 7: Create `add_document.py`
If your course materials follow the standard directory layout (see Step 4), this is all you need:
```python
from config import CONFIG

from aita_core.ingest import run_ingestion

if __name__ == "__main__":
    run_ingestion(CONFIG)
```
The default pipeline collects handouts, homework (skipping solutions), slides, and syllabus, then builds the FAISS vector index.
If your directory layout differs, you can pass custom collectors:
```python
from config import CONFIG

from aita_core.ingest import run_ingestion, load_pdf, collect_syllabus

def my_collect_handouts(config):
    # Custom logic to find and load handout PDFs
    docs = []
    # ... your code here ...
    return docs

if __name__ == "__main__":
    run_ingestion(CONFIG, collectors=[
        ("handouts", my_collect_handouts),
        ("syllabus", collect_syllabus),
    ])
```
### Step 8: Build the Vector Store
```bash
python add_document.py
```
This reads your course materials, generates embeddings via OpenAI, and saves a FAISS index to `faiss_db/`.
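Before embedding, each document is split into overlapping chunks so that retrieval returns passages rather than whole PDFs. A simplified word-based chunker illustrates the idea; the actual splitter, chunk size, and overlap used by aita-core may differ:

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks; overlap preserves context across boundaries."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(100))
print(len(chunk_text(doc, chunk_size=50, overlap=10)))  # -> 3
```

Each chunk is then embedded once and stored in the FAISS index alongside its source metadata (file name, topic number) so citations can point back to the original handout.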
### Step 9: Test Locally
```bash
streamlit run main.py
```
Open http://localhost:8501 in your browser and test with sample questions.
### Step 10: Deploy with Docker

Create a `Dockerfile`:
```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY config.py main.py ./
COPY client_secret*.json* ./
COPY faiss_db/ ./faiss_db/
COPY course_materials/ ./course_materials/

RUN mkdir -p /app/data
RUN mkdir -p /root/.streamlit
RUN echo '[server]\nheadless = true\nport = 8501\nenableCORS = false\nenableXsrfProtection = false\n\n[browser]\ngatherUsageStats = false' > /root/.streamlit/config.toml

EXPOSE 8501
ENTRYPOINT ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]
```
Create a `docker-compose.yml`:
```yaml
services:
  aita:
    build: .
    ports:
      - "8501:8501"
    env_file:
      - .env
    volumes:
      - /path/to/persistent/data:/app/data
    restart: unless-stopped
```
Build and run:
```bash
docker compose build
docker compose up -d
```
Your chatbot is now live at `http://your-server:8501`.
### Step 11: Set Up `.gitignore`

```
.env
.venv/
__pycache__/
*.py[cod]
*.egg-info/
faiss_db/
backup/
docs/
course_materials/
client_secret*.json
.DS_Store
.vscode/
.idea/
.streamlit/secrets.toml
```
### Step 12: (Optional) Google OAuth
To restrict login to @umn.edu accounts:
- Create a project in Google Cloud Console
- Enable the People API
- Create OAuth 2.0 credentials (Web application)
- Add your redirect URI (e.g., `http://your-server:8501`)
- Download the client secret JSON and place it in your project root as `client_secret_*.json`
- Set `GOOGLE_COOKIE_KEY` and `GOOGLE_REDIRECT_URI` in your `.env`
If any of these are missing, the app automatically falls back to student ID login.
## Features
- **Pedagogical guardrails** — never gives direct answers; uses Socratic questioning
- **Week-aware responses** — won't discuss topics not yet covered in class
- **Source citations** — references specific handouts, slides, and homework
- **PDF downloads** — students can download referenced course materials
- **Admin dashboard** — view interaction logs, student feedback, and feature requests
- **Google OAuth** — restrict access to `@umn.edu` accounts (optional)
- **Mobile-friendly** — responsive UI works on phones and tablets
## Cost
Using GPT-4o-mini (the default model), the estimated cost is under $20 per semester for a class of 80 students with heavy usage. See OpenAI pricing for current rates.
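As a back-of-the-envelope check, the arithmetic works out as follows. Every figure here is an assumption for illustration (per-token rates, usage volume, and token counts); check OpenAI's pricing page for current numbers:

```python
# Assumed GPT-4o-mini rates, USD per 1M tokens (verify against current pricing):
INPUT_PER_M = 0.15
OUTPUT_PER_M = 0.60

students = 80
questions_per_student = 200   # "heavy usage" over a semester (assumption)
input_tokens = 3_000          # prompt + retrieved context per question (assumption)
output_tokens = 500           # typical response length (assumption)

queries = students * questions_per_student
cost = (queries * input_tokens / 1e6) * INPUT_PER_M \
     + (queries * output_tokens / 1e6) * OUTPUT_PER_M
print(f"${cost:.2f}")  # -> $12.00
```

Under these assumptions the semester lands comfortably inside the $20 estimate, with input tokens (the retrieved context) dominating the bill.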
## Course Repo Structure
```
AITA_XXXX/
├── config.py            # CourseConfig with all course-specific data
├── main.py              # 3 lines: import config, import aita_core, run
├── add_document.py      # 3 lines for standard layout, or custom collectors
├── course_materials/    # PDFs, LaTeX source (not committed)
├── faiss_db/            # Built vector index (not committed)
├── .env                 # API keys (not committed)
├── .gitignore
├── docker-compose.yml
├── Dockerfile
└── requirements.txt     # just: aita-core>=0.1.0
```