An integration package connecting Google Classroom and LangChain
langchain-google-classroom
A LangChain integration package that loads Google Classroom content (assignments, announcements, course materials, and Drive attachments) as Document objects for RAG pipelines, semantic search, AI teaching assistants, and course chatbots.
Features
- Full Classroom coverage: assignments, announcements, and course materials
- Drive attachments: auto-download and parse PDF, DOCX, text, CSV, and HTML files
- Vision LLM image description: images embedded in PDFs are described by a vision model such as Gemini or GPT-4V
- Pluggable parsers: bring your own BaseBlobParser (PyMuPDF, Unstructured, etc.)
- Retry/backoff: exponential backoff with jitter on rate-limited API calls
- Flexible auth: service accounts, OAuth, cached tokens, or pre-built credentials
- Rich metadata: course info, timestamps, due dates, and links on every Document
- Lazy loading: memory-efficient streaming via lazy_load()
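The retry/backoff feature above refers to a general technique. The following is a minimal sketch of exponential backoff with full jitter, not the package's actual implementation: `RateLimitError`, `call_with_backoff`, and all parameter names and defaults are hypothetical and chosen only for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 / quota-exceeded error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # The ceiling doubles each attempt (1x, 2x, 4x, ... base_delay),
            # bounded by max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random duration in [0, ceiling] so that
            # concurrent clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry schedule can be inspected in tests without actually waiting.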
Installation
pip install langchain-google-classroom
With file attachment parsing (PDF, DOCX):
pip install "langchain-google-classroom[parsers]"
Quickstart
from langchain_google_classroom import GoogleClassroomLoader
# Load all accessible courses
loader = GoogleClassroomLoader()
docs = loader.load()
for doc in docs:
    print(doc.metadata["content_type"], "-", doc.metadata["title"])
    print(doc.page_content[:200])
    print()
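The quickstart calls load(), which collects every document into a list. For large courses, the lazy_load() method listed under Features returns an iterator instead. As a rough sketch, a small helper like the one below can group any iterator into fixed-size batches (for example, before an embedding call). The `batched` helper is hypothetical and not part of the package.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items without materializing the whole input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls at most `size` items from the shared iterator.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Assuming a configured loader, `for batch in batched(loader.lazy_load(), 32): ...` would then process documents 32 at a time while keeping only one batch in memory.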
Authentication
Service Account (recommended for production)
loader = GoogleClassroomLoader(
    service_account_file="service_account.json",
)
OAuth User Credentials
loader = GoogleClassroomLoader(
    client_secrets_file="credentials.json",
    token_file="token.json",
)
Pre-built Credentials
from google.oauth2 import service_account
creds = service_account.Credentials.from_service_account_file(
    "service_account.json",
    scopes=["https://www.googleapis.com/auth/classroom.courses.readonly"],
)
loader = GoogleClassroomLoader(credentials=creds)
Attachments & File Parsing
loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_attachments=True,   # Download Drive files
    parse_attachments=True,  # Parse with BaseBlobParser
)
docs = loader.load()
# Yields: assignment docs + parsed PDF/DOCX/text attachment docs
Custom Parser
from langchain_community.document_loaders.parsers.pdf import PyMuPDFParser
loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    file_parser_cls=PyMuPDFParser,
)
Vision LLM: Image Description
Extract and describe images embedded in PDFs using any vision-capable LLM:
from langchain_google_genai import ChatGoogleGenerativeAI
loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_attachments=True,
    vision_model=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),
)
docs = loader.load()
# PDF pages now include: "[Image: chart.png]\nA bar chart showing student grades..."
Selective Loading
loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_assignments=True,
    load_announcements=False,
    load_materials=False,
    load_attachments=False,
)
Document Structure
Each document includes rich metadata:
Document(
    page_content="Assignment: Homework 3\n\nComplete exercises 1-5...",
    metadata={
        "source": "google_classroom",
        "course_id": "12345",
        "course_name": "Machine Learning",
        "content_type": "assignment",  # or "announcement", "material", "assignment_attachment"
        "title": "Homework 3",
        "item_id": "67890",
        "created_time": "2024-01-15T10:00:00Z",
        "updated_time": "2024-01-15T10:00:00Z",
        "due_date": "2024-01-22T23:59:00",  # assignments only
        "max_points": 100.0,  # assignments only
        "alternate_link": "https://classroom.google.com/...",
    }
)
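Because every document carries this metadata, downstream code can filter on it without re-querying the API. The sketch below uses plain dicts shaped like the metadata above to find assignments due in a given window. The helper name `assignments_due_between` is hypothetical, and the code assumes that due_date is always an ISO 8601 string without a UTC offset, as in the sample shown.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List


def assignments_due_between(
    metadatas: Iterable[Dict[str, Any]],
    start: datetime,
    end: datetime,
) -> List[Dict[str, Any]]:
    """Return assignment metadata with start <= due_date <= end, earliest first."""
    matches = []
    for meta in metadatas:
        if meta.get("content_type") != "assignment":
            continue
        raw_due = meta.get("due_date")
        if not raw_due:
            continue  # assignment without a due date
        # Assumption: due_date has no UTC offset, so it parses to a naive
        # datetime and must be compared against naive start/end values.
        due = datetime.fromisoformat(raw_due)
        if start <= due <= end:
            matches.append((due, meta))
    matches.sort(key=lambda pair: pair[0])
    return [meta for _, meta in matches]
```

With real documents, the input would be something like `(doc.metadata for doc in docs)`.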
Configuration Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| course_ids | list[str] | None | Specific course IDs (None = all accessible) |
| load_assignments | bool | True | Load courseWork items |
| load_announcements | bool | True | Load announcements |
| load_materials | bool | True | Load courseWorkMaterials |
| load_attachments | bool | True | Download and process Drive attachments |
| parse_attachments | bool | True | Parse files with BaseBlobParser |
| load_images | bool | False | Process image MIME types |
| vision_model | BaseChatModel | None | Vision LLM for image description |
| image_prompt | str | None | Custom prompt for vision model |
| file_parser_cls | type[BaseBlobParser] | None | Custom parser for all attachments |
| file_parser_kwargs | dict | None | kwargs for custom parser |
| credentials | Credentials | None | Pre-built Google credentials |
| service_account_file | str | None | Service account key JSON path |
| token_file | str | None | Cached OAuth token path |
| client_secrets_file | str | None | OAuth client secrets path |
| scopes | list[str] | Read-only | API scopes to request |
Architecture
GoogleClassroomLoader (BaseLoader)
├── _utilities.py: auth, retry/backoff, guard_import
├── classroom_api.py: paginated Classroom API fetcher
├── document_builder.py: raw API → LangChain Document
├── drive_resolver.py: Drive download/export
├── normalizer.py: text cleanup (Unicode NFC, whitespace)
└── parsers/
    ├── __init__.py: MIME registry + get_parser()
    ├── pdf_parser.py: pypdf + vision LLM
    ├── docx_parser.py: python-docx
    ├── text_parser.py: built-in UTF-8
    └── image_parser.py: vision LLM + base64 fallback
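To illustrate the kind of cleanup the normalizer entry describes (Unicode NFC plus whitespace tidying), here is a minimal sketch using only the standard library. It is not the contents of normalizer.py: the function name `normalize_text` and the exact whitespace rules are assumptions made for illustration.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply Unicode NFC normalization, then tidy whitespace."""
    # NFC composes sequences such as "e" + combining acute into a single "é".
    text = unicodedata.normalize("NFC", text)
    # Use "\n" as the only line ending.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces that touch a line break on either side.
    text = re.sub(r" *\n *", "\n", text)
    # Allow at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Normalizing before chunking and embedding helps keep visually identical strings from producing different tokens.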
Development
# Clone and install
git clone https://github.com/ayanokojix21/langchain-google-classroom.git
cd langchain-google-classroom
pip install -e ".[dev]"
# Run tests
pytest tests/unit/ -v
# Lint
ruff check langchain_google_classroom/ tests/
License
MIT. See LICENSE for details.