
An integration package connecting Google Classroom and LangChain


🎓 langchain-google-classroom


A LangChain integration package that loads Google Classroom content (assignments, announcements, course materials, and Drive attachments) as Document objects for RAG pipelines, semantic search, AI teaching assistants, and course chatbots.

✨ Features

  • Full Classroom coverage: assignments, announcements, and course materials
  • Drive attachments: automatic download and parsing of PDF, DOCX, plain-text, CSV, and HTML files
  • Vision LLM image description: images embedded in PDFs are described by a vision model such as Gemini or GPT-4V
  • Pluggable parsers: bring your own BaseBlobParser (PyMuPDF, Unstructured, etc.)
  • Retry/backoff: exponential backoff with jitter on rate-limited API calls
  • Flexible auth: service accounts, OAuth, cached tokens, or pre-built credentials
  • Rich metadata: course info, timestamps, due dates, and links on every Document
  • Lazy loading: memory-efficient streaming via lazy_load()
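The retry/backoff feature can be pictured with a small, generic sketch of exponential backoff with "full jitter". This is not the package's _utilities.py code; the helper names backoff_delays and call_with_retry, the base delay, the cap, and the retry count are all assumptions. An is_retryable predicate decides which errors (for example HTTP 429 rate-limit responses) are worth retrying.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(
    retries: int = 5,
    base: float = 1.0,
    cap: float = 32.0,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Hypothetical helper: one sleep time per retry, uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    for attempt in range(retries):
        yield rng.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retry(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call fn(), sleeping with jittered backoff after each retryable failure."""
    for delay in backoff_delays(retries):
        try:
            return fn()
        except Exception as exc:
            if not is_retryable(exc):
                raise  # e.g. a 403 or 404 should fail immediately
            sleep(delay)
    return fn()  # last attempt: any error now propagates to the caller
```

Randomizing the whole delay (rather than adding a small jitter to a fixed delay) spreads out clients that were throttled at the same moment, so they do not all retry in lockstep.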

📦 Installation

pip install langchain-google-classroom

With file attachment parsing (PDF, DOCX):

pip install "langchain-google-classroom[parsers]"

🚀 Quickstart

from langchain_google_classroom import GoogleClassroomLoader

# Load all accessible courses
loader = GoogleClassroomLoader()
docs = loader.load()

for doc in docs:
    print(doc.metadata["content_type"], "-", doc.metadata["title"])
    print(doc.page_content[:200])
    print()
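For courses with many items, the lazy_load() iterator listed under Features can feed documents to an index in fixed-size batches instead of building the full list that load() returns. Below is a stdlib-only sketch of that batching pattern; batched and fake_lazy_load are hypothetical helpers, and the fake generator yields strings in place of Document objects.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group an iterator into lists of at most `size` items, pulling lazily."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def fake_lazy_load(n: int) -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(): yields items one at a time."""
    for i in range(n):
        yield f"doc-{i}"


# With the real loader this would be: for batch in batched(loader.lazy_load(), 50): ...
for batch in batched(fake_lazy_load(5), 2):
    print(batch)
```

Each batch could then be passed to something like a vector store's add_documents(); only one batch is held in memory at a time. On Python 3.12+, itertools.batched provides the same grouping (yielding tuples instead of lists).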

🔐 Authentication

Service Account (recommended for production; Classroom access requires Google Workspace domain-wide delegation)

loader = GoogleClassroomLoader(
    service_account_file="service_account.json",
)

OAuth User Credentials

loader = GoogleClassroomLoader(
    client_secrets_file="credentials.json",
    token_file="token.json",
)

Pre-built Credentials

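from google.oauth2 import service_account
from langchain_google_classroom import GoogleClassroomLoader
SCOPES = [  # courses.readonly alone cannot read coursework, announcements, or Drive files
    "https://www.googleapis.com/auth/classroom.courses.readonly",
    "https://www.googleapis.com/auth/classroom.coursework.students.readonly",
    "https://www.googleapis.com/auth/classroom.announcements.readonly",
    "https://www.googleapis.com/auth/classroom.courseworkmaterials.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
]
# Classroom accepts service accounts only via domain-wide delegation, so impersonate a user.
creds = service_account.Credentials.from_service_account_file(
    "service_account.json", scopes=SCOPES
).with_subject("teacher@example.edu")  # placeholder Workspace account
loader = GoogleClassroomLoader(credentials=creds)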
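Which scopes you need depends on what you load. The sketch below maps the loader's load_* flags to Google's documented read-only scopes. The helper required_scopes is hypothetical, and the package's actual default scopes list may differ; the result could be passed as scopes= to the loader or to from_service_account_file.

```python
from typing import List

BASE = "https://www.googleapis.com/auth/"


def required_scopes(
    load_assignments: bool = True,
    load_announcements: bool = True,
    load_materials: bool = True,
    load_attachments: bool = True,
) -> List[str]:
    """Hypothetical helper: read-only scopes for the selected content types."""
    scopes = [BASE + "classroom.courses.readonly"]  # list courses and read course names
    if load_assignments:
        # Teacher view; a student account can use classroom.coursework.me.readonly instead.
        scopes.append(BASE + "classroom.coursework.students.readonly")
    if load_announcements:
        scopes.append(BASE + "classroom.announcements.readonly")
    if load_materials:
        scopes.append(BASE + "classroom.courseworkmaterials.readonly")
    if load_attachments:
        scopes.append(BASE + "drive.readonly")  # download/export attachment files
    return scopes


print(required_scopes(load_announcements=False, load_materials=False, load_attachments=False))
```

Requesting only what the enabled flags need keeps the consent screen (or the delegated scope list in the Admin console) as narrow as possible.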

📎 Attachments & File Parsing

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_attachments=True,      # Download Drive files
    parse_attachments=True,     # Parse with BaseBlobParser
)
docs = loader.load()
# Returns assignment, announcement, and material docs plus docs for parsed PDF/DOCX/text attachments
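Drive attachments come in two kinds: uploaded binaries (PDF, DOCX, ...) that can be downloaded as-is, and Google-native Docs, Sheets, and Slides, which have no binary content and must be exported to another format before parsing. A sketch of that decision follows; the plan_fetch helper and the chosen export targets are assumptions, not necessarily what drive_resolver.py does.

```python
from typing import NamedTuple

# Assumed export targets for Google-native files (Drive API files.export).
EXPORT_TARGETS = {
    "application/vnd.google-apps.document": "text/plain",
    "application/vnd.google-apps.spreadsheet": "text/csv",  # CSV export covers the first sheet only
    "application/vnd.google-apps.presentation": "application/pdf",
}


class FetchPlan(NamedTuple):
    method: str     # "download" (files.get_media) or "export" (files.export)
    mime_type: str  # MIME type of the bytes that will be parsed


def plan_fetch(drive_mime_type: str) -> FetchPlan:
    """Hypothetical helper: choose how to retrieve a Drive attachment's bytes."""
    if drive_mime_type in EXPORT_TARGETS:
        return FetchPlan("export", EXPORT_TARGETS[drive_mime_type])
    if drive_mime_type.startswith("application/vnd.google-apps."):
        # Folders, forms, shortcuts, etc. are not handled in this sketch.
        raise ValueError(f"unsupported Google-native type: {drive_mime_type}")
    return FetchPlan("download", drive_mime_type)
```

The returned mime_type is what a parser registry would dispatch on, so an exported Google Doc is parsed as plain text rather than as a Docs file.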

Custom Parser

from langchain_community.document_loaders.parsers.pdf import PyMuPDFParser

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    file_parser_cls=PyMuPDFParser,  # applied to every attachment, so best suited to PDFs
)
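The Architecture section lists a MIME registry with get_parser() in parsers/__init__.py, and the configuration table says file_parser_cls replaces the parser for all attachments. Here is a sketch of how such a lookup could combine the two. The registry contents, the stub parser classes, and this get_parser signature are assumptions, not the package's code.

```python
from typing import Any, Dict, Optional


class TextParser:  # hypothetical stand-ins for the package's built-in parsers
    pass


class PDFParser:
    pass


class DocxParser:
    pass


REGISTRY: Dict[str, type] = {
    "application/pdf": PDFParser,
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": DocxParser,
    "text/plain": TextParser,
    "text/csv": TextParser,
    "text/html": TextParser,
}


def get_parser(
    mime_type: str,
    file_parser_cls: Optional[type] = None,
    file_parser_kwargs: Optional[Dict[str, Any]] = None,
) -> Optional[object]:
    """Return a parser instance for mime_type, or None if the type is unsupported.

    A custom file_parser_cls, when given, wins for every MIME type and is the
    only class that receives file_parser_kwargs.
    """
    if file_parser_cls is not None:
        return file_parser_cls(**(file_parser_kwargs or {}))
    parser_cls = REGISTRY.get(mime_type)
    return parser_cls() if parser_cls is not None else None
```

Returning None for unknown types lets the caller skip an attachment instead of failing the whole load.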

🖼️ Vision LLM: Image Description

Extract and describe images embedded in PDFs using any vision-capable chat model, passed as vision_model:

from langchain_google_genai import ChatGoogleGenerativeAI

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_attachments=True,
    vision_model=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),
)
docs = loader.load()
# PDF pages now include: "[Image: chart.png]\nA bar chart showing student grades..."
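The comment above shows the output format: each described image is appended to its page as an "[Image: <name>]" line followed by the model's description. The following sketches that merge step, plus the base64 data URL form that many vision chat APIs accept for inline images. describe is a hypothetical callable standing in for the vision_model call, and the exact layout the package produces may differ.

```python
import base64
from typing import Callable, List, Tuple


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    return f"data:{mime_type};base64,{base64.b64encode(image_bytes).decode('ascii')}"


def append_image_descriptions(
    page_text: str,
    images: List[Tuple[str, bytes]],
    describe: Callable[[str], str],
) -> str:
    """Append an "[Image: name]" line plus its description for each image on the page.

    `describe` is a hypothetical stand-in for a vision-model call that takes a data URL.
    """
    parts = [page_text.rstrip()] if page_text.strip() else []
    for name, data in images:
        description = describe(to_data_url(data)).strip()
        parts.append(f"[Image: {name}]\n{description}")
    return "\n\n".join(parts)
```

Keeping the description inline with the page text means the image content is embedded and retrieved together with the surrounding prose.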

🎯 Selective Loading

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_assignments=True,
    load_announcements=False,
    load_materials=False,
    load_attachments=False,
)

📄 Document Structure

Each document includes rich metadata:

Document(
    page_content="Assignment: Homework 3\n\nComplete exercises 1-5...",
    metadata={
        "source": "google_classroom",
        "course_id": "12345",
        "course_name": "Machine Learning",
        "content_type": "assignment",        # or "announcement", "material", "assignment_attachment"
        "title": "Homework 3",
        "item_id": "67890",
        "created_time": "2024-01-15T10:00:00Z",
        "updated_time": "2024-01-15T10:00:00Z",
        "due_date": "2024-01-22T23:59:00",   # assignments only
        "max_points": 100.0,                  # assignments only
        "alternate_link": "https://classroom.google.com/...",
    }
)
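Fields like due_date are assembled from the raw API response. The Classroom API returns a courseWork item's due date as separate dueDate ({year, month, day}) and dueTime ({hours, minutes}) objects, both in UTC, and omits zero-valued fields from the JSON. Here is a sketch of turning one raw courseWork dict into the page_content and metadata shown above. build_assignment and format_due are hypothetical, and the package's document_builder.py may format values differently.

```python
from typing import Any, Dict, Optional, Tuple


def format_due(work: Dict[str, Any]) -> Optional[str]:
    """Combine dueDate and dueTime (UTC in the Classroom API) into an ISO-like string."""
    due_date = work.get("dueDate")
    if not due_date:
        return None
    due_time = work.get("dueTime", {})  # zero-valued fields are omitted from the JSON
    return (
        f"{due_date['year']:04d}-{due_date['month']:02d}-{due_date['day']:02d}"
        f"T{due_time.get('hours', 0):02d}:{due_time.get('minutes', 0):02d}"
        f":{due_time.get('seconds', 0):02d}"
    )


def build_assignment(work: Dict[str, Any], course_name: str) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical builder: return (page_content, metadata) for one courseWork item."""
    title = work.get("title", "")
    page_content = f"Assignment: {title}\n\n{work.get('description', '')}".strip()
    metadata: Dict[str, Any] = {
        "source": "google_classroom",
        "course_id": work["courseId"],
        "course_name": course_name,
        "content_type": "assignment",
        "title": title,
        "item_id": work["id"],
        "created_time": work.get("creationTime"),
        "updated_time": work.get("updateTime"),
        "alternate_link": work.get("alternateLink"),
    }
    due = format_due(work)
    if due is not None:  # assignments without a due date get no due_date key
        metadata["due_date"] = due
    if "maxPoints" in work:  # ungraded work has no maxPoints
        metadata["max_points"] = float(work["maxPoints"])
    return page_content, metadata
```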

⚙️ Configuration Reference

Parameter | Type | Default | Description
course_ids | list[str] | None | Specific course IDs to load (None = all accessible courses)
load_assignments | bool | True | Load courseWork items (assignments)
load_announcements | bool | True | Load announcements
load_materials | bool | True | Load courseWorkMaterials
load_attachments | bool | True | Download and process Drive attachments
parse_attachments | bool | True | Parse downloaded files with a BaseBlobParser
load_images | bool | False | Process image attachments (image MIME types)
vision_model | BaseChatModel | None | Vision LLM used to describe images
image_prompt | str | None | Custom prompt for the vision model
file_parser_cls | type[BaseBlobParser] | None | Custom parser class used for all attachments
file_parser_kwargs | dict | None | Keyword arguments passed to file_parser_cls
credentials | Credentials | None | Pre-built Google credentials
service_account_file | str | None | Path to a service account key JSON file
token_file | str | None | Path to a cached OAuth token
client_secrets_file | str | None | Path to the OAuth client secrets file
scopes | list[str] | Read-only scopes | API scopes to request

🏗️ Architecture

GoogleClassroomLoader (BaseLoader)
├── _utilities.py         - auth, retry/backoff, guard_import
├── classroom_api.py      - paginated Classroom API fetcher
├── document_builder.py   - raw API → LangChain Document
├── drive_resolver.py     - Drive download/export
├── normalizer.py         - text cleanup (Unicode NFC, whitespace)
└── parsers/
    ├── __init__.py       - MIME registry + get_parser()
    ├── pdf_parser.py     - pypdf + vision LLM
    ├── docx_parser.py    - python-docx
    ├── text_parser.py    - built-in UTF-8
    └── image_parser.py   - vision LLM + base64 fallback
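classroom_api.py is described as a paginated fetcher. Classroom list endpoints return one page per call along with a nextPageToken, which is sent back as pageToken until it is absent. Below is a stdlib sketch of that loop. paginate is a hypothetical helper, and list_page is a hypothetical callable standing in for a request such as service.courses().courseWork().list(...).execute().

```python
from typing import Any, Callable, Dict, Iterator, Optional


def paginate(
    list_page: Callable[[Optional[str]], Dict[str, Any]],
    items_key: str,
) -> Iterator[Dict[str, Any]]:
    """Yield every item across pages, following nextPageToken until it is absent."""
    token: Optional[str] = None
    while True:
        response = list_page(token)
        yield from response.get(items_key, [])  # an empty page may omit the key entirely
        token = response.get("nextPageToken")
        if not token:
            return


# With the real client this would look roughly like:
# paginate(lambda t: service.courses().courseWork().list(courseId=cid, pageToken=t).execute(),
#          "courseWork")
```

Because paginate is a generator, it composes naturally with lazy_load(): the next page is only requested once the caller has consumed the current one.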

🧪 Development

# Clone and install
git clone https://github.com/ayanokojix21/langchain-google-classroom.git
cd langchain-google-classroom
pip install -e ".[dev]"

# Run tests
pytest tests/unit/ -v

# Lint
ruff check langchain_google_classroom/ tests/

📝 License

MIT. See LICENSE for details.

Download files

Source distribution: langchain_google_classroom-0.1.0.tar.gz (35.1 kB)

Built distribution: langchain_google_classroom-0.1.0-py3-none-any.whl (25.1 kB, Python 3)

Hashes for langchain_google_classroom-0.1.0.tar.gz:
SHA256: 02f19540637e7244811d520f69114324ac5c67195dbe012ba48dd87fce5dd6ba
MD5: 2e97a3af3fbf286e4d2b7ed1cd50c9b4
BLAKE2b-256: 38254333337cf4f462f285ea4c4fa85c4be17e9fec7aa105ce039e17b0dc9448

Hashes for langchain_google_classroom-0.1.0-py3-none-any.whl:
SHA256: 5124e8b56f16d417791d5168dc6cf54eb4752d9c053d27b600e02967f95101b6
MD5: 6f5f3cc0bdf0943aceb3b477b965ca0c
BLAKE2b-256: c840af700b1591d6c223d082bb3ff0c6516b9b18504bb9da76f98d96b9d0b6f9
