An integration package connecting Google Classroom and LangChain

🎓 langchain-google-classroom

Status: Community integration package (not officially listed by LangChain yet).

A community package for loading Google Classroom content (assignments, announcements, course materials, and Drive attachments) as Document objects for RAG pipelines, semantic search, AI teaching assistants, and course chatbots.

Compatible with LangChain Document Loaders.

🤔 What is this?

Google Classroom data is difficult to integrate directly into LLM workflows. This package converts Classroom content into LangChain Document objects, making it easier to build:

  • AI teaching assistants
  • Course chatbots
  • Semantic search over coursework
  • Automated grading helpers

✨ Features

  • Full Classroom coverage: assignments, announcements, and course materials
  • Drive attachments: auto-download and parse PDF, DOCX, text, CSV, and HTML files
  • Vision LLM image understanding: embedded PDF images described by Gemini/GPT-4V
  • Pluggable parsers: bring your own BaseBlobParser (PyMuPDF, Unstructured, etc.)
  • Retry/backoff: exponential backoff with jitter on rate-limited API calls
  • Flexible auth: service accounts, OAuth, cached tokens, or pre-built credentials
  • Rich metadata: course info, timestamps, due dates, and links on every Document
  • Lazy loading: memory-efficient streaming via lazy_load()
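
The retry/backoff item in the list above can be pictured with a minimal sketch. This is not the package's internal code: `RateLimitError`, `backoff_delay`, `retry_with_backoff`, and all parameter names are hypothetical, and the "full jitter" variant shown is only one common way to add jitter.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429) error."""


def backoff_delay(attempt, base=1.0, cap=32.0, rng=random.random):
    """Full-jitter delay: a random value scaled by min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def retry_with_backoff(func, max_retries=5, base=1.0, cap=32.0, sleep=time.sleep):
    """Call func(); on RateLimitError, wait with exponential backoff and retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            sleep(backoff_delay(attempt, base, cap))
```

The delay ceiling doubles on each attempt until it reaches `cap`, and the random factor spreads out clients that were rate-limited at the same moment.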

📦 Installation

Requires Python >=3.10.

pip install langchain-google-classroom

With file attachment parsing (PDF, DOCX):

pip install "langchain-google-classroom[parsers]"

🚀 Quickstart

from langchain_google_classroom import GoogleClassroomLoader

# Load all accessible courses
loader = GoogleClassroomLoader()
docs = loader.load()  # eager loading

for doc in docs:
    print(doc.metadata["content_type"], "-", doc.metadata["title"])
    print(doc.page_content[:200])
    print()

# Lazy loading (stream documents one by one)
for doc in loader.lazy_load():
    print(doc.metadata["content_type"], "-", doc.metadata["title"])

See examples/ for more usage examples.

Sample output:

assignment - Homework 3
announcement - Exam postponed
material - Lecture 4 Slides
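
Because lazy_load() yields documents one at a time, a consumer can stop early without fetching everything. The sketch below illustrates that idea with a stand-in generator, so it runs without credentials; `fake_lazy_load` and `first_n` are hypothetical names, not part of the package.

```python
from itertools import islice


def fake_lazy_load(total, fetched):
    """Hypothetical stand-in for loader.lazy_load(): yields items one at a time."""
    for i in range(total):
        fetched.append(i)  # record that this item was actually produced
        yield {"title": f"Item {i}"}


def first_n(documents, n):
    """Take at most n documents from any iterator without consuming the rest."""
    return list(islice(documents, n))


fetched = []
preview = first_n(fake_lazy_load(1000, fetched), 3)
print([d["title"] for d in preview])  # ['Item 0', 'Item 1', 'Item 2']
print(len(fetched))                   # 3
```

With a real loader, the same helper would be called as `first_n(loader.lazy_load(), 3)`.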

🧠 RAG Example (Optional)

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

from langchain_google_classroom import GoogleClassroomLoader

loader = GoogleClassroomLoader()
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())

Install optional dependencies for this example as needed (langchain-text-splitters, langchain-openai, faiss-cpu).

🔐 Setup Credentials

Google Classroom APIs require authentication. Use one of the following methods:

1. OAuth User Credentials (Recommended)

This is the easiest way to get started. On the first run, a browser window opens and asks you to sign in with your Google Account; the resulting token is saved to token.json and reused for later requests.

loader = GoogleClassroomLoader(
    client_secrets_file="credentials.json",
    token_file="token.json",
)

2. Service Account

Service accounts need no human interaction. However, a service account is a separate "bot user" and cannot see your personal Google Classroom courses unless your Google Workspace administrator explicitly grants it domain-wide delegation for the Classroom scopes.

loader = GoogleClassroomLoader(
    service_account_file="service_account.json",
)
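
When domain-wide delegation is in place, a service account normally has to impersonate a real user to see that user's courses. The sketch below builds such credentials with google-auth's `with_subject()` and is meant to be passed through the `credentials` parameter; the helper name `build_delegated_credentials` and its arguments are hypothetical, and whether `service_account_file` alone supports impersonation is not stated here.

```python
def build_delegated_credentials(key_path, user_email, scopes):
    """Build service-account credentials that act on behalf of user_email.

    Assumes domain-wide delegation has already been granted for these scopes.
    """
    # Imported here so the helper can be defined without google-auth installed.
    from google.oauth2 import service_account

    creds = service_account.Credentials.from_service_account_file(
        key_path, scopes=scopes
    )
    # with_subject() returns a copy of the credentials acting as the given user.
    return creds.with_subject(user_email)
```

The result would then be used as `GoogleClassroomLoader(credentials=build_delegated_credentials(...))`, with the user's address and the scopes supplied by the caller.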

3. Pre-built Credentials Object

from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file(
    "service_account.json",
    scopes=["https://www.googleapis.com/auth/classroom.courses.readonly"],
)
loader = GoogleClassroomLoader(credentials=creds)

Credential safety:

  • Never commit credentials.json, token.json, or service_account.json.
  • Use GitHub Actions Secrets for CI integration tests.
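
One way to keep credential files out of the repository is to pass only their location through an environment variable. This is a minimal sketch under that assumption; the variable name `CLASSROOM_SERVICE_ACCOUNT_FILE` and the helper `credential_path_from_env` are hypothetical and not defined by the package.

```python
import os
from pathlib import Path


def credential_path_from_env(var_name="CLASSROOM_SERVICE_ACCOUNT_FILE", environ=None):
    """Return the credential file path stored in an environment variable.

    The variable name is a hypothetical choice, not one the package defines.
    """
    env = os.environ if environ is None else environ
    value = env.get(var_name)
    if not value:
        raise RuntimeError(f"Set {var_name} to the path of your credential file.")
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"{var_name} points to a missing file: {path}")
    return str(path)
```

The returned path can be handed to `service_account_file`. In CI, the secret would be written to a temporary file and the variable pointed at it.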

📎 Attachments & File Parsing

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_attachments=True,      # Download Drive files
    parse_attachments=True,     # Parse with BaseBlobParser
)
docs = loader.load()
# Yields: assignment docs + parsed PDF/DOCX/text attachment docs

Custom Parser

from langchain_community.document_loaders.parsers.pdf import PyMuPDFParser

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    file_parser_cls=PyMuPDFParser,
)

🖼️ Vision LLM: Image Understanding

Extract and describe images embedded in PDFs using any vision-capable LLM:

from langchain_google_genai import ChatGoogleGenerativeAI

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_attachments=True,
    vision_model=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),
)
docs = loader.load()
# PDF pages now include: "[Image: chart.png]\nA bar chart showing student grades..."

🎯 Selective Loading

loader = GoogleClassroomLoader(
    course_ids=["123456789"],
    load_assignments=True,
    load_announcements=False,
    load_materials=False,
    load_attachments=False,
)

📄 Document Structure

Each document includes rich metadata:

Document(
    page_content="Assignment: Homework 3\n\nComplete exercises 1-5...",
    metadata={
        "source": "google_classroom",
        "course_id": "12345",
        "course_name": "Machine Learning",
        "content_type": "assignment",        # or "announcement", "material", "assignment_attachment"
        "title": "Homework 3",
        "item_id": "67890",
        "created_time": "2024-01-15T10:00:00Z",
        "updated_time": "2024-01-15T10:00:00Z",
        "due_date": "2024-01-22T23:59:00",   # assignments only
        "max_points": 100,                    # assignments only
        "alternate_link": "https://classroom.google.com/...",
    }
)
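
The metadata makes it straightforward to filter documents after loading. This sketch selects assignments due before a cutoff; it assumes `due_date` is a timezone-naive ISO 8601 string as in the sample above, and it uses a small `Doc` dataclass as a stand-in for LangChain's Document so it runs on its own. The helper name and the "Homework 4" entry are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def assignments_due_before(docs, cutoff):
    """Return assignment docs whose due_date is earlier than cutoff, soonest first."""
    due = []
    for doc in docs:
        meta = doc.metadata
        if meta.get("content_type") != "assignment" or not meta.get("due_date"):
            continue  # skip announcements, materials, and undated assignments
        when = datetime.fromisoformat(meta["due_date"])
        if when < cutoff:
            due.append((when, doc))
    return [doc for _, doc in sorted(due, key=lambda pair: pair[0])]


docs = [
    Doc("Assignment: Homework 3",
        {"content_type": "assignment", "title": "Homework 3",
         "due_date": "2024-01-22T23:59:00"}),
    Doc("Exam postponed",
        {"content_type": "announcement", "title": "Exam postponed"}),
    Doc("Assignment: Homework 4",
        {"content_type": "assignment", "title": "Homework 4",
         "due_date": "2024-02-05T23:59:00"}),
]
soon = assignments_due_before(docs, datetime(2024, 2, 1))
print([d.metadata["title"] for d in soon])  # ['Homework 3']
```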

⚙️ Configuration Reference

Parameter             Type                  Default    Description
course_ids            list[str]             None       Specific course IDs (None = all accessible)
load_assignments      bool                  True       Load courseWork items
load_announcements    bool                  True       Load announcements
load_materials        bool                  True       Load courseWorkMaterials
load_attachments      bool                  True       Download and process Drive attachments
parse_attachments     bool                  True       Parse files with BaseBlobParser
load_images           bool                  False      Process image MIME types
vision_model          BaseChatModel         None       Vision LLM for image understanding
image_prompt          str                   None       Custom prompt for vision model
file_parser_cls       type[BaseBlobParser]  None       Custom parser for all attachments
file_parser_kwargs    dict                  None       kwargs for custom parser
credentials           Credentials           None       Pre-built Google credentials
service_account_file  str                   None       Service account key JSON path
token_file            str                   None       Cached OAuth token path
client_secrets_file   str                   None       OAuth client secrets path
scopes                list[str]             Read-only  API scopes to request

🏗️ Architecture

GoogleClassroomLoader (BaseLoader)
├── _utilities.py         - auth, retry/backoff, guard_import
├── classroom_api.py      - paginated Classroom API fetcher
├── document_builder.py   - raw API → LangChain Document
├── drive_resolver.py     - Drive download/export
├── normalizer.py         - text cleanup (Unicode NFC, whitespace)
└── parsers/
    ├── __init__.py       - MIME registry + get_parser()
    ├── pdf_parser.py     - pypdf + vision LLM
    ├── docx_parser.py    - python-docx
    ├── text_parser.py    - built-in UTF-8
    └── image_parser.py   - vision LLM + base64 fallback
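
To make the normalizer's role concrete, here is a rough sketch of what "Unicode NFC plus whitespace cleanup" could look like. It is an illustration, not the contents of normalizer.py: the function name `normalize_text` and the exact whitespace rules are assumptions.

```python
import re
import unicodedata


def normalize_text(text):
    """Apply Unicode NFC normalization, then tidy whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify line endings
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line
    return text.strip()
```

NFC matters for retrieval because the same visible character can be stored as one code point or as a base letter plus a combining mark; normalizing first lets both forms compare equal.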

🧪 Development

# Clone and install
git clone https://github.com/ayanokojix21/langchain-google-classroom.git
cd langchain-google-classroom
python -m pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Format + lint
ruff format .
ruff check .

📝 License

MIT. See LICENSE for details.

See CONTRIBUTING.md for development guidelines.
