
Granularity-on-demand learning object extractor and composer

Granule (v0.1.1)

Granule ingests blogs/HN/Reddit/YouTube/podcasts/news, atomizes them into SemanticAtoms with citations, and composes MicroLearning/FocusedSessions/DeepMastery units by target duration.

Install

pip install -e .

Minimal Video Insight Card Install

If you only need to generate structured VideoInsightCardPayload objects (YouTube transcript + OpenAI LLM), install the lightweight extra:

pip install "granule[videocard]"

This pulls only:

  • pydantic (core models)
  • youtube-transcript-api (transcript fetch)
  • openai (LLM generation)
  • python-dotenv (optional .env loading)

Example:

from granule.api_simple import video_insight_card
from dotenv import load_dotenv
load_dotenv()  # reads OPENAI_API_KEY
card = video_insight_card("https://www.youtube.com/watch?v=mAClw7r3ETc")
print(card.model_dump_json(indent=2))

If OPENAI_API_KEY is unset, you'll still get a minimal fallback card (header + short TL;DR snippet if transcript exists).
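The fallback described above can be pictured with a small stand-alone sketch. It does not import Granule: the names fallback_card and has_llm_key, the dictionary keys, and the 200-character snippet length are hypothetical choices made for illustration. Only the overall shape (a header, plus a short TL;DR snippet when a transcript exists) comes from the description.

```python
import os
from typing import Optional


def has_llm_key(environ=None) -> bool:
    """Return True when OPENAI_API_KEY is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return bool(env.get("OPENAI_API_KEY", "").strip())


def fallback_card(title: str, transcript: Optional[str], snippet_chars: int = 200) -> dict:
    """Build a minimal card: a header, plus a TL;DR snippet if a transcript exists.

    Hypothetical helper for illustration; not Granule's implementation.
    """
    card = {"header": {"title": title}}
    if transcript and transcript.strip():
        # Collapse runs of whitespace, then keep only the leading characters.
        text = " ".join(transcript.split())
        card["tldr"] = text[:snippet_chars]
    return card
```

A caller would check has_llm_key() first and only take the fallback_card path when it returns False.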

Full Feature Install

For CLI, article ingestion, ML metrics, vector store, API, UI, and advanced agent support:

pip install "granule[full]"

CLI

granule ingest "Some article text" --kind blog -o doc.json
granule dissect doc.json --max-tokens 60 --only definition,example -o atoms.json
granule expand atoms.json --target 55s -o micro.json
granule stream atoms.json --pace 55s --until 3m -o session.jsonl
granule simple-youtube https://www.youtube.com/watch?v=dQw4w9WgXcQ -o simple.json
granule simple-card https://www.youtube.com/watch?v=dQw4w9WgXcQ -o card.json
granule video-card https://www.youtube.com/watch?v=dQw4w9WgXcQ -o video_card.json
granule video-card https://www.youtube.com/watch?v=dQw4w9WgXcQ --title "Custom Title" -o titled_card.json
granule text-video-card transcript.txt -o text_card.json
granule text-video-card "[00:00] Intro\n[00:05] Point A" --title "Inline Snippet" -o inline_card.json
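The --target, --pace, and --until options above take compact duration strings such as 55s and 3m. The following sketch shows one way such strings could be converted to seconds. The function names, the support for an h suffix, and the floor division used to count units are assumptions, not Granule's actual parsing rules.

```python
import re

# Seconds per unit suffix. Only "s" and "m" appear in the commands above;
# "h" is an assumed extension.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> int:
    """Convert a string like '55s' or '3m' into a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smh])\s*", value.lower())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def units_that_fit(pace: str, until: str) -> int:
    """Count how many whole units of length `pace` fit before `until`."""
    pace_seconds = parse_duration(pace)
    if pace_seconds == 0:
        raise ValueError("pace must be greater than zero")
    return parse_duration(until) // pace_seconds
```

Under these assumptions, a pace of 55s with a limit of 3m leaves room for three whole units (165 of the 180 seconds).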

FastAPI

uvicorn granule.fastapi_app:app --reload --port 8000

Endpoints:

  • POST /ingest {source, kind}
  • POST /dissect {doc, max_tokens, kinds}
  • POST /expand {graph, target_seconds}
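As a rough illustration of calling these endpoints, the sketch below builds a JSON POST request with the standard library. The base URL assumes the server was started locally with the uvicorn command above. The body field names are the ones listed for each endpoint, while the helper name build_post is hypothetical and the response format is not documented here, so it is left unparsed.

```python
import json
import urllib.request

# Assumes the server is running locally on the port from the uvicorn command.
BASE_URL = "http://localhost:8000"


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Create a JSON POST request for one of the endpoints (not yet sent)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


ingest_request = build_post("/ingest", {"source": "Some article text", "kind": "blog"})

# Sending the request requires a running server:
# with urllib.request.urlopen(ingest_request) as response:
#     doc = json.load(response)
```

The same helper works for /dissect and /expand by passing their respective fields (doc, max_tokens, kinds; graph, target_seconds).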

YouTube transcript

from granule.ingest.youtube import ingest_youtube
doc = ingest_youtube("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
print(doc.text[:200])

Environment Variables

Granule reads optional environment variables for LLM integration.

  1. Copy .env.example to .env (or export variables another way).
  2. Add your OpenAI key if you want LLM-powered features (planned / experimental):
cp .env.example .env
echo "OPENAI_API_KEY=sk-..." >> .env

Variables:

  • OPENAI_API_KEY – enables LLM-backed features such as video insight cards and segment summaries; atom enrichment / generation is planned / experimental.
  • GRANULE_LLM_MODEL – preferred model (e.g. gpt5).
  • GRANULE_LLM_PROVIDER – provider alias (openai, azure-openai, etc.).
  • GRANULE_SUPPRESS_PROBLEM_QUESTIONS – if set to 1/true/on, skips adding ontology-derived and synthesized integrative Questions Raised. Useful when you only want claims and problems without extra exploratory questions.

If python-dotenv is installed, .env will be auto-loaded; otherwise export vars normally.
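The two behaviors above (a boolean flag that accepts 1/true/on, and .env loading only when python-dotenv is present) can be sketched as follows. The helper names are hypothetical, and the case-insensitive, whitespace-tolerant matching is an assumption rather than documented Granule behavior.

```python
import os

# Values the text above lists as enabling the flag.
TRUTHY = {"1", "true", "on"}


def env_flag(name: str, environ=None) -> bool:
    """Return True when the variable is set to 1, true, or on."""
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() in TRUTHY


def load_dotenv_if_available() -> bool:
    """Load .env when python-dotenv is installed; report whether it was."""
    try:
        from dotenv import load_dotenv
    except ImportError:
        # Without python-dotenv, variables must be exported in the shell.
        return False
    load_dotenv()
    return True
```

For example, env_flag("GRANULE_SUPPRESS_PROBLEM_QUESTIONS") would be checked before adding the extra questions.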

Streamlit UI (YouTube → Atoms → Unit)

Experimental helper UI.

Install extras:

pip install -e ".[ui,llm]"

Run:

streamlit run streamlit_app.py

Paste a YouTube URL, adjust the parameters, and run the pipeline. You can optionally enrich the first atoms (requires OPENAI_API_KEY).

New: Segments, Summaries & Insight Card

The pipeline now also produces:

  • segments – improved sentence-cluster segments with token counts.
  • segment_summaries – per-segment micro summaries (LLM-backed if key present, heuristic fallback otherwise).
  • insight_card – a heuristic high-level Transcript Insight Card (claims, glossary, metrics stubs) serialized for UI consumption.

These appear in the composite JSON returned by process_youtube and in the Streamlit UI (preview table + card header/sections).
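To give a feel for what a segment with a token count might look like, here is a deliberately simple stand-in: it splits text into sentences and greedily groups them under a token budget. This is not Granule's segmentation algorithm. The function name, the text and token_count keys, and the use of whitespace-separated words as tokens are all assumptions.

```python
import re


def segment_text(text: str, max_tokens: int = 60) -> list:
    """Greedily group sentences into segments of at most `max_tokens` words.

    A single sentence longer than the budget still becomes its own segment.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    segments = []
    current, count = [], 0
    for sentence in sentences:
        n_tokens = len(sentence.split())
        # Close the current segment when the next sentence would exceed the budget.
        if current and count + n_tokens > max_tokens:
            segments.append({"text": " ".join(current), "token_count": count})
            current, count = [], 0
        current.append(sentence)
        count += n_tokens
    if current:
        segments.append({"text": " ".join(current), "token_count": count})
    return segments
```

A per-segment summary step would then run over each returned entry, using an LLM when a key is present and a heuristic otherwise.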

Video Insight Card (Structured JSON)

Granule can produce a richer, schema-driven VideoInsightCardPayload either from a YouTube URL (auto transcript + optional title fetch) or from raw transcript text, supplied as a file or inline.

Quickstart:

# Install: pip install "granule[videocard]"
# YouTube → structured video insight card (minimal fallback if no OPENAI_API_KEY set)
granule --log-level DEBUG video-card https://www.youtube.com/watch?v=mAClw7r3ETc -o video_card.json

# Override title (skip auto fetched oEmbed title)
granule video-card https://www.youtube.com/watch?v=dQw4w9WgXcQ --title "My Custom Title" -o custom.json

# Raw transcript file → card
granule text-video-card path/to/transcript.txt --title "Workshop Transcript" -o workshop_card.json

# Inline raw text (small snippets) → card
granule text-video-card "[00:00] Intro to X\n[00:10] Key idea" --title "Snippet" -o snippet_card.json

If you provide an OPENAI_API_KEY, the card is generated via the OpenAI Responses API with strict schema parsing; otherwise a minimal fallback card (header + TL;DR snippet) is produced.

Schema highlights:

  • Header (title, subtitle, badges)
  • Video meta (url, id)
  • Sections (TL;DR, Claims & Evidence, Glossary, Rhetoric, Misconceptions, Questions, Segments, Metrics, etc. – only those with content appear)
  • Footer (persuasion modes, devices, timeline events)
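The rule that only sections with content appear can be illustrated with plain dataclasses. The class and field names below are hypothetical and much simpler than the real VideoInsightCardPayload schema, which is defined with pydantic and is not reproduced here.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Section:
    """One named section of a card, e.g. 'TL;DR' or 'Glossary' (hypothetical shape)."""
    name: str
    items: list = field(default_factory=list)


@dataclass
class Card:
    """A much-reduced stand-in for a video insight card."""
    title: str
    url: str
    sections: list = field(default_factory=list)

    def to_payload(self) -> dict:
        """Serialize the card, dropping any section that has no items."""
        return {
            "header": {"title": self.title},
            "video": {"url": self.url},
            "sections": [asdict(s) for s in self.sections if s.items],
        }


card = Card(
    title="Example",
    url="https://www.youtube.com/watch?v=mAClw7r3ETc",
    sections=[Section("TL;DR", ["Key point"]), Section("Glossary")],
)
payload = card.to_payload()  # the empty Glossary section is omitted
```

Filtering at serialization time keeps the in-memory model complete while letting downstream UI code render only what has content.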

Use cases:

  • Fast analysis / structuring of transcripts for research
  • Feeding downstream UI components or analytics pipelines
  • Offline transcript auditing (supply a .txt file without hitting YouTube)

Tip: Use simple-card for a lighter heuristic card, or video-card for the full structured extraction.

Extras Overview

Extra          Purpose
videocard      Minimal OpenAI-powered video insight card generation
cli            Typer/Rich command line UX
article        Blog/article/markdown parsing & readability
analysis       Optional numeric/text metrics (numpy, scikit-learn)
vector         Chroma vector store integration
web            FastAPI + Uvicorn API server
ui             Streamlit prototype UI
advanced-llm   pydantic-ai agent experimentation
full           Aggregate of all feature extras

Releasing / Publishing

The helper script publish.sh automates the version bump, build, and upload.

Examples:

# Bump patch version, build, upload to PyPI
./publish.sh patch

# Bump minor and upload to TestPyPI
./publish.sh minor --test

# Build only (no version change, no upload)
./publish.sh same --no-upload

# Dry run (show actions without changing files)
./publish.sh patch --dry-run

Set the PYPI_TOKEN environment variable for non-interactive uploads. Create the token in your PyPI account settings.

