Granularity-on-demand learning object extractor and composer
Granule (v0.1.1)
Granule ingests blogs/HN/Reddit/YouTube/podcasts/news, atomizes them into SemanticAtoms with citations, and composes MicroLearning/FocusedSessions/DeepMastery units by target duration.
Install
pip install -e .
CLI
granule ingest "Some article text" --kind blog -o doc.json
granule dissect doc.json --max-tokens 60 --only definition,example -o atoms.json
granule expand atoms.json --target 55s -o micro.json
granule stream atoms.json --pace 55s --until 3m -o session.jsonl
granule simple-youtube https://www.youtube.com/watch?v=dQw4w9WgXcQ -o simple.json
granule simple-card https://www.youtube.com/watch?v=dQw4w9WgXcQ -o card.json
granule video-card https://www.youtube.com/watch?v=dQw4w9WgXcQ -o video_card.json
granule video-card https://www.youtube.com/watch?v=dQw4w9WgXcQ --title "Custom Title" -o titled_card.json
granule text-video-card transcript.txt -o text_card.json
granule text-video-card "[00:00] Intro\n[00:05] Point A" --title "Inline Snippet" -o inline_card.json
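The --target, --pace, and --until options above take short duration strings such as 55s and 3m. As a rough illustration of how such values map to seconds, here is a minimal sketch. The helper name parse_duration and the accepted suffixes (s, m, h) are assumptions made for this example, not Granule's actual parser.

```python
import re

# Hypothetical helper: the accepted suffixes (s, m, h) are an assumption
# based on the "55s" and "3m" values in the CLI examples.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> int:
    """Convert a string such as '55s' or '3m' into whole seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


print(parse_duration("55s"))  # 55
print(parse_duration("3m"))   # 180
```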
FastAPI
uvicorn granule.fastapi_app:app --reload --port 8000
Endpoints:
- POST /ingest {source, kind}
- POST /dissect {doc, max_tokens, kinds}
- POST /expand {graph, target_seconds}
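The endpoint list above can be exercised from Python with the standard library alone. The sketch below only builds the request for POST /ingest; the base URL assumes the uvicorn command shown earlier, the payload keys come from the endpoint list, and the response shape is not documented here, so the sending step is left commented out.

```python
import json
import urllib.request

# Assumes the server was started with the uvicorn command above (port 8000).
BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the listed endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Payload keys (source, kind) are taken from the endpoint list.
ingest_req = build_request("/ingest", {"source": "Some article text", "kind": "blog"})

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(ingest_req) as resp:
#     doc = json.load(resp)
```

The returned document would then be passed as doc to POST /dissect, following the same pattern.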
YouTube transcript
from granule.ingest.youtube import ingest_youtube
doc = ingest_youtube("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
print(doc.text[:200])
Environment Variables
Granule reads optional environment variables for LLM integration.
- Copy .env.example to .env (or export variables another way).
- Add your OpenAI key if you want LLM-powered features (planned / experimental):
cp .env.example .env
echo "OPENAI_API_KEY=sk-..." >> .env
Variables:
- OPENAI_API_KEY – enables future atom enrichment / generation via OpenAI.
- GRANULE_LLM_MODEL – preferred model (e.g. gpt5).
- GRANULE_LLM_PROVIDER – provider alias (openai, azure-openai, etc.).
- GRANULE_SUPPRESS_PROBLEM_QUESTIONS – if set to 1/true/on, skips adding ontology-derived and synthesized integrative Questions Raised (useful when you only want claims & problems without extra exploratory questions).
If python-dotenv is installed, .env will be auto-loaded; otherwise export vars normally.
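To make the loading behavior concrete, here is a minimal sketch of how a script might read these variables itself. It is not Granule's internal code: the helper env_flag is hypothetical, and treating the flag values case-insensitively is an assumption.

```python
import os

try:
    # python-dotenv is optional; load_dotenv() reads a nearby .env if one exists.
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    # Without python-dotenv, rely on variables exported in the shell.
    pass

# Values the text above lists as enabling the flag.
_TRUTHY = {"1", "true", "on"}


def env_flag(name: str) -> bool:
    """Return True when the variable is set to 1/true/on (case-insensitive here)."""
    return os.environ.get(name, "").strip().lower() in _TRUTHY


suppress_questions = env_flag("GRANULE_SUPPRESS_PROBLEM_QUESTIONS")
llm_enabled = bool(os.environ.get("OPENAI_API_KEY"))
print(suppress_questions, llm_enabled)
```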
Streamlit UI (YouTube → Atoms → Unit)
Experimental helper UI.
Install extras:
pip install -e ".[ui,llm]"
Run:
streamlit run streamlit_app.py
Paste a YouTube URL, adjust parameters, run the pipeline, and optionally enrich the first atoms (requires OPENAI_API_KEY).
New: Segments, Summaries & Insight Card
The pipeline now also produces:
- segments – improved sentence-cluster segments with token counts.
- segment_summaries – per-segment micro summaries (LLM-backed if key present, heuristic fallback otherwise).
- insight_card – a heuristic high-level Transcript Insight Card (claims, glossary, metrics stubs) serialized for UI consumption.
These appear in the composite JSON returned by process_youtube and in the Streamlit UI (preview table + card header/sections).
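A consumer of that composite JSON might check which of the new outputs are present before rendering them. The sketch below works on a plain dict; the three key names come from the list above, while the helper describe_result, the assumption that segments and segment_summaries are lists, and the sample values are all hypothetical.

```python
def describe_result(result: dict) -> dict:
    """Summarize the optional outputs found in a composite result (hypothetical helper)."""
    summary = {}
    # Assumption: both of these keys hold lists when present.
    for key in ("segments", "segment_summaries"):
        summary[key] = len(result.get(key) or [])
    summary["has_insight_card"] = result.get("insight_card") is not None
    return summary


# Placeholder data for illustration only; real entries will look different.
sample = {
    "segments": ["first segment", "second segment"],
    "segment_summaries": ["summary one", "summary two"],
    "insight_card": {},
}
print(describe_result(sample))
```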
Video Insight Card (Structured JSON)
Granule can produce a richer, schema-driven VideoInsightCardPayload either from a YouTube URL (auto transcript + optional title fetch) or any raw transcript text file.
Quickstart:
# YouTube → structured video insight card (minimal fallback if no OPENAI_API_KEY set)
granule video-card https://www.youtube.com/watch?v=dQw4w9WgXcQ -o video_card.json
# Override title (skip auto fetched oEmbed title)
granule video-card https://www.youtube.com/watch?v=dQw4w9WgXcQ --title "My Custom Title" -o custom.json
# Raw transcript file → card
granule text-video-card path/to/transcript.txt --title "Workshop Transcript" -o workshop_card.json
# Inline raw text (small snippets) → card
granule text-video-card "[00:00] Intro to X\n[00:10] Key idea" --title "Snippet" -o snippet_card.json
If you provide an OPENAI_API_KEY, the card is generated via the OpenAI Responses API with strict schema parsing; otherwise a minimal fallback card (header + TL;DR snippet) is produced.
Schema highlights:
- Header (title, subtitle, badges)
- Video meta (url, id)
- Sections (TL;DR, Claims & Evidence, Glossary, Rhetoric, Misconceptions, Questions, Segments, Metrics, etc. – only those with content appear)
- Footer (persuasion modes, devices, timeline events)
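The rule that only sections with content appear can be illustrated with a small filter. This is a sketch under stated assumptions: the real VideoInsightCardPayload layout is not shown in this README, so the dict-of-lists representation and the helper visible_sections are hypothetical.

```python
def visible_sections(sections: dict) -> dict:
    """Keep only the sections that actually carry content."""
    return {name: body for name, body in sections.items() if body}


# Hypothetical section data; names follow the schema highlights above.
sections = {
    "TL;DR": ["One-sentence summary of the transcript."],
    "Claims & Evidence": [],
    "Glossary": [],
    "Questions": ["What follows from the key idea?"],
}
print(list(visible_sections(sections)))  # ['TL;DR', 'Questions']
```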
Use cases:
- Fast analysis / structuring of transcripts for research
- Feeding downstream UI components or analytics pipelines
- Offline transcript auditing (supply a .txt file without hitting YouTube)
Tip: Use simple-card for a lighter heuristic card, or video-card for the full structured extraction.
Releasing / Publishing
Helper script publish.sh automates version bump, build, and upload.
Examples:
# Bump patch version, build, upload to PyPI
./publish.sh patch
# Bump minor and upload to TestPyPI
./publish.sh minor --test
# Build only (no version change, no upload)
./publish.sh same --no-upload
# Dry run (show actions without changing files)
./publish.sh patch --dry-run
Set the PYPI_TOKEN environment variable for non-interactive uploads (create the token in your PyPI account settings).
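For readers unsure what the patch, minor, and same arguments mean, the version arithmetic can be sketched as follows. This is an illustration of conventional MAJOR.MINOR.PATCH bumping, not the contents of publish.sh; the function bump_version is hypothetical.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version for a bump keyword (illustrative only)."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if part == "minor":
        # A minor bump resets the patch number to zero.
        return f"{major}.{minor + 1}.0"
    if part == "same":
        return version
    raise ValueError(f"unknown bump keyword: {part!r}")


print(bump_version("0.1.5", "patch"))  # 0.1.6
print(bump_version("0.1.5", "minor"))  # 0.2.0
print(bump_version("0.1.5", "same"))   # 0.1.5
```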
File details
Details for the file granule-0.1.6.tar.gz.
File metadata
- Download URL: granule-0.1.6.tar.gz
- Upload date:
- Size: 46.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c959640d16f4c9c6ac1d6e2578e3f9d7a209b471ff3349c111a2db792de0f98 |
| MD5 | c4b8bef6682e22d978c592396e3c37c1 |
| BLAKE2b-256 | 2dbda0b1059d8e18d26cd189ddba1357337bdcabe8c0fece6f4bdfed3595fe9d |
File details
Details for the file granule-0.1.6-py3-none-any.whl.
File metadata
- Download URL: granule-0.1.6-py3-none-any.whl
- Upload date:
- Size: 49.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cbca0734757046413f3ed2a1700c68a03b6be7bd00f6f958659bec0d580353b1 |
| MD5 | 5defac406046159a897f634c47f7402f |
| BLAKE2b-256 | c57c927cf55b12b89c0968f477b3173ccc855d5d57da7994948cd0f049c5da57 |