notegen
Convert YouTube videos, playlists, and web pages into rich Obsidian-flavored markdown notes using LLMs. Supports English, Hindi, and Malayalam YouTube videos — all notes generated in English.
Install
pip install notegen
Quick start
# 1. Run the setup wizard (choose provider, add API key, verify connection)
notegen setup
# 2. Generate notes
notegen https://youtube.com/watch?v=...
Usage
# Auto-detect source type (bare URL or file)
notegen https://youtube.com/watch?v=...
notegen https://youtube.com/playlist?list=...
notegen https://example.com/article
notegen transcript.txt
# Explicit commands
notegen video <youtube-url>
notegen playlist <playlist-url> [--force] [--force-restart]
notegen web <url>
notegen text <file-or-stdin>
notegen text - # stdin
# First-time setup
notegen setup # guided wizard
# Watch a folder — auto-process new .txt/.md files
notegen watch ./inbox
# Dry run — estimate tokens/cost without calling LLM
notegen -n https://youtube.com/watch?v=...
notegen video <url> --dry-run
# Output format
notegen video <url> --format logseq
notegen web <url> --format plain
# Extra prompt guidance
notegen video <url> --prompt "Focus on practical setup steps and specific tool names"
notegen web <url> -p "Include comparisons to alternatives for every tool mentioned"
# Config
notegen config init # create config file
notegen config open # open config in your default editor
notegen config show # print resolved config
notegen config validate # check structure + API key presence
notegen doctor # config check + real test API call
# Cache
notegen cache clear # remove ~/.cache/notegen/
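The auto-detection of a bare argument (video, playlist, web page, or file) shown at the top of the usage listing could be approximated as below. This is only a sketch: the function name `detect_source_type`, the host list, and the exact rules are assumptions for illustration, not notegen's actual logic.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical host list; the real tool may recognise more or fewer hosts.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def detect_source_type(source: str) -> str:
    """Return 'playlist', 'video', 'web', or 'text' for a bare argument."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        if host in YOUTUBE_HOSTS:
            query = parse_qs(parsed.query)
            # Assumption: a /playlist path with a list= parameter is a playlist.
            if parsed.path == "/playlist" and "list" in query:
                return "playlist"
            return "video"
        return "web"
    # Anything that is not an http(s) URL is treated as a local file or "-".
    return "text"


for arg in ("https://youtube.com/playlist?list=PL123",
            "https://youtube.com/watch?v=abc",
            "https://example.com/article",
            "transcript.txt"):
    print(arg, "->", detect_source_type(arg))
```

The explicit subcommands (`video`, `playlist`, `web`, `text`) remain the unambiguous choice when a URL does not fit a simple rule like this.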
Options
| Flag | Description |
|---|---|
| `--version` | Print version and exit |
| `-o` / `--output-dir PATH` | Override output directory |
| `-m` / `--model TEXT` | LiteLLM model string (e.g. `groq/llama-3.3-70b-versatile`) |
| `-v` / `--verbose` | Show chunk count, token usage, model/key selection, crawl status |
| `--no-mermaid` | Disable mermaid diagram generation |
| `--no-cache` | Skip cache read/write for this run |
| `-n` / `--dry-run` | Print token/cost estimate; skip LLM call |
| `--format TEXT` | Output format: `obsidian` (default) · `logseq` · `plain` · `roam` |
| `-p` / `--prompt TEXT` | Extra instructions appended to the LLM prompt |
| `--force` | Skip playlist videos without captions instead of aborting |
| `--force-restart` | Ignore playlist resume file, reprocess all videos |
Config file
Location
| OS | Path |
|---|---|
| Linux | ~/.config/notes-gen/config.yaml |
| macOS | ~/.config/notes-gen/config.yaml |
| Windows | %USERPROFILE%\.config\notes-gen\config.yaml |
Run notegen setup (recommended) for guided configuration. Alternatively, run notegen config init to generate a fully commented template, then notegen config open to edit it.
Full reference (~/.config/notes-gen/config.yaml)
# Active model — format: <provider>/<model-name>
model: anthropic/claude-sonnet-4-6
# Output
output_dir: ~/notes
mermaid: true
# Output format: obsidian (default) | logseq | plain | roam
output_format: obsidian
# Caching — transcripts + LLM output cached in ~/.cache/notegen/
# Set to false to always re-fetch and re-generate
cache: true
# Token budget — compress output if it exceeds this many tokens (0 = no limit)
max_output_tokens: 0
# Fuzzy dedup — skip near-duplicate sections in merged notes (Jaccard threshold)
merger_similarity_threshold: 0.7
# API key rotation — add multiple keys per provider.
# notegen picks one at random each request (useful for free-tier rate limits).
api_keys:
  anthropic:
    - sk-ant-api03-KEY1
    - sk-ant-api03-KEY2 # second key rotated in automatically
  groq:
    - gsk_KEY1
  openai:
    - sk-proj-KEY1
  gemini:
    - AIzaSyKEY1
  nvidia_nim:
    - nvapi-KEY1
  mistral:
    - KEY1
  cohere:
    - KEY1
  together_ai:
    - KEY1
  deepseek:
    - sk-KEY1
  perplexity:
    - pplx-KEY1
  xai:
    - xai-KEY1
# Web crawl limits
max_concurrent: 5
web_max_pages: 50
web_max_depth: 3
# Rate limiting & retry (important for free-tier providers like Groq, Gemini)
max_retries: 5
retry_base_delay: 60.0 # seconds; backoff = base * 2^attempt
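To make the `merger_similarity_threshold` setting more concrete, here is a minimal sketch of Jaccard-based fuzzy dedup. It assumes sections are compared as sets of lowercase word tokens and that a section is skipped when its similarity to an already kept section reaches the threshold; notegen's real tokenisation and comparison rules may differ, and the names `jaccard` and `dedupe_sections` are hypothetical.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two texts: |A ∩ B| / |A ∪ B| over word sets."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def dedupe_sections(sections: list[str], threshold: float = 0.7) -> list[str]:
    """Keep a section only if it is not a near-duplicate of one already kept."""
    kept: list[str] = []
    for section in sections:
        if all(jaccard(section, previous) < threshold for previous in kept):
            kept.append(section)
    return kept


sections = [
    "Install the tool with pip",
    "Install the tool with pip today",  # 5 shared words of 6 total: 0.83
    "Configure your API key",
]
print(dedupe_sections(sections))
```

Under these assumptions, a lower threshold removes more sections and a higher one keeps sections that differ only slightly.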
Supported providers
| Provider | Model string example |
|---|---|
| Anthropic | anthropic/claude-sonnet-4-6 |
| OpenAI | openai/gpt-4o |
| Groq | groq/llama-3.3-70b-versatile |
| Google Gemini | gemini/gemini-2.0-flash |
| NVIDIA NIM (free tier) | nvidia_nim/meta/llama-3.1-70b-instruct |
| Mistral | mistral/mistral-large-latest |
| Cohere | cohere/command-r-plus |
| Together AI | together_ai/meta-llama/Llama-3-70b-chat-hf |
| DeepSeek | deepseek/deepseek-chat |
| Perplexity | perplexity/sonar-pro |
| xAI (Grok) | xai/grok-2 |
| Ollama (local) | ollama/llama3 |
Any provider supported by LiteLLM works.
Env var API keys
As an alternative to the config file, set NOTEGEN_<PROVIDER>_KEY env vars. These are used as a fallback when no keys are configured for a provider:
export NOTEGEN_GROQ_KEY=gsk_...
export NOTEGEN_ANTHROPIC_KEY=sk-ant-...
export NOTEGEN_GEMINI_KEY=AIzaSy...
Config keys take priority over env vars. Env vars are useful for CI or server use.
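The precedence described above (config keys first, env var as fallback, one key picked at random per request) can be sketched as follows. The helper names `resolve_api_keys` and `pick_key` are hypothetical, and the assumption that the provider name is upper-cased to form the env var name is inferred from the examples, not confirmed.

```python
import os
import random
from collections.abc import Mapping


def resolve_api_keys(provider: str, config: dict,
                     env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the keys for a provider: config file first, env var as fallback."""
    configured = (config.get("api_keys") or {}).get(provider) or []
    if configured:
        return list(configured)
    # Assumed naming: NOTEGEN_<PROVIDER>_KEY with the provider upper-cased.
    env_key = env.get(f"NOTEGEN_{provider.upper()}_KEY")
    return [env_key] if env_key else []


def pick_key(keys: list[str]) -> str:
    """Pick one key at random, as the config comments describe."""
    return random.choice(keys)


config = {"api_keys": {"groq": ["gsk_A", "gsk_B"]}}
env = {"NOTEGEN_GROQ_KEY": "gsk_ENV", "NOTEGEN_GEMINI_KEY": "AIzaSy_ENV"}
print(resolve_api_keys("groq", config, env))    # config keys win
print(resolve_api_keys("gemini", config, env))  # falls back to the env var
```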
Caching
Transcripts and LLM-generated notes are cached in ~/.cache/notegen/ (keyed on URL + model). Re-running the same source skips fetch and LLM calls entirely.
notegen video <url> # first run: fetches + generates + caches
notegen video <url> # second run: serves from cache instantly
notegen video <url> --no-cache # bypass cache for this run
notegen cache clear # wipe all cached files
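A cache keyed on URL + model can be pictured with the sketch below. The hashing scheme, the `.md` suffix, and the function name `cache_path` are assumptions made for illustration; only the cache directory comes from the text above.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "notegen"


def cache_path(url: str, model: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Derive a stable file path from the (URL, model) pair."""
    # Assumption: the key is a SHA-256 digest of the URL and model string.
    digest = hashlib.sha256(f"{url}\n{model}".encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.md"


url = "https://example.com/article"
print(cache_path(url, "groq/llama-3.3-70b-versatile").name)
print(cache_path(url, "openai/gpt-4o").name)
```

With a key like this, switching models produces a different cache entry, so notes generated by one model are not served for another.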
Dry run
Estimate tokens and cost before committing to a run:
notegen -n https://youtube.com/playlist?list=...
Prints a Rich table with chunk count, token count, estimated cost, and estimated generation time. No LLM calls are made and no files are written.
Output formats
Use --format to target different note-taking apps:
| Format | Syntax style |
|---|---|
| `obsidian` (default) | `[[wikilinks]]`, `> [!TIP]` callouts, mermaid diagrams |
| `logseq` | Bullet-based, `#+BEGIN_TIP` blocks |
| `plain` | Clean markdown, no app-specific syntax |
| `roam` | `#[[hashtag refs]]` |
Watch mode
Drop files into a folder and notegen auto-processes them:
notegen watch ./inbox --output-dir ./notes
- Processes existing unprocessed `.txt`/`.md` files on startup
- Watches for new files; processes each as it appears
- Tracks processed files in `.watch-state.json` (won't reprocess on restart)
- Ctrl+C exits cleanly
Playlist resume
Long playlists are resumable. Progress is saved to .progress.json in the output folder after each video. If a run is interrupted, re-running the same command skips already-completed videos.
notegen playlist <url> # resumes from where it left off
notegen playlist <url> --force-restart # ignore progress, reprocess all
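The resume behaviour can be sketched as a small progress file that is rewritten after each video. The JSON layout (a list of completed video IDs) and the helper names are assumptions; the actual contents of `.progress.json` are not documented here.

```python
import json
from pathlib import Path

PROGRESS_FILE = ".progress.json"


def load_done(output_dir: Path) -> set[str]:
    """Read the set of completed video IDs, or an empty set if none saved."""
    progress = output_dir / PROGRESS_FILE
    if not progress.exists():
        return set()
    return set(json.loads(progress.read_text(encoding="utf-8")))


def mark_done(output_dir: Path, video_id: str) -> None:
    """Record one completed video; called after each video finishes."""
    done = load_done(output_dir)
    done.add(video_id)
    (output_dir / PROGRESS_FILE).write_text(
        json.dumps(sorted(done)), encoding="utf-8"
    )


def pending(video_ids: list[str], output_dir: Path,
            force_restart: bool = False) -> list[str]:
    """Videos still to process; force_restart ignores saved progress."""
    done = set() if force_restart else load_done(output_dir)
    return [vid for vid in video_ids if vid not in done]
```

Saving after every video, rather than once at the end, is what lets an interrupted run lose at most the video that was in flight.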
Rate limiting
Free-tier providers (Groq, Gemini, Together AI, etc.) enforce strict TPM/RPM limits. notegen handles 429 errors automatically:
- Cools down the offending key and rotates to another available key immediately.
- If all keys for the provider are exhausted, waits using the `Retry-After` header value (if present) or exponential backoff (`retry_base_delay * 2^attempt`), then retries.
With the defaults (max_retries: 5, retry_base_delay: 60), the wait sequence is 60s → 120s → 240s → 480s → 960s. Adding multiple API keys from different free accounts is the most effective way to stay under limits.
Output format
Obsidian-flavored markdown (default):
- YAML frontmatter (`title`, `source`, `type`, `tags`, `date`)
- `#` → `##` → `###` heading hierarchy following source narrative
- `___` horizontal dividers between major topic shifts
- `> blockquote` for quotes, definitions, memorable statements
- `$formula$` inline LaTeX · `$$formula$$` block equations
- Callouts, used where content genuinely warrants them: `> [!TIP]` `> [!NOTE]` `> [!INFO]` `> [!WARNING]` `> [!CAUTION]` `> [!DANGER]` `> [!EXAMPLE]` `> [!QUESTION]` `> [!QUOTE]` `> [!CITE]`
- Mermaid diagrams for system topology, data flows, architectures
- Markdown tables for comparisons and feature matrices
- `[[wikilinks]]` for cross-references to related concepts
- `**bold**` key terms · `_italic_` emphasis · `` `inline code` `` for commands/flags/values
- Fenced code blocks with language tags
- Tags auto-inferred by LLM from content
- Playlist → folder + `index.md` with wikilinks to each video note
Language support
YouTube transcripts are fetched and processed in this order:
- English (`en`, `en-US`, `en-GB`): used directly
- Hindi (`hi`): auto-translated to English via YouTube's translation API
- Malayalam (`ml`): auto-translated to English via YouTube's translation API
Translation uses YouTube's own built-in API (no extra key or dependency needed). Notes are always written in English regardless of source language. Videos with no captions in any supported language fail with an error (or are skipped in playlists with --force).
Requirements
- Python ≥ 3.11
- API key for at least one supported LLM provider
Development
uv sync --dev # install all deps including dev
uv run pytest # run tests
uv run ruff check . # lint
uv run ruff format . # format
CI
GitHub Actions runs on every push to main or sms/** branches and on pull requests to main.
Pipeline (.github/workflows/ci.yml):
| Step | Command |
|---|---|
| Format check | ruff format --check . |
| Lint | ruff check . |
| Test | pytest tests/ -q --tb=short |
Matrix: Python 3.11 and 3.12 on ubuntu-latest.