Convert YouTube videos, playlists, and web pages into Obsidian markdown notes using LLMs


notegen

Convert YouTube videos, playlists, and web pages into rich Obsidian-flavored markdown notes using LLMs. Supports English, Hindi, and Malayalam YouTube videos; notes are always generated in English.

Install

pip install notegen

Quick start

# 1. Run the setup wizard (choose provider, add API key, verify connection)
notegen setup

# 2. Generate notes
notegen https://youtube.com/watch?v=...

Usage

# Auto-detect source type (bare URL or file)
notegen https://youtube.com/watch?v=...
notegen https://youtube.com/playlist?list=...
notegen https://example.com/article
notegen transcript.txt

# Explicit commands
notegen video <youtube-url>
notegen playlist <playlist-url> [--force] [--force-restart]
notegen web <url>
notegen text <file-or-stdin>
notegen text -                          # stdin

# First-time setup
notegen setup                           # guided wizard

# Watch a folder — auto-process new .txt/.md files
notegen watch ./inbox

# Dry run — estimate tokens/cost without calling LLM
notegen -n https://youtube.com/watch?v=...
notegen video <url> --dry-run

# Output format
notegen video <url> --format logseq
notegen web <url> --format plain

# Extra prompt guidance
notegen video <url> --prompt "Focus on practical setup steps and specific tool names"
notegen web <url> -p "Include comparisons to alternatives for every tool mentioned"

# Config
notegen config init      # create config file
notegen config open      # open config in your default editor
notegen config show      # print resolved config
notegen config validate  # check structure + API key presence
notegen doctor           # config check + real test API call

# Cache
notegen cache clear      # remove ~/.cache/notegen/
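A bare argument has to be routed to `video`, `playlist`, `web`, or `text`. A small classifier can illustrate that routing. This is a hypothetical sketch, not notegen's source. The function name `detect_source` and its rules are assumptions: known YouTube hostnames, a `/playlist` path with a `list` parameter for playlists, and anything that is not an http(s) URL treated as a local file.

```python
from urllib.parse import parse_qs, urlparse

# Hostnames treated as YouTube (an assumption for this sketch).
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "youtu.be"}


def detect_source(arg: str) -> str:
    """Classify a bare CLI argument as 'video', 'playlist', 'web', or 'text' (hypothetical)."""
    parsed = urlparse(arg)
    if parsed.scheme not in ("http", "https"):
        # Not a URL, so treat it as a local transcript/text file.
        return "text"
    host = (parsed.hostname or "").lower().removeprefix("www.")
    if host in YOUTUBE_HOSTS:
        query = parse_qs(parsed.query)
        if parsed.path == "/playlist" and "list" in query:
            return "playlist"
        return "video"
    # Any other http(s) URL is crawled as a web page.
    return "web"
```

For example, `detect_source("https://youtube.com/playlist?list=PL1")` yields `"playlist"` and `detect_source("transcript.txt")` yields `"text"`.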

Options

Flag	Description
`-o / --output-dir PATH`	Override output directory
`-m / --model TEXT`	LiteLLM model string (e.g. `groq/llama-3.3-70b-versatile`)
`-v / --verbose`	Show chunk count, token usage, model/key selection, crawl status
`--no-mermaid`	Disable mermaid diagram generation
`--no-cache`	Skip cache read/write for this run
`-n / --dry-run`	Print token/cost estimate; skip LLM call
`--format TEXT`	Output format: `obsidian` (default) · `logseq` · `plain` · `roam`
`-p / --prompt TEXT`	Extra instructions appended to the LLM prompt
`--force`	Skip playlist videos without captions instead of aborting
`--force-restart`	Ignore playlist resume file, reprocess all videos

Config file

Location

OS	Path
Linux	`~/.config/notes-gen/config.yaml`
macOS	`~/.config/notes-gen/config.yaml`
Windows	`%USERPROFILE%\.config\notes-gen\config.yaml`

Run notegen setup (recommended) for guided configuration, or run notegen config init to generate a fully commented template and then notegen config open to edit it.

Full reference (`~/.config/notes-gen/config.yaml`)

# Active model — format: <provider>/<model-name>
model: anthropic/claude-sonnet-4-6

# Output
output_dir: ~/notes
mermaid: true

# Output format: obsidian (default) | logseq | plain | roam
output_format: obsidian

# Caching — transcripts + LLM output cached in ~/.cache/notegen/
# Set to false to always re-fetch and re-generate
cache: true

# Token budget — compress output if it exceeds this many tokens (0 = no limit)
max_output_tokens: 0

# Fuzzy dedup — skip near-duplicate sections in merged notes (Jaccard threshold)
merger_similarity_threshold: 0.7

# API key rotation — add multiple keys per provider.
# notegen picks one at random for each request (useful for free-tier rate limits).
api_keys:
  anthropic:
    - sk-ant-api03-KEY1
    - sk-ant-api03-KEY2   # second key rotated in automatically
  groq:
    - gsk_KEY1
  openai:
    - sk-proj-KEY1
  gemini:
    - AIzaSyKEY1
  nvidia_nim:
    - nvapi-KEY1
  mistral:
    - KEY1
  cohere:
    - KEY1
  together_ai:
    - KEY1
  deepseek:
    - sk-KEY1
  perplexity:
    - pplx-KEY1
  xai:
    - xai-KEY1

# Web crawl limits
max_concurrent: 5
web_max_pages: 50
web_max_depth: 3

# Rate limiting & retry (important for free-tier providers like Groq, Gemini)
max_retries: 5
retry_base_delay: 60.0   # seconds; backoff = base * 2^attempt
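The `merger_similarity_threshold` comment says near-duplicate sections are detected with a Jaccard threshold. The sketch below shows that idea under stated assumptions. Sections are compared as sets of lowercase word tokens. A section is dropped when its similarity to an already-kept section reaches the threshold. The tokenizer and the helper names (`tokens`, `jaccard`, `dedup_sections`) are hypothetical; only the Jaccard measure and the 0.7 value come from the config above.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word tokens (the tokenizer is an assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: size of the token intersection divided by size of the union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty sections count as identical
    return len(ta & tb) / len(ta | tb)


def dedup_sections(sections: list[str], threshold: float = 0.7) -> list[str]:
    """Keep sections in order, dropping any whose similarity to a kept one reaches `threshold`."""
    kept: list[str] = []
    for section in sections:
        if all(jaccard(section, prev) < threshold for prev in kept):
            kept.append(section)
    return kept
```

Raising the threshold keeps more sections, because only closer matches count as duplicates. Lowering it removes more aggressively.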

Supported providers

Provider	Model string example
Anthropic	`anthropic/claude-sonnet-4-6`
OpenAI	`openai/gpt-4o`
Groq	`groq/llama-3.3-70b-versatile`
Google Gemini	`gemini/gemini-2.0-flash`
NVIDIA NIM (free tier)	`nvidia_nim/meta/llama-3.1-70b-instruct`
Mistral	`mistral/mistral-large-latest`
Cohere	`cohere/command-r-plus`
Together AI	`together_ai/meta-llama/Llama-3-70b-chat-hf`
DeepSeek	`deepseek/deepseek-chat`
Perplexity	`perplexity/sonar-pro`
xAI (Grok)	`xai/grok-2`
Ollama (local)	`ollama/llama3`

Any provider supported by LiteLLM works.

Env var API keys

As an alternative to the config file, set NOTEGEN_<PROVIDER>_KEY env vars. These are used as a fallback when no keys are configured for that provider:

export NOTEGEN_GROQ_KEY=gsk_...
export NOTEGEN_ANTHROPIC_KEY=sk-ant-...
export NOTEGEN_GEMINI_KEY=AIzaSy...

Config keys take priority over env vars. Env vars are useful for CI or server use.
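The precedence rule can be sketched as a lookup that checks the config first and the environment second. This is a minimal illustration. `resolve_keys` is a hypothetical helper. Deriving the variable name by upper-casing the provider id (so `together_ai` would become `NOTEGEN_TOGETHER_AI_KEY`) is an assumption extrapolated from the three examples above.

```python
import os
from collections.abc import Mapping


def resolve_keys(
    provider: str,
    config_keys: Mapping[str, list[str]],
    environ: Mapping[str, str] = os.environ,
) -> list[str]:
    """API keys for `provider`: config file keys win; NOTEGEN_<PROVIDER>_KEY is the fallback."""
    configured = config_keys.get(provider) or []
    if configured:
        return list(configured)
    # Env var name built by upper-casing the provider id, e.g. groq -> NOTEGEN_GROQ_KEY (assumed).
    value = environ.get(f"NOTEGEN_{provider.upper()}_KEY")
    return [value] if value else []
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.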

Caching

Transcripts and LLM-generated notes are cached in ~/.cache/notegen/ (keyed on URL + model). Re-running the same source with the same model skips both the fetch and the LLM calls.

notegen video <url>          # first run: fetches + generates + caches
notegen video <url>          # second run: serves from cache instantly
notegen video <url> --no-cache   # bypass cache for this run
notegen cache clear          # wipe all cached files
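Keying on URL + model means the same video processed with two different models produces two separate cache entries. A minimal sketch of such a key follows. The SHA-256 hashing, the `.json` file layout, and the `cache_path` helper are assumptions for illustration. Only the `~/.cache/notegen/` location and the URL + model key come from this page.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "notegen"


def cache_path(url: str, model: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Cache file for a (URL, model) pair; the hashing and file layout are assumptions."""
    # Hashing both parts means the same URL run with another model gets its own entry.
    digest = hashlib.sha256(f"{model}\n{url}".encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.json"
```

Hashing produces a fixed-length, filesystem-safe name no matter how long or unusual the URL is.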

Dry run

Estimate tokens and cost before committing to a run:

notegen -n https://youtube.com/playlist?list=...

Prints a Rich table with chunk count, token count, estimated cost, and estimated generation time. No LLM calls are made, no files are written.
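The arithmetic behind such an estimate can be sketched as follows. This is not notegen's estimator. The 4-characters-per-token rule is a rough heuristic for English text, and the chunk size and per-million-token price are illustrative parameters. Only input tokens are priced, and the generation-time estimate is omitted.

```python
import math


def estimate_run(text: str, chunk_tokens: int, usd_per_million_tokens: float) -> dict[str, float]:
    """Rough pre-flight estimate of input tokens, chunk count, and input cost (hypothetical)."""
    # ~4 characters per token is a common rule of thumb for English text;
    # a real estimate would use the model's own tokenizer.
    tokens = math.ceil(len(text) / 4)
    chunks = max(1, math.ceil(tokens / chunk_tokens))
    cost_usd = tokens / 1_000_000 * usd_per_million_tokens
    return {"tokens": tokens, "chunks": chunks, "cost_usd": cost_usd}
```

For example, a 40,000-character transcript works out to about 10,000 tokens. With 4,000-token chunks that is 3 chunks, and at $3 per million input tokens it costs about $0.03.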

Output formats

Use --format to target different note-taking apps:

Format	Syntax style
`obsidian` (default)	`[[wikilinks]]`, `> [!TIP]` callouts, mermaid diagrams
`logseq`	Bullet-based, `#+BEGIN_TIP` blocks
`plain`	Clean markdown, no app-specific syntax
`roam`	`#[[hashtag refs]]`
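As an illustration of how the same content maps between formats, the sketch below rewrites Obsidian `> [!TYPE]` callouts into Logseq `#+BEGIN_TYPE ... #+END_TYPE` blocks. It is a hypothetical post-processing step, and `callouts_to_logseq` is an invented name. notegen may instead ask the LLM to emit the target syntax directly. The sketch also does not produce Logseq's bullet structure.

```python
import re

# "> [!TIP] optional title" at the start of a line.
CALLOUT = re.compile(r"^> \[!(\w+)\]\s*(.*)$")


def callouts_to_logseq(markdown: str) -> str:
    """Rewrite Obsidian callouts as Logseq #+BEGIN_/#+END_ blocks; other lines pass through."""
    out: list[str] = []
    open_kind: str | None = None
    for line in markdown.splitlines():
        match = CALLOUT.match(line)
        if match:
            if open_kind:  # a new callout closes the previous one
                out.append(f"#+END_{open_kind}")
            open_kind = match.group(1).upper()
            out.append(f"#+BEGIN_{open_kind}")
            if match.group(2):
                out.append(match.group(2))
        elif open_kind and line.startswith(">"):
            # Continuation line of the current callout: strip the quote marker.
            out.append(line[1:].lstrip())
        else:
            if open_kind:
                out.append(f"#+END_{open_kind}")
                open_kind = None
            out.append(line)
    if open_kind:
        out.append(f"#+END_{open_kind}")
    return "\n".join(out)
```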

Watch mode

Drop files into a folder and notegen auto-processes them:

notegen watch ./inbox --output-dir ./notes

Processes existing unprocessed .txt/.md files on startup
Watches for new files; processes each as it appears
Tracks processed files in .watch-state.json (won't reprocess on restart)
Ctrl+C exits cleanly
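The four behaviours above can be modelled with a simple polling loop. This is a sketch under assumptions. The state file sits in the watched folder and stores a JSON list of file names. `process` is a stand-in for note generation. The loop polls rather than subscribing to filesystem events. None of the helper names come from notegen.

```python
import json
import time
from collections.abc import Callable
from pathlib import Path

STATE_FILE = ".watch-state.json"  # name from the docs; the JSON layout is assumed
WATCHED_SUFFIXES = {".txt", ".md"}


def load_state(folder: Path) -> set[str]:
    """File names processed in earlier runs (empty on first run)."""
    path = folder / STATE_FILE
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_state(folder: Path, processed: set[str]) -> None:
    """Persist processed names so a restart skips them."""
    (folder / STATE_FILE).write_text(json.dumps(sorted(processed)), encoding="utf-8")


def select_pending(names: list[str], processed: set[str]) -> list[str]:
    """Unprocessed .txt/.md names, in name order."""
    return sorted(n for n in names if Path(n).suffix in WATCHED_SUFFIXES and n not in processed)


def watch(folder: Path, process: Callable[[Path], None], interval: float = 2.0) -> None:
    """Poll `folder`; the first pass handles files that already exist, later passes new ones."""
    processed = load_state(folder)
    try:
        while True:
            names = [p.name for p in folder.iterdir() if p.is_file()]
            for name in select_pending(names, processed):
                process(folder / name)  # stand-in for note generation
                processed.add(name)
                save_state(folder, processed)
            time.sleep(interval)
    except KeyboardInterrupt:
        # Ctrl+C: state is saved after every file, so there is nothing left to flush.
        pass
```

Saving state after each file, rather than on exit, is what lets a crash or Ctrl+C resume without reprocessing.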

Playlist resume

Long playlists are resumable. Progress is saved to .progress.json in the output folder after each video. If a run is interrupted, re-running the same command skips already-completed videos.

notegen playlist <url>            # resumes from where it left off
notegen playlist <url> --force-restart   # ignore progress, reprocess all
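The resume logic comes down to three steps: load the completed set, skip those videos, and record each video as it finishes. A minimal sketch follows. It assumes `.progress.json` holds a `{"completed": [...]}` object of video IDs. That layout and the helper names are hypothetical. The file name and the `--force-restart` behaviour come from this page.

```python
import json
from pathlib import Path

PROGRESS_FILE = ".progress.json"  # file name from the docs; the JSON layout below is assumed


def load_completed(output_dir: Path, force_restart: bool = False) -> set[str]:
    """Video IDs finished in earlier runs; --force-restart ignores them."""
    path = output_dir / PROGRESS_FILE
    if force_restart or not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8"))["completed"])


def remaining(video_ids: list[str], completed: set[str]) -> list[str]:
    """Videos still to process, in playlist order."""
    return [vid for vid in video_ids if vid not in completed]


def record_completed(output_dir: Path, completed: set[str], video_id: str) -> None:
    """Save progress right after each video so an interruption only repeats the one in progress."""
    completed.add(video_id)
    output_dir.mkdir(parents=True, exist_ok=True)
    payload = {"completed": sorted(completed)}
    (output_dir / PROGRESS_FILE).write_text(json.dumps(payload), encoding="utf-8")
```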

Rate limiting

Free-tier providers (Groq, Gemini, Together AI, etc.) enforce strict tokens-per-minute (TPM) and requests-per-minute (RPM) limits. notegen handles the resulting HTTP 429 (Too Many Requests) errors automatically:

Cools down the offending key and rotates to another available key immediately.
If all keys for the provider are exhausted, waits using the Retry-After header value (if present) or exponential backoff (retry_base_delay * 2^attempt), then retries.

With the defaults (max_retries: 5, retry_base_delay: 60), the wait sequence is 60s → 120s → 240s → 480s → 960s. Adding multiple API keys from different free accounts is the most effective way to stay under limits.

Output format

Obsidian-flavored markdown (default):

YAML frontmatter (title, source, type, tags, date)
# → ## → ### heading hierarchy following source narrative
___ horizontal dividers between major topic shifts
> blockquote for quotes, definitions, memorable statements
$formula$ inline LaTeX · $$formula$$ block equations
Callouts — used where content genuinely warrants them: > [!TIP] > [!NOTE] > [!INFO] > [!WARNING] > [!CAUTION] > [!DANGER] > [!EXAMPLE] > [!QUESTION] > [!QUOTE] > [!CITE]
Mermaid diagrams for system topology, data flows, architectures
Markdown tables for comparisons and feature matrices
[[wikilinks]] for cross-references to related concepts
**bold** key terms · _italic_ emphasis · `inline code` for commands/flags/values
Fenced code blocks with language tags
Tags auto-inferred by LLM from content
Playlist → folder + index.md with wikilinks to each video note
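Two of the structural pieces listed above lend themselves to a short illustration: the YAML frontmatter fields and a playlist's `index.md`. The helpers `frontmatter` and `playlist_index` are hypothetical. So are the example `type` value `"video"` and the bullet-list layout of the index. Values are serialized with `json.dumps`, because JSON strings and arrays are valid YAML and this keeps quoting safe.

```python
import json
from datetime import date


def frontmatter(title: str, source: str, note_type: str, tags: list[str], day: date) -> str:
    """Build YAML frontmatter with title, source, type, tags, and date (layout assumed)."""
    return "\n".join([
        "---",
        f"title: {json.dumps(title)}",
        f"source: {json.dumps(source)}",
        f"type: {json.dumps(note_type)}",
        f"tags: {json.dumps(tags)}",
        f"date: {day.isoformat()}",
        "---",
    ])


def playlist_index(playlist_title: str, note_titles: list[str]) -> str:
    """Body of a playlist's index.md: one Obsidian wikilink per video note."""
    links = [f"- [[{title}]]" for title in note_titles]
    return "\n".join([f"# {playlist_title}", "", *links])
```

For example, `playlist_index("Course", ["01 Setup", "02 Config"])` produces a `# Course` heading followed by `- [[01 Setup]]` and `- [[02 Config]]`.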

Language support

notegen looks for YouTube transcripts in this order of preference:

English — used directly (en, en-US, en-GB)
Hindi (hi) — auto-translated to English via YouTube's translation API
Malayalam (ml) — auto-translated to English via YouTube's translation API

Translation uses YouTube's own built-in API (no extra key or dependency needed). Notes are always written in English regardless of source language. Videos with no captions in any supported language fail with an error (or are skipped in playlists with --force).
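The fallback order reads naturally as a first-match search over a preference list. A minimal sketch follows. It assumes the English variants are tried in the order listed (en, en-US, en-GB). `choose_transcript` is a hypothetical helper that only decides which caption track to use. Fetching and translating through YouTube are out of scope, and the real error type is not documented here.

```python
# Documented order of preference: English variants are used as-is,
# Hindi and Malayalam are translated to English by YouTube.
PREFERENCE: list[tuple[str, bool]] = [
    ("en", False),
    ("en-US", False),
    ("en-GB", False),
    ("hi", True),
    ("ml", True),
]


def choose_transcript(available: set[str]) -> tuple[str, bool]:
    """Return (caption language, needs translation to English) for a video."""
    for code, needs_translation in PREFERENCE:
        if code in available:
            return code, needs_translation
    # No supported captions: a single video fails; a playlist run with --force would skip it.
    raise LookupError("no captions in English, Hindi, or Malayalam")
```

For a video with both Hindi and Malayalam captions, Hindi wins because it comes earlier in the list.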

Requirements

Python ≥ 3.11
API key for at least one supported LLM provider

Development

uv sync --dev        # install all deps including dev
uv run pytest        # run tests
uv run ruff check .  # lint
uv run ruff format . # format

CI

GitHub Actions runs on every push to main or sms/** branches and on pull requests to main.

Pipeline (.github/workflows/ci.yml):

Step	Command
Format check	`ruff format --check .`
Lint	`ruff check .`
Test	`pytest tests/ -q --tb=short`

Matrix: Python 3.11 and 3.12 on ubuntu-latest.

Release history

2.4.1 · May 25, 2026
2.3.1 · May 24, 2026
2.3.0 · May 24, 2026 (this version)
2.2.0 · May 24, 2026

Download files

Source Distribution

notegen-2.3.0.tar.gz (44.0 kB)

Uploaded May 24, 2026 Source

Built Distribution

notegen-2.3.0-py3-none-any.whl (36.6 kB)

Uploaded May 24, 2026 Python 3

File details

Details for the file notegen-2.3.0.tar.gz.

File metadata

Download URL: notegen-2.3.0.tar.gz
Upload date: May 24, 2026
Size: 44.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.9

File hashes

Hashes for notegen-2.3.0.tar.gz
Algorithm	Hash digest
SHA256	`bd730314e44cf5a3b2094bf026e3f8ae2d3624725d8547c119c9b482921ac957`
MD5	`084722a82760ba5ca82e146cc67ad1e6`
BLAKE2b-256	`4c88493019ff17877398be1da17cb416caa986b5c1ea41e7d584480e066143ae`

File details

Details for the file notegen-2.3.0-py3-none-any.whl.

File metadata

Download URL: notegen-2.3.0-py3-none-any.whl
Upload date: May 24, 2026
Size: 36.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.9

File hashes

Hashes for notegen-2.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c843a69262c37d3a28b63e30e6146c3164c3ecd96ec39c6604143e34d5286704`
MD5	`f9b5692bc01bc90fc2c17964fc9382d3`
BLAKE2b-256	`486ba100276fb740b55aadae9a7ebb2ec37d89f0cbff993f357fee2b98c1025d`
