Local CLI for pulling Telegram chats and channels, transcribing voice/video via OpenAI, and analyzing with Chat Completions.
Project description
unread
Read less, know more. A local CLI that pulls Telegram chats, YouTube videos, web articles, and local files into one searchable archive — and analyzes, queries, or dumps them through the AI provider you choose.
unread collapses the "I have too much to read" problem into three verbs:
unread <ref>— analyze: produce a structured Markdown report (digest, action items, decisions, …) with clickable citations back to every source message.unread ask <ref> "Q"— ask: get a single-shot answer with citations from a chat archive, a video, an article, or a file.unread dump <ref>— dump: save the source verbatim (the original file, the chat history, the transcript) to~/.unread/reports/.
<ref> is the same set of shapes for all three commands: a Telegram
handle / link, a YouTube URL, a web URL, a local file, or stdin.
unread @somegroup --last-days 7 # weekly digest of a Telegram group
unread "https://youtu.be/jmzoJCn8evU" # summarize a video
unread "https://paulgraham.com/greatwork" # summarize an article
unread ./meeting.mp3 # transcribe + analyze a recording
unread ask "what did Bob decide?" @somegroup
unread dump @somegroup -o history.md --last-days 30
Why unread
- One CLI, every source. Telegram chats / topics / channels, YouTube, websites, PDFs, DOCX, audio (Whisper), video (audio extracted then transcribed), images (vision), and stdin — same flags, same caches, same report shape.
- Bring your own model. OpenAI, Anthropic (Claude), Google (Gemini), OpenRouter, or a local OpenAI-compatible server (Ollama, LM Studio, vLLM). Switch at any time with
unread settings. - Local-first. Everything lives in
~/.unread/(SQLite for chats, embeddings, analysis cache, secrets). The only network calls are to Telegram, your AI provider, and any URLs you point at. - Cost-aware. Per-call token + USD accounting, a
--max-costcap, two-layer caching (local content cache + provider prompt cache), andunread statsfor spend reports. Re-running an unchanged chat is free; follow-ups are cheap. - Citation-grounded. Every claim in a report links back to the message / paragraph / timestamp that supports it.
--cite-contextadds an audit block under each citation so you can verify without leaving the terminal.
Quickstart (60 seconds)
# 1. Install (Python 3.11+; uv handles the venv and binary)
curl -LsSf https://astral.sh/uv/install.sh | sh # macOS / Linux
uv tool install unread
# 2. Set up — interactive wizard: install folder, AI provider key, optional Telegram login
unread init
# 3. Run something
unread "https://paulgraham.com/greatwork.html" # any web page
unread @somegroup --last-days 7 # last week of a chat
unread ask "what did Bob decide?" @somegroup # Q&A
unread doctor # verify the install
That's it. No virtualenv to activate, no system packages to manage,
no pip conflicts. uv tool install keeps unread and its deps in
their own isolated environment.
Documentation
| Topic | Section |
|---|---|
| All command flags + defaults | unread <ref> flags, ask, dump |
| Where files live | Where does unread read config and write data? |
| Credential storage | Security |
| Telegram chat references | Chat references |
| YouTube + web pages | YouTube videos, Web pages |
| Forum topics | Forum chats (topics) |
| Voice / image / link enrichment | Media enrichment |
| Output presets | Presets |
| Languages | Language |
| Cost & caching | Cost & caching |
| Backups, cleanup, watch | Maintenance |
config.toml reference |
Configuration (config.toml) |
| Debugging | Troubleshooting |
| Contributing | Development |
Installation
One line, all platforms
uv tool install unread
uv provisions a private Python ≥ 3.11 environment for unread,
isolated from your system Python and any other tools. Don't have uv?
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
Verify with unread --version. Upgrade later with uv tool upgrade unread.
What works without extra dependencies
The default install handles text, PDF, DOCX, images, web pages, and
YouTube captions out of the box (extractors are bundled). The only
optional system dependency is ffmpeg — and only if you want audio
or video transcription (voice messages, video notes, podcasts,
recordings):
| OS | Install ffmpeg |
|---|---|
| macOS | brew install ffmpeg |
| Debian / Ubuntu | sudo apt install ffmpeg |
| Fedora / RHEL | sudo dnf install ffmpeg |
| Arch | sudo pacman -S ffmpeg |
| Windows (Scoop) | scoop install ffmpeg |
| Windows (Chocolatey) | choco install ffmpeg |
Without ffmpeg, audio/video paths skip with a clear warning instead
of crashing — the rest of unread keeps working.
Other install methods
# Bleeding-edge unreleased commits from GitHub:
uv tool install git+https://github.com/maxbolgarin/unread.git
# Editable / development install (source-linked, edits picked up live):
git clone https://github.com/maxbolgarin/unread.git
cd unread
uv sync --extra dev
uv tool install --editable .
# No global install — run from a cloned dir without putting anything on PATH:
uv run unread @somegroup
After install, unread doctor verifies the binary, dependencies, and
the install layout.
Tested on macOS and Linux. Windows works for the non-Telegram paths (files, YouTube, websites) — Telegram itself is supported via Telethon but signal handling and a few file-path edge cases may differ. Please file issues at https://github.com/maxbolgarin/unread/issues.
First-run setup
unread init
Four-step interactive wizard:
-
Install folder —
~/.unread/(default), current directory, or a custom path. Recorded at~/.unread/install.toml. -
AI provider — pick one and paste its key:
- openai (default) — also backs Whisper, embeddings, and vision
used for media enrichment (
--enrich=voice/videonote/video/image) andask --semantic. - anthropic (Claude), google (Gemini), openrouter (many models, one key), or local (Ollama / LM Studio / vLLM).
Press Enter to skip —
dump,describe, andsyncwork without any key; onlyanalyzeandaskneed one.Capability gaps. Whisper / embeddings / vision are OpenAI-only. Pick Anthropic / Google / OpenRouter / local as your chat provider and you can still add an OpenAI key alongside for those features. Without one, they skip with a clear warning.
- openai (default) — also backs Whisper, embeddings, and vision
used for media enrichment (
-
Telegram (optional) —
api_id/api_hashfrom https://my.telegram.org → API development tools, then phone + code login. Skip if you only want YouTube / web / file analysis. -
Done. Credentials persist in
~/.unread/storage/data.sqlite::secrets. Re-rununread initlater to fill in any step you skipped; only unsatisfied steps re-prompt.
Non-interactive setup (CI, scripts) — pre-populate ~/.unread/.env:
TELEGRAM_API_ID=1234567
TELEGRAM_API_HASH=abcdef0123456789abcdef0123456789
OPENAI_API_KEY=sk-...
Then unread init skips wizard prompts and runs Telethon auth only.
.env values always win over anything in the secrets DB, so key
rotation is a one-line edit.
unread doctor verifies the install at any time. unread tg login --force
re-runs Telethon auth without touching keys.
Migrating from a cwd-relative install (older versions wrote into the working directory)? Move
./.env,./config.toml,./storage/, and./reports/into~/.unread/manually — or setUNREAD_HOME=$(pwd)to keep the cwd-relative layout.
Where does unread read config and write data?
Everything lives under ~/.unread/:
~/.unread/.env ← credentials (filled in by you)
~/.unread/config.toml ← models, pricing, tuning
~/.unread/storage/session.sqlite ← Telegram session
~/.unread/storage/data.sqlite ← chats, messages, cache, embeddings
~/.unread/storage/backups/ ← snapshots from `unread backup`
~/.unread/reports/{chat}[/{topic}]/analyze/{preset}-{stamp}.md ← default report path
~/.unread/reports/{chat}/dump/dump-{stamp}.md ← default dump path
You can run unread from any directory — paths don't depend on cwd.
Override the install root for tests / multi-profile setups via the
UNREAD_HOME env var (e.g. UNREAD_HOME=/srv/unread-shared).
Security
unread stores three classes of high-value data on disk: API keys
(OpenAI, Anthropic, Google, OpenRouter, Telegram api_id/api_hash), the
Telegram session (full-account auth — anyone with this file can log
in as you), and the cached chat content (messages, transcripts,
analysis reports). The defenses below address the realistic threats:
other users on the same machine, backup leakage (Time Machine /
iCloud / Dropbox / NAS), stolen disks without FDE, and Telegram
session theft.
File permissions (always on)
~/.unread/ is created mode 0o700 and every file written inside
(data.sqlite, session.sqlite, reports, .env) is tightened to
0o600 immediately after creation. The media/ and runtime cache
directories are also 0o700. Verify with unread doctor — it flags
overpermissive modes, warns when the install lives under a known
cloud-sync folder (iCloud Drive, Dropbox, OneDrive, Google Drive),
and reports FileVault / LUKS state.
unread's structured logger has an API-key redactor: anything
matching sk-…, sk-ant-…, sk-or-…, AIza…, gsk_…, or known
secret-shaped event-dict keys (api_key, api_hash, passphrase,
session_string, auth_key, …) gets masked before rendering. So
even if a debug session is shared, raw credentials don't leak.
Three storage backends — unread security
The credential-storage backend is one-shot switchable:
unread security status # active backend, slot inventory, FDE check
unread security set plain # plaintext on disk (default)
unread security set keystore # OS keychain (recommended)
unread security set pass # passphrase-encrypted (strongest)
unread security set plain # … and back, any direction
| Backend | Storage | Defends against | UX |
|---|---|---|---|
plain |
~/.unread/storage/data.sqlite (plaintext) |
Other local users (via 0o700/0o600) |
Zero friction |
keystore |
macOS Keychain / Linux Secret Service / Windows Credential Manager | Other local users + backup leakage (Keychain isn't backed up) | Zero friction — unlocked with your login |
pass |
Same DB, but every value encrypted with a key derived from your passphrase. The Telegram session moves into an encrypted Telethon StringSession and the on-disk session.sqlite[.session] file disappears entirely. |
All of the above + stolen disk without FDE + VPS host operator + Telegram session theft from a backup | Passphrase prompt; cache it for the shell session via unread security unlock |
keystore — the recommended default for personal machines
unread security set keystore migrates every saved API key into the
OS-native keychain. Verify on macOS with:
security find-generic-password -s unread -a openai.api_key
sqlite3 ~/.unread/storage/data.sqlite "SELECT key, length(value) FROM secrets"
# → DB rows are blank; values live in Keychain under service "unread"
No passphrase is ever asked — the keychain is unlocked when you log
in. Keychain content is encrypted at rest with a key bound to your
user account. Backups (Time Machine, iCloud) by default exclude the
Keychain database, so a leaked Time Machine snapshot of ~/.unread/
no longer contains your API keys. On Linux, keystore requires a
running Secret Service (gnome-keyring / KWallet); on headless
hosts the wizard skips this offer silently and you stay on plain.
pass — passphrase-encrypted, strongest at-rest guarantee
unread security set pass runs an interactive prompt: pick a
passphrase, the CLI runs Scrypt (n=2¹⁷, ~100 ms) to derive a key,
and re-encrypts every secret value plus the Telegram session string
under ChaCha20Poly1305. The plaintext session.sqlite[.session]
file is removed at the end — there's nothing on disk an attacker
can copy to impersonate you on Telegram, even from a backup.
On every command that reads encrypted secrets, the key is
sourced in this order: in-process cache → UNREAD_PASSPHRASE env
var (handy for cron / CI) → getpass() prompt (TTY only). To skip
the prompt across invocations:
unread security unlock # cache the derived key until you `lock`
unread security unlock --keep 30m # … or for a bounded TTL
unread tg chats run # no prompt
unread security lock # wipe the cache now
unread security rotate-passphrase # change the passphrase
The cached key lives at $XDG_RUNTIME_DIR/unread/key on Linux
(tmpfs — auto-cleared on reboot) or ~/.unread/.runtime/key on
macOS / fallback. Mode is 0o600 from creation. The passphrase
itself is never persisted — only the derived key, only when you
explicitly unlock.
What encrypted mode does NOT defend against: malware running as
your user (same UID can read decrypted process memory regardless),
or a coerced-passphrase attack. For both, the mitigation is at the
OS level (FileVault, app sandboxing, hardware tokens), not at
unread's layer.
Telegram session hygiene
unread security revoke-session
Removes the local Telethon session file and prints a reminder to revoke remotely from Telegram → Settings → Devices → Active Sessions. Doing both is the only way to fully invalidate a leaked session.
Quick recommendations
- Personal Mac / Windows machine:
unread security set keystore. Zero friction, defends backup leakage, fits the realistic threat model. - VPS / shared host / paranoid laptop with no FDE:
unread security set pass, optionallyunread security unlock --keep 1hper shell. - Headless Linux / Docker / CI: stay on
plain, setUNREAD_PASSPHRASEonly if you've also enabledpassmode and need automation. - Anywhere: turn on FileVault / LUKS, exclude
~/.unread/fromtmutil/cloud sync, rununread doctorafter first setup.
Privacy: PII redaction before the LLM
--redact (or analyze.redact = true in config) scrubs PII from the
text sent to the LLM provider, while keeping originals in the local
DB and the saved Markdown report. Only the API payload is redacted.
unread @somegroup --redact
Patterns scrubbed: phone numbers (E.164 with + prefix), emails,
IBANs, and Luhn-valid credit-card numbers. Each match is replaced
with [redacted-phone] / [redacted-email] / [redacted-iban] /
[redacted-card], and the run summary shows per-kind counts so you
can see what was filtered. Caching honors the flag — toggling
--redact produces a different cache row, so you never serve a
non-redacted cached result on a redacted run (or vice versa).
The match is intentionally conservative (regex with strict word
boundaries) to keep false positives low. SHA hashes and order-id
numerics are not flagged; non-E.164 phone shapes (raw 10-digit US
numbers without +1 prefix) pass through. If you need stricter
redaction, layer your own preset prompt that asks the LLM to
generalize personal references — --redact complements that, it
doesn't replace it.
Command reference
unread --help shows four panels.
Main (everyday)
| Command | Purpose |
|---|---|
unread [<ref>] [flags] |
Analyze a chat (default action). No args → interactive wizard. |
unread tg [<ref>] [flags] / unread telegram [<ref>] [flags] |
Same as the bare form, but auto-runs init if no Telegram session exists. |
unread init [--force] |
Full interactive setup: install folder, AI provider + key, optional Telegram login. --force wipes the saved session before logging in. |
unread help [<cmd>] / unread --help |
Show top-level help (no args) or walk into a subcommand: unread help tg, unread help init. |
unread ask [<ref>] ["question"] [flags] |
Q&A over any ref (Telegram, YouTube, website, local file, stdin) — no Telegram round-trip for non-TG sources. No args opens a wizard. |
unread dump [<ref>] [flags] |
Dump history / extracted text to md/jsonl/csv. Accepts Telegram refs, URLs, local files, and stdin. No OpenAI call by default. |
Subcommand-name collisions.
unread <ref>will route to a subcommand if<ref>matches one (e.g.unread settingsopens the settings command, not a chat literally titled "settings"). Useunread tg "settings"orunread -- settingsfor the rare case of a chat that shares a subcommand name.
Telegram
| Command | Purpose |
|---|---|
unread tg describe [<ref>] |
List dialogs (no ref) or inspect one chat. Shows folder column. |
unread tg describe folders |
List your Telegram folders (use with --folder NAME). |
unread tg login [--force] |
Re-run the Telegram-only login step. --force wipes the saved session first. |
unread tg logout |
Remove the local Telegram session. |
unread tg chats add <ref> / unread tg chats manage |
Add a subscription, or open the interactive panel to list / enable / disable / remove. Optional — one-off analyze already fetches. |
unread tg sync |
Pull new messages for every active subscription. |
Maintenance
| Command | Purpose |
|---|---|
unread stats [--by …] |
Token spend / cache hit rate — by chat, preset, model, day, kind. |
unread cache <entity> [ls|purge|stats|show|export] |
Cache maintenance with three entity groups (<entity> ∈ ai | sources | tg). Bare cache <entity> = ls. See per-entity table below. |
unread doctor |
Preflight check — Telegram session, OpenAI key, ffmpeg, DB integrity, pricing, FDE, cloud-sync warnings. |
unread update [--check] [-y] |
Check PyPI for a newer release and (optionally) install it. |
unread security {status,set,unlock,lock,rotate-passphrase,revoke-session} |
Inspect / switch the credential-storage backend. See the Security section. |
unread backup up [out] |
Snapshot storage/data.sqlite via VACUUM INTO. |
unread backup restore <file> |
Replace data.sqlite with a backup (current DB moved aside). |
unread reports prune --older-than 30d |
Move stale report files to reports/.trash/. |
unread watch --interval 1h <inner cmd> |
Run an inner unread command on a fixed cadence. |
Hidden (still callable, not in --help)
unread download-media [<ref>] — kept for back-compat. Use unread dump --save-media instead.
Chat references
<ref> accepts:
| Form | Example |
|---|---|
@username |
@durov |
https://t.me/… |
https://t.me/durov/123 (jumps to message 123) |
| Forum-topic link | https://t.me/somegroup/100/5000 (topic 100, msg 5000) |
| Private link | https://t.me/c/1234567890/5000 |
| Invite link | https://t.me/+AbCdEf... (add --join to join it) |
Numeric chat_id |
-1001234567890 or 1001234567890 |
| YouTube URL | https://www.youtube.com/watch?v=... (see YouTube videos) |
| Website URL | https://example.com/article (see Web pages) |
| Local file path | ./report.pdf, ~/notes.md, /tmp/recording.mp3 (see Local files) |
- |
Read from stdin: cat notes.txt | unread or unread - < notes.txt |
Fuzzy chat-title match ("Bull Trading" → substring search across your dialogs)
is intentionally NOT a bare-unread <ref> form — bare unread "some text" would
silently try to authenticate with Telegram for every typo. Use unread tg "Bull Trading" instead, or pick from the wizard via unread with no args.
The wizard's chat picker accepts non-Latin type-to-filter (Cyrillic,
Greek, Arabic, Hebrew, Latin Extended) so searching for биохакинг or
finanças works the same as crypto.
YouTube videos
unread <youtube-url> analyzes a single video end-to-end. Flow:
- yt-dlp fetches metadata (title, channel, duration, captions index).
- A summary panel shows up + an interactive picker asks for the
transcript source — captions (free), audio + Whisper (paid, with a
cost estimate), or cancel. Skipped when stdin isn't a TTY, when
--yesis passed, or when an explicit--youtube-sourceflag was set. - Captions are fetched as VTT (or audio is downloaded → Whisper), and
each cue's start-second becomes that segment's
msg_id. - The bundled
videopreset runs over the time-stamped synthetic messages. Citations land as[#754](https://www.youtube.com/watch?v=ID&t=754s)— every citation in the report is a clickable jump to that moment. - Re-runs hit the
youtube_videoscache (metadata + transcript + timed cues) — no yt-dlp, no Whisper, no LLM-side re-spend if cached.
# Interactive default: shows metadata + asks for source.
unread "https://www.youtube.com/watch?v=jmzoJCn8evU"
# Scripted (skip prompts, auto-pick captions / Whisper as needed):
unread "https://youtu.be/dQw4w9WgXcQ" --yes
# Force Whisper (slower; ~$0.003/min):
unread "https://youtu.be/dQw4w9WgXcQ" --youtube-source audio
# Different preset; see `unread --help` for the full list.
unread "https://www.youtube.com/watch?v=..." --preset summary --console
Reports land under reports/youtube/<channel-slug>/<video-slug>-<preset>-<ts>.md.
Default preset for YouTube is video (system prompt tuned for transcripts,
time-stamped citations).
Telegram videos / video-circles (single-message mode) auto-flag as
source_kind="video" too: === Video: <title> === in the preamble and
the LLM is told it's analyzing a video transcript, not a chat snippet.
Supported URL shapes: youtube.com/watch?v=… (with arbitrary &list=,
start_radio, t= params, all stripped), youtu.be/, youtube.com/shorts/,
youtube.com/embed/, youtube.com/live/, m.youtube.com, music.youtube.com.
Playlist-only and channel-only links are rejected with a clean error
(playlist support is on the roadmap).
Telegram-only flags (--folder, --thread, --all-flat, --all-per-topic,
--with-comments, --from-msg, --full-history, --since/--until/--last-days/--last-hours,
--msg, --repeat-last, --mark-read) are rejected for YouTube refs with
a clear error.
unread doctor warns if yt-dlp isn't installed.
Web pages
unread <url> analyzes any HTTP/HTTPS web page (article, blog
post, documentation, essay) end-to-end. Auto-detected from the URL
shape: anything that isn't a YouTube link or a Telegram link
(t.me/...) routes here. No flag needed.
Flow:
- HTTP fetch —
httpxGET with a browser-shaped User-Agent and a 30-second timeout. 4xx/5xx, non-HTML responses, and oversize pages (>5 MB raw HTML, configurable) error out with a clean message. - Article extraction — primary extractor is
trafilatura(best-in-class article-body detection: drops nav / sidebar / footer / cookie banners, preserves headings + lists). Falls back to a BeautifulSoup pipeline (semantic-tag pass, then whole-bodyget_textif the page has only<div>/<span>). - Segmentation — extracted text is split into paragraph-shaped
chunks (≤3500 chars each, preferring blank-line boundaries). The
metadata block (title, site, author, publish date, word count, URL)
becomes synthetic message
#0; paragraphs are#1..#N. - Analysis — the bundled
websitepreset (system prompt tuned for single-author article body, not a chat conversation) runs over the synthetic messages. Citations land as[#7](https://example.com/article)— clicking a citation opens the page itself (paragraph anchors aren't exposed because most sites don't generate stable ones). - Cache — re-runs on the same URL hit the
website_pagescache (metadata + paragraphs + content hash). The analysis cache key includes the content hash, so re-running after a page edit produces a cache miss while an unchanged re-fetch reuses the previous run.
# Default — fetch, extract, run the website preset, save under reports/website/...
unread "https://www.paulgraham.com/greatwork.html"
# Estimate-and-exit (no LLM call):
unread "https://example.com/blog/post" --dry-run
# Render to terminal instead of saving a file:
unread "https://example.com/blog/post" --console
# Different preset — `summary`, `digest`, `highlights`, etc. all work:
unread "https://example.com/blog/post" --preset summary
# Cost-bounded run + post the analysis to your Saved Messages:
unread "https://example.com/blog/post" --max-cost 0.05 --post-saved
# Run a custom prompt against a page:
unread "https://example.com/paper.html" --preset custom --prompt-file my-prompt.md
Reports land under reports/website/<domain-slug>/<title-slug>-<preset>-<ts>.md.
Default preset is website (Russian translation in presets/ru/website.md).
URL normalization for cache keying strips fragments + common tracking
params (utm_*, fbclid, gclid, mc_cid, ref, …) so the same
article shared with different referrer tags hits the same cache row.
Telegram-only flags are rejected for website URLs with a clear error
(same list as YouTube, plus --cite-context since web pages have no
surrounding-context store to expand into).
Limitation: JS-rendered SPAs. unread fetches raw HTML only — no headless browser, no JS engine. Single-page apps (React / Angular / Vue / Svelte sites that paint content client-side) typically serve ~1–5 KB of bootstrapping markup with no readable text. Those URLs fail with a clear "appears to be a JavaScript-rendered single-page app" error, suggesting you try a static article URL or paste the content elsewhere. Most blogs, news sites, docs, and Markdown-rendered pages work fine — it's specifically the SPA case that doesn't.
Configuration knobs (under [website] in config.toml):
[website]
# fetch_timeout_sec = 30
# max_html_bytes = 5_000_000 # 5 MB hard cap on raw HTML
# max_paragraphs = 400 # post-split cap; rejects pathological pages
# user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/..."
Local files
unread <path> analyzes any local file. Auto-detected from the ref shape
— anything that starts with ./, ../, /, ~/, or file://, plus
bare filenames that match an existing file in the current directory,
routes to the file analyzer instead of being interpreted as a chat
title.
unread ./report.pdf # PDF (text-extracted via pypdf)
unread ./contract.docx # DOCX (python-docx)
unread ./meeting-notes.md # Markdown / plain text / source code
unread ./recording.mp3 # Audio (Whisper transcription)
unread ./standup.mp4 # Video (ffmpeg → audio → Whisper)
unread ./diagram.png # Image (vision description, then summary)
cat notes.txt | unread # Stdin (auto-detect when piped)
unread - < notes.txt # Stdin (explicit `-` form)
unread ./README.md --no-save # Console-only, don't write a report file
Supported types:
| Kind | Extensions | What happens |
|---|---|---|
| Text / code | .txt, .md, .csv, .json, .py, .js, .ts, .go, .rs, .html, .xml, .yaml, .toml, .sh, .sql, .tex, … |
Read as UTF-8 (with CP1251 / Latin-1 fallbacks) and analyzed as a document. |
.pdf |
pypdf extracts text page-by-page (200k-char cap). Scanned/image-only PDFs error out with an OCR hint. |
|
| DOCX | .docx |
python-docx extracts paragraphs. |
| Audio | .mp3, .m4a, .wav, .flac, .ogg, .opus |
Transcribed via Whisper (needs OPENAI_API_KEY). |
| Video | .mp4, .mov, .mkv, .webm |
ffmpeg pulls the audio track, Whisper transcribes (needs ffmpeg + OpenAI). |
| Image | .png, .jpg, .webp, .gif |
Vision model describes the image, then analyzed as text (needs OpenAI). |
| Stdin | - or piped |
Treated as plain text. |
Reports land at ~/.unread/reports/files/<kind>/<file-slug>-<preset>-<stamp>.md. Citations
in the report use file:// URIs ([#7](file:///abs/path)) so clicking
re-opens the source. Re-running on an unchanged file hits
local_files cache → no LLM cost.
Flags that only make sense for Telegram chats (--folder, --thread,
--all-flat, etc.) are rejected with a clear error when given a file
or stdin ref.
unread <ref> — flags
unread [<ref>] [period] [output] [enrichment] [budget] [audit] [delivery]
Period (start point of the analysis window)
| Flag | Meaning |
|---|---|
--full-history |
Whole chat |
--from-msg <id> / message link |
Start at a specific message, inclusive |
--since YYYY-MM-DD / --until YYYY-MM-DD / --last-days N / --last-hours N |
Date / hour range (UTC) |
| (none) | Unread only — msg_id > read_marker |
Precedence (first match wins): --full-history > --from-msg > --last-hours > --since/--until/--last-days > unread. (When both --last-hours and --last-days are passed, the hour-granular flag wins.)
Output
| Flag | Meaning |
|---|---|
-o <path> / --output |
Custom output path (file for single chat, dir for batch) |
-c / --console |
Render to terminal as Rich-styled markdown |
-s / --save |
Skip the wizard's output picker; save to default path |
Enrichment
| Flag | Meaning |
|---|---|
--enrich=voice,image,link |
Enable a specific subset for this run |
--enrich-all |
Every kind (voice, videonote, video, image, doc, link) |
--no-enrich |
Disable everything, even config defaults |
--include-transcripts/--text-only |
Include enrichment text in the analyzable body |
Forum routing
| Flag | Meaning |
|---|---|
--thread N |
One specific topic |
--all-flat |
Whole forum as one analysis (defaults to per-topic unread; honors --last-days / --since / --full-history) |
--all-per-topic |
One report per topic |
Cost / safety
| Flag | Meaning |
|---|---|
--max-cost N |
Estimate cost upfront; abort or confirm if over budget. Pass --yes to abort silently. |
--dry-run |
Resolve, backfill, count, print the cost band, exit before any LLM call. |
--no-cache |
Don't read or write analysis_cache (forces a fresh run). |
Audit / quality
| Flag | Meaning |
|---|---|
--cite-context N |
Append a ## Источники section to the report with N messages of context around every cited [#msg_id](url). Capped at 30 citations. |
--self-check |
Run a cheap-model verifier pass; appends a ## Verification section listing unsupported claims. |
--by <sender> |
Filter to one sender. Substring match on sender_name (case-insensitive) or numeric sender_id. |
Delivery
| Flag | Meaning |
|---|---|
--mark-read / --no-mark-read |
Tri-state. Without flag → prompt interactively. |
--post-saved |
Send the result to your Telegram Saved Messages (split into 4000-char chunks). |
--post-to <ref> |
Generalization — post to any chat (me for Saved Messages, @channel, etc.). |
Workflow shortcuts
| Flag | Meaning |
|---|---|
--folder NAME |
Without <ref>: batch-analyze every chat in this Telegram folder with unread messages. |
--repeat-last |
Reuse the saved flags from the last successful analyze on <ref>. Explicit CLI flags still win. |
--preset NAME / --prompt-file path.md |
Pick a preset; custom + --prompt-file for ad-hoc. |
--with-comments |
For a Telegram channel: also pull messages from its linked discussion group (comments) over the same period and run them through the same enrichment toggles. The report renders channel posts and comments as two sections with their own citation links. The wizard asks interactively when the picked chat is a channel with a linked group. Available on analyze, ask, and dump. |
--model M / --filter-model M |
Override per-run model picks. |
--min-msg-chars N |
Drop messages shorter than N chars (after enrichment). |
--yes / -y |
Skip interactive confirmations (per-topic Y/n, batch Y/n, over-budget Y/n). |
What you get back
A Markdown file at reports/{chat}[/{topic}]/analyze/{preset}-{stamp}.md
by default. Every cited claim is a clickable link back to the source:
Фонды переходят на индексные структуры с 2026 Q1. [#1586](https://t.me/c/3865481227/584/1586)
unread ask — Q&A over any source
unread ask "what did we decide about the migration?" @somegroup
unread ask https://youtu.be/dQw4w9WgXcQ "what is the main argument?"
unread ask ./notes.pdf "what are the action items?"
unread ask # opens the wizard
For Telegram refs, reads only your local DB — no Telegram
round-trip during retrieval. The corpus is everything analyze /
dump / sync has already pulled (transcripts, image descriptions,
doc extracts, link summaries included).
For non-Telegram refs (YouTube URLs, website URLs, local files,
stdin), ask extracts the source text directly — no Telegram
client is opened.
Synopsis: unread ask [<ref>] ["QUESTION"] [flags]. The positional
<ref> accepts any Telegram chat reference (@user, t.me link, topic URL,
fuzzy title, numeric id) or any non-Telegram ref (YouTube URL,
website URL, local file path, - / piped stdin). A topic URL like
https://t.me/c/1234567890/4 auto-fills --thread. Without
<ref> / --chat / --folder / --global the command opens the
wizard (chat picker → period → enrich → confirm → backfill → answer);
without a question, the wizard prompts for it inline. The four scope
sources are mutually exclusive — pick one per call.
Pipeline
- Tokenize the question — bilingual (English + Russian) stop-word filter, drops short tokens.
- Retrieve top-N messages by keyword
LIKEovertext || transcript. Default pool 500 with rerank, or 200 without. - (Optional) Rerank: cheap model rates each candidate 1–5 against the question; keep top-K (default 50). Drops ask cost ~5–10× on media-heavy chats.
- (Optional) Semantic:
text-embedding-3-smallcosine over a precomputed index; composes with rerank. - Format with the same dense-line formatter analyze uses, group by chat title for cross-chat answers.
- Ask the flagship model with a Q&A system prompt that mandates
[#msg_id](link)citations. - Print to terminal (default) or save to
-o file.md.
Flags
| Flag | Meaning |
|---|---|
<ref> (positional) |
Restrict to one chat — @user, t.me link, topic URL (auto-fills --thread), fuzzy title, numeric id. Mutually exclusive with --chat / --folder / --global. |
--chat <ref> |
Same as positional <ref> but explicit. |
--folder NAME |
Restrict to chats in this Telegram folder. |
--global / -g |
Search every synced chat in the local DB (no wizard, no Telegram calls). The pre-wizard default. |
--thread N |
Restrict to a forum topic (used with --chat / <ref>; topic URLs auto-fill this). |
--since/--until/--last-days |
Date filter. |
--limit N |
Max messages to retrieve (default 200; bumped to 500 when rerank is on). |
--rerank/--no-rerank |
Two-stage retrieval (default on; toggled in [ask] config). |
--semantic |
Use precomputed embeddings instead of keyword retrieval. Requires --build-index once. |
--build-index |
Embed every body-bearing message in the scoped chat(s). Idempotent. |
--refresh |
Backfill new messages from Telegram before retrieval. Requires --chat or --folder. |
--show-retrieved |
Print the retrieved messages with scores before the LLM call (debug). |
--no-followup |
Skip the post-answer "Continue chatting?" prompt (cron / scripts / non-interactive). |
--max-cost N |
Abort if the estimated USD cost exceeds N. |
--model M |
Override the answering model. |
--enrich=voice,image,link / --enrich-all / --no-enrich |
Run media enrichment (transcripts, image descriptions, link summaries, …) over the scoped chats + period BEFORE retrieval. Same flag shape as analyze. The wizard offers an enrich step too. |
-o <path> / --console |
Save to file / force terminal render. |
After every answer the CLI prompts Continue chatting? [y/N] (default
n). Press y to drop into multi-turn follow-ups (each new question
sees prior turns as message history); press Enter to exit. Pass
--no-followup to suppress the prompt entirely.
Examples
# No args — opens the wizard (asks for the question, then chat → period → confirm):
unread ask
# Positional ref — username:
unread ask "what did Bob say about migration?" @somegroup
# Positional ref — topic URL (thread auto-filled):
unread ask "open questions on the API" https://t.me/c/1234567890/4
# Across every synced chat (no wizard):
unread ask "когда дедлайн по проекту?" --global --last-days 7
# Folder scope, semantic retrieval (build index first):
unread ask "..." --folder Work --build-index
unread ask "open questions on the API" --folder Work --semantic --rerank --last-days 14
# Cheap and small:
unread ask "..." --limit 50 --model gpt-5.4-nano
# Debug retrieval before paying for the answer:
unread ask "..." @somegroup --show-retrieved --max-cost 0.05
# Single answer, no follow-up prompt (script-friendly):
unread ask "..." @somegroup --no-followup
Ask over non-Telegram sources
unread ask <ref> "QUESTION" accepts the same set of ref shapes as
unread <ref> and unread dump <ref>. The pre-Telegram dispatch picks
a source-specific extractor based on the ref shape, so a YouTube ask
never opens a Telegram client.
unread ask https://youtu.be/dQw4w9WgXcQ "what's the main argument?"
unread ask https://example.com/article "summarize the methodology"
unread ask ./notes.pdf "what are the action items?"
echo "raw text…" | unread ask "what is this about?"
For documents under the [ask].doc_full_text_cutoff_tokens threshold
(default 32k tokens), ask sends the full extracted text to the model
in one call. Above the cutoff, ask falls back to chunked retrieval —
the same machinery used for chat-archive queries. Tune the cutoff by
setting doc_full_text_cutoff_tokens under [ask] in
~/.unread/config.toml.
--chat, --folder, and --global are rejected when the ref is a
URL/file/stdin — a doc ref already names the source.
Cost feel
- Retrieval: free (local SQL).
- Rerank (default on): ~10 cheap-model calls × ~1k tokens each ≈ $0.005 per question.
- Answer: scales with
--limit. With rerank+keep=50 andgpt-5.4-mini, typical cost is ~$0.01–0.05 per question.
Cost is logged under phase=ask in usage_log — see unread stats --by kind.
unread dump — history and extracted text to a file
No OpenAI call by default. Accepts the same set of ref shapes as
unread <ref>: Telegram refs, YouTube URLs, website URLs, local files,
and stdin. For Telegram refs, the same backfill + filter pipeline as
analyze is used, just writing raw messages instead of an analysis.
unread dump @somegroup -o history.md --last-days 30
unread dump @somegroup --format jsonl --with-transcribe -o dump.jsonl
unread dump @somegroup --save-media # also save raw media files alongside
unread dump --folder Work # batch-dump every unread chat in folder
| Flag | Meaning |
|---|---|
--format md/jsonl/csv |
Output format (default md). |
--with-transcribe |
Run the audio enricher before writing (legacy alias for --enrich=voice,videonote). |
--enrich=... / --enrich-all / --no-enrich |
Same enrichment flags as analyze. |
--save-media [--save-media-types ...] |
Also save raw media files next to the dump. |
--folder NAME |
Without <ref>: batch-dump every unread chat in folder. |
All period / forum / output / --mark-read flags |
Same as analyze. |
dump for websites and YouTube
unread dump <url> works for any HTTP(S) page or YouTube link too. No
Telegram credentials needed — the URL is fetched and saved straight to
~/.unread/reports/. Pick the artifact via --mode (or let the
interactive picker choose on a TTY):
| URL kind | Mode | What you get |
|---|---|---|
| Website | text |
article.md — readable text only. |
| Website | full |
article.md + _files/img-N.<ext> — text plus every inlined image. Cap with --max-images N (default 50). |
| YouTube | transcript |
metadata.json + transcript.md (cue-tagged when captions are available). Honors --youtube-source auto|captions|audio. |
| YouTube | audio |
metadata.json + audio.mp3. Needs ffmpeg. |
| YouTube | video |
metadata.json + video.mp4 (or .mkv/.webm fallback). Needs ffmpeg. |
unread dump https://example.com/article --mode=text
unread dump https://example.com/article --mode=full --max-images 20
unread dump https://youtu.be/<id> --mode=transcript
unread dump https://youtu.be/<id> --mode=audio
unread dump https://youtu.be/<id> --mode=video
--mode is required in non-TTY runs (CI, piped stdin); the interactive
picker only fires on a real terminal. Telegram-only flags
(--folder, --since, --from-msg, --save-media, --mark-read,
…) are rejected with a clear error when used against a URL.
dump for local files and stdin
unread dump <path> saves a byte-for-byte copy of the file under
~/.unread/reports/files/<kind>/<original-name>-<stamp>.<original-ext>.
The original extension is preserved — a .ts file lands as .ts,
a PDF as .pdf, an audio file as .mp3. No extraction, no markdown
wrap, no LLM call:
unread dump ./src/data/content.ts # → ~/.unread/reports/files/text/content-<stamp>.ts
unread dump ./report.pdf # → ~/.unread/reports/files/pdf/report-<stamp>.pdf
unread dump ./meeting.mp3 # → ~/.unread/reports/files/audio/meeting-<stamp>.mp3
echo "raw text…" | unread dump # → ~/.unread/reports/files/stdin/<slug>-<stamp>.txt
Use unread <path> (analyze) or unread ask <path> if you want
extraction-then-markdown — the LLM-bound paths consume markdown, but
dump is the "save the original" verb and a re-encode would be lossy.
Wizard (no <ref>)
unread # → pick chat → thread (forum) → preset → period → enrich → run
unread ask # → pick chat → period → enrich → ask
unread dump # → pick chat → period → enrich → run
unread tg describe # → pick chat → show details / topics
Navigation: ↑/↓ move, type to filter (works for Cyrillic / Greek / Arabic / Hebrew / Latin Extended too), Enter select, SPACE toggles a checkbox (enrichment step), → in the enrichment step also toggles, ESC goes back a step, Ctrl-C quits.
Top of the chat picker:
- 🔍 Search all dialogs (not just unread) — first item; jumps into a fuzzy picker over every dialog.
- 🚀 Run on ALL N unread chats (M total messages) — second item; batch mode.
- Then the column-aligned chat list:
unread | kind | last msg | folder | title.
The enrichment step runs after the period step (so the
"(N in db)" decoration on each option reflects the period the user just
chose). The header line shows For the chosen period: N messages already synced, M with media, K with URLs. — instant feel for what
turning on --enrich=image will actually cost.
The 🚀 batch entry is offered when analyze is run without flags;
when picked, the wizard skips period/enrich and per-chat unread is the
fixed window.
Forum chats (topics)
Forums are chats with topics, each with its own unread marker. Three
modes for both analyze and dump:
unread @forumchat --thread 42 # one specific topic
unread @forumchat --all-flat --last-days 3 # whole forum, one report
unread @forumchat --all-per-topic # one report per topic
Without any of these, unread @forumchat opens a topic picker.
unread tg describe @forumchat prints the topic list with unread counts and
local-DB counts; both tg describe and the wizard fix Telegram's stale /
capped dialog-level forum counts by summing per-topic counts via
GetForumTopicsRequest.
Media enrichment
Telegram chats carry more than text. unread turns each non-text message
into something the LLM can read:
| Kind | What happens | Default |
|---|---|---|
| text | Used as-is | always on |
| voice (🎤) | Transcribed via OpenAI Audio (gpt-4o-mini-transcribe) |
on |
| videonote (round) | Audio extracted by ffmpeg → transcribed | on |
| external link | HTTP fetch + BeautifulSoup clean + 1–2 sentence summary via filter_model |
on |
| video | Audio extracted by ffmpeg → transcribed | off |
| photo | Described via vision model (gpt-4o-mini by default) — short caption + OCR of any on-image text |
off |
| doc (PDF / DOCX / txt / md / code) | Text extracted locally (pypdf / python-docx / plain read); truncated to max_doc_chars |
off |
Three ways to control:
unread @somegroup --enrich=voice,image,link # explicit set
unread @somegroup --enrich-all # everything
unread @somegroup --no-enrich # nothing, even defaults
Precedence (first wins): --no-enrich → --enrich-all →
--enrich=<csv> (unioned with the preset's enrich: frontmatter) →
preset's frontmatter alone → [enrich] block in config.toml.
Concurrency / per-doc lock. The orchestrator serializes per
document_id so a voice forwarded across multiple chats in one run
gets one Whisper call, not N.
Caching. Each enrichment result is stored once and reused: media
keyed by Telegram's stable document_id / photo_id
(media_enrichments); links keyed by normalized URL hash
(link_enrichments). Repeat runs over the same messages cost $0
once enrichments are cached.
Costs at a glance:
- Voice / videonote / video → Whisper (~$0.006/min).
- Image → one vision call per unique photo.
- Doc → free (local extraction).
- Link → one
filter_modelcall per unique URL (cheap,nano-class).
The orchestrator logs a one-line summary, plus per-call lines tagged
with phase=enrich_<kind> and the originating chat_id / msg_id /
msg_date so the cost in unread stats is traceable to actual messages.
Presets
What kind of analysis do you want? Pick a preset with --preset:
| Preset | What it produces |
|---|---|
summary (default) |
Concentrated signal — key insights, concrete ideas/decisions, 3–5 pointer messages. No recap prose. |
tldr |
Two or three sentences in one paragraph — phone-screen scan, no structure. |
digest |
5–10 most important themes, 1–2 lines each. |
highlights |
Top 5–15 most valuable messages, sorted by importance. |
quotes |
Verbatim memorable quotes with author and link. |
links |
External URLs grouped by topic (auto-enables link enrichment). |
action_items |
Markdown table: Who / What / Deadline / Status / Link. |
decisions |
Markdown table: Decision / Who / When / Rationale / Link. |
questions |
Open questions table: unanswered / partial / no consensus. |
reactions |
Top-reacted messages grouped by reaction kind (👍 / 🔥 / 🤔 / 👎). |
single_msg |
Picked automatically when <ref> is a t.me/.../<msg_id> link. |
multichat |
Picked automatically for batch / folder analysis: aggregates across chats into ONE report instead of per-chat. |
video |
Picked automatically for YouTube URLs — transcript summary with time-stamped citations. |
website |
Picked automatically for article / blog / docs URLs — TL;DR + key claims + key quotes. |
custom --prompt-file path.md |
Your own one-off prompt; same frontmatter format as the bundled ones. |
Prompts live in presets/<lang>/*.md — presets/en/ for
English, presets/ru/ for Russian. Each language directory is
autonomous: a language can have any subset of presets, and the loader
does NOT fall back across languages. Edit them, add your own, commit
them to your fork. Bump prompt_version after changing the body to
invalidate the cache. Optional frontmatter enrich: [link, image]
declares which media enrichments this preset assumes (unioned with
--enrich); description: is shown by the wizard's preset picker.
Reaction signals. Messages whose reaction count meets
[analyze] high_impact_reactions (default 3) are tagged
[high-impact] in the LLM prompt. Presets that care about prominence
(highlights, reactions, summary) lean on the marker; others
ignore it.
Language
Three independent settings cover the three roles a "language" can play:
| Setting | Drives | Default |
|---|---|---|
[locale] language |
UI language. Wizard, settings menu, banners, status messages, the ## Sources heading from --cite-context, the ## Verification heading from --self-check, the saved report's metadata block, the truncation banner. Drives i18n.t(). |
"en" |
[locale] report_language |
Report language. Picks which presets/<lang>/ tree the loader reads, the system prompt's base rules, the ask system prompt, the image / link enricher prompts, the formatter labels going into the LLM. The LLM writes the analysis in this language. |
"" → follow language |
[locale] content_language |
Source-content hint (Whisper-style). When set, the system prompt gets one extra line telling the LLM "the source content is in <X>". When empty, no hint is sent — the LLM auto-detects from the source text. Use only as an explicit override. |
"" → AI auto-detects |
The split exists because the three roles are genuinely different:
- An English speaker analyzing a Russian Telegram chat wants their
wizard / saved-report metadata in English (
language=en) but the LLM should produce Russian summaries (report_language=ru). No source hint needed — the chat is in Russian and the LLM detects that. - A Russian speaker reading a Chinese Wikipedia article wants Russian
reports (
language=ru,report_language=ru) but the LLM doesn't always nail "this article is in Chinese" purely from the prose (especially when the report headings are Russian). Setcontent_language=zhand the model trusts the hint.
# config.toml
[locale]
language = "en" # UI: wizard, banners, settings menu.
report_language = "ru" # LLM writes analyses in Russian.
content_language = "" # Empty = let the AI auto-detect the source.
# Set to "zh"/"ru"/… as a Whisper-style override.
Per-run overrides:
# English UI, Russian report; let the LLM detect what the chat is in.
unread @somechat --language en --report-language ru
# Russian UI + Russian report, but the source is Chinese (override the
# LLM's source-language detection).
unread https://zh.wikipedia.org/wiki/物理学 --report-language ru --content-language zh
# Ask follows the same shape.
unread ask "что обсуждали?" --language en --report-language ru
Whisper transcription has its own knob ([openai] audio_language) —
empty means autodetect; decoupled from all three locale axes.
Persisting preferences with unread settings
Edit your locale prefs without touching config.toml:
unread settings # interactive editor (three rows under Languages)
unread settings show # current effective values + DB overrides
unread settings set locale.language en
unread settings set locale.report_language ru
unread settings set locale.content_language zh # Whisper-style override; empty = auto-detect
unread settings unset locale.content_language # drop a single override
unread settings reset # drop all DB overrides
Saved to storage/data.sqlite in the app_settings table. Applied on
every unread invocation; explicit --language / --report-language /
--content-language flags still win.
Migration note
[locale] content_language was renamed to [locale] report_language in
v1.x — the old name now means a Whisper-style source-language hint
(empty = auto-detect). If you had content_language = "ru" set under
the old semantics (LLM writes in Russian), re-set it as
report_language = "ru" and clear content_language (or leave it empty
to let the AI auto-detect). The CLI flag --content-language was
similarly renamed to --report-language; the new --content-language
flag is the source hint.
Time window
By default analyze and dump process only messages past the chat's
read marker. To change that:
| Flag | Meaning |
|---|---|
--last-hours N |
Last N hours (UTC) — finer than --last-days |
--last-days N |
Last N days (UTC) |
--since YYYY-MM-DD --until YYYY-MM-DD |
Explicit date range (either end optional) |
--from-msg <id> / message link |
Start at a specific message, inclusive |
--full-history |
Entire chat |
Precedence: --full-history > --from-msg > --last-hours > --since/--until/--last-days > unread. When both --last-hours and --last-days are set, --last-hours wins.
YYYY-MM-DD strings are interpreted as UTC days (matches how
messages.date is stored).
Cost & caching
Three caches, aggressive by design:
1. Local analysis_cache
Every analysis result is hashed by preset + prompt_version + model +
sorted msg_ids + options_payload + system/user-prompt hashes and
stored in SQLite. Re-run the same query → zero-cost hit. Toggling
--enrich, --by, the model, or any other option-payload field busts
the relevant rows.
unread cache stats # rows, disk size, saved $, breakdown + prompt-cache hit rate
unread cache ls --limit 20 # latest entries
unread cache show <hash-prefix> # print a stored result
unread cache export -o old.jsonl --older-than 30d
unread cache purge --older-than 30d --vacuum
Truncated results are never cached. A partial summary would silently poison every future run.
2. OpenAI prompt cache (server-side)
When prompt prefix ≥ 1024 tokens and identical bytes arrive within
~5–10 minutes, OpenAI discounts repeated tokens.
unread cache stats shows your hit rate per (chat, preset) at the
bottom of its output.
config.toml enforces temperature=0.2 and a fixed
system → static_context → dynamic message order to maximize hits.
3. Enrichment dedup
Per-kind forever-cache:
- Media (voice / videonote / video / photo / doc) keyed by Telegram's
stable
document_idorphoto_idviamedia_enrichments. - External links keyed by normalized URL hash via
link_enrichments.
Forwarded 10× = fetched once.
Up-front cost guard
unread @somegroup --max-cost 0.50 # confirm if estimate exceeds
unread @somegroup --max-cost 0.50 --yes # silently abort if over
unread @somegroup --dry-run # estimate-and-exit, no LLM call
Estimate covers the analysis (map + reduce); enrichment cost is not included.
Spending visibility
unread stats # totals by preset
unread stats --by chat # biggest spenders by chat
unread stats --by day # spend over time
unread stats --by kind # chat vs audio vs ask
unread cache stats # OpenAI prompt-cache hit rate per (chat, preset) (bottom of output)
If a row says (N unpriced) next to its call count, those rows used a
model not in your [pricing.chat] / [pricing.audio] table — add the
entry so cost stops under-reporting. unread doctor flags missing
pricing entries.
Maintenance
# Version info (use this when filing bug reports)
unread --version # or `unread -V`
# Health check — Telegram session, OpenAI key, ffmpeg, DB integrity, presets, disk, pricing
unread doctor
# Diagnostic bundle for GitHub issues — version, doctor, redacted config + .env
unread bug-report # prints to stdout
unread bug-report --out report.txt # writes to a file
# Backup the data DB (VACUUM INTO — atomic, compact)
unread backup up # → storage/backups/data-YYYY-MM-DD_HHMMSS.sqlite
unread backup up mybackup.sqlite --overwrite
# Restore a backup (current DB moved aside as data-replaced-…sqlite)
unread backup restore storage/backups/data-2026-04-25_…sqlite --yes
# `unread cache` is split into three entity groups, each exposing the
# same five commands: bare `<entity>` = ls, plus purge / stats / show / export.
# Analysis cache (per-LLM-call result rows in `analysis_cache`)
unread cache ai # list newest first
unread cache ai stats # size, age range, by (preset, model), prompt-cache hit rate
unread cache ai show <hash-prefix> # print a stored result
unread cache ai purge --older-than 90d # delete by age / preset / model / --all
unread cache ai export -o ai.jsonl # dump rows to JSONL or markdown
# Source caches (extracted pages / YouTube transcripts / local-file text)
unread cache sources # ls all kinds
unread cache sources ls --kind website # one kind only
unread cache sources stats # rows + age per kind
unread cache sources show <page_id|video_id|file_id> # row metadata + paragraph preview
unread cache sources purge --domain zh.wikipedia.org # filter: --url / --domain / --kind / --older-than / --all
unread cache sources export -o sources.jsonl # JSONL inventory; pass --include-paragraphs for full text
# Telegram message cache (per-row `messages` table — synced chat history)
unread cache tg # per-chat counts (text / transcripts / age)
unread cache tg stats # totals + chats / messages / transcripts / age range
unread cache tg show 1234567890 # one chat's stats
unread cache tg purge --retention 90d # blank old message text (renamed from `unread cleanup`)
unread cache tg purge --retention 30d --chat 1234567890
unread cache tg export -o tg.jsonl --chat 1234567890 # dump cached rows; user-facing reports = `unread dump`
# Prune old report files to reports/.trash/<ts>/
unread reports prune --older-than 30d --dry-run # see what would move
unread reports prune --older-than 30d
unread reports prune --older-than 90d --purge # hard delete (asks first)
# Cache hygiene
unread cache purge --older-than 30d --vacuum
cleanup preserves row metadata (ids, dates, authors, transcripts) —
it only NULLs the raw text column.
unread watch — scheduled runs
Foreground loop that runs an inner unread command on a fixed cadence.
No daemon — run under tmux / nohup for persistence.
unread watch --interval 1h tg chats run
unread watch --interval 1h --folder Work --post-saved
unread watch --interval 30m ask --folder Work "anything urgent?"
unread watch --interval 6h --max-runs 4 https://example.com/blog
| Flag | Meaning |
|---|---|
--interval Ns/Nm/Nh/Nd/Nw |
Cadence (or bare seconds). Defaults to 1h. |
--max-runs N |
Stop after N runs (testing / fixed cycles). |
Bare unread watch (no inner command) prints this command's help.
Ctrl-C exits cleanly between iterations. The inner command's stdout
streams live; each iteration is preceded by ── Run K YYYY-MM-DDThh:mm:ss.
unread tg describe folders — Telegram folder integration
Telegram "folders" (dialog filters) become a first-class scope:
unread tg describe folders # list every folder + chat counts
unread --folder Work # batch every unread chat in folder
unread dump --folder Work # same for dump
unread ask "..." --folder Work # Q&A scoped to folder
Folder column shows up in:
unread tg describe(no ref) — the dialogs table.unread tg describe @chat— folder line under the username row.- The wizard's chat picker —
unread | kind | last msg | folder | title.
Only explicitly listed chats are expanded — rule-based folders ("contacts", "groups", "channels" without explicit peers) aren't walked.
Subscriptions (optional)
You don't need these for one-off analysis — unread @chat already
resolves the chat and fetches what's missing. Subscriptions are for
long-term tracking: a fixed set of chats you keep in your local DB,
sync on a cron, and analyze by date ranges across many runs.
unread tg chats add @somegroup
unread tg chats manage # interactive panel: list, enable, disable, remove
unread tg sync
unread tg chats add @forum --all-topics
unread tg chats add @channel --with-comments
Examples / recipes
# Daily morning digest of your work folder, into Saved Messages, on a 24h cron
unread watch --interval 24h analyze --folder Work --preset digest --post-saved
# Audit a high-stakes report — citations get expanded, claims verified
unread @somegroup --preset action_items --cite-context 5 --self-check
# What did Bob say last week? In one chat, with rerank + post-answer follow-ups (default)
unread ask "what did Bob propose?" @somegroup --last-days 7
# Filter analysis to one sender (with a citable result)
unread @somegroup --by Bob --preset highlights
# Cost-bounded run, with a budget alarm
unread @somegroup --enrich-all --max-cost 0.50 --post-to me
# Re-run with the same flags as last time, but force a fresh cache
unread @somegroup --repeat-last --no-cache
# Build a semantic index over a folder, then query it
unread ask --build-index --folder Work
unread ask "open architecture questions" --folder Work --semantic
# Forum: per-topic reports for the entire forum
unread @forumchat --all-per-topic
# Dump and save every photo / voice / video / doc alongside the text
unread dump @somegroup --save-media --save-media-types photo,voice
# Analyze a long-form article — paragraph-indexed citations link back to the page
unread "https://www.paulgraham.com/greatwork.html" --preset website
# Analyze a YouTube video, force Whisper instead of captions
unread "https://youtu.be/dQw4w9WgXcQ" --youtube-source audio --post-saved
Configuration (config.toml)
The shipped file uses the convention: only settings you override are uncommented; every knob is listed (commented-out, showing its default) so it's discoverable. Strict mode is on — typos fail loudly with a clear "extra inputs not permitted" error and the offending key.
Most-tuned settings:
[openai]
chat_model_default = "gpt-5.4-mini" # final / single-chunk model
filter_model_default = "gpt-5.4-nano" # map phase + cheap rerank + self-check
# audio_language = "" # Whisper hint; empty = autodetect
[locale]
# language = "en" # UI: "en" (default) / "ru" / …
# report_language = "" # LLM output; empty = follow `language`
# content_language = "" # Whisper-style source hint; empty = AI auto-detects
[analyze]
min_msg_chars = 3 # filter: drop messages shorter than N chars
dedupe_forwards = true # collapse identical forwards/memes
output_budget_tokens = 1500 # reduce / single-chunk max_tokens
high_impact_reactions = 3 # `[high-impact]` marker threshold
[enrich]
vision_model = "gpt-4o-mini"
[ask]
rerank_enabled = true # default: rerank candidates before the answer
rerank_top_k = 500 # candidate pool size before rerank
rerank_keep = 50 # what survives rerank → flagship
[pricing.chat.<model>] / [pricing.audio] populate the cost table.
Models that aren't priced still work — they just show as "unpriced
calls" in unread stats and unread doctor warns about it.
UNREAD_CONFIG_PATH=/abs/path/config.toml overrides the cwd-relative
discovery.
How it works
CLI (Typer)
├─ Resolver (Telethon) ──► SQLite: chats
├─ Backfill (incremental) ──► SQLite: messages
└─ Analyzer pipeline
├─ Filter + dedupe (+ optional --by sender filter)
├─ Enrich (per-kind, opt-in)
│ ├─ voice/videonote/video → OpenAI Audio ──► media_enrichments(kind=transcript)
│ ├─ photo → OpenAI Vision ──► media_enrichments(kind=image_description)
│ ├─ doc → pypdf / python-docx
│ └─ link → httpx + bs4 + LLM ──► link_enrichments
├─ Chunk (token-aware, soft-breaks on idle gaps)
├─ Map-reduce (OpenAI) ──► analysis_cache, analysis_runs, usage_log
├─ Optional --self-check (cheap-model verifier)
└─ Optional --cite-context (expand `[#msg_id](url)` to message blocks)
Three SQLite tables under storage/data.sqlite matter most:
messages— every message you've synced, plus media metadata + transcripts.media_enrichments/link_enrichments— per-kind enrichment caches keyed by stable IDs.analysis_cache— keyed analyses (zero-cost re-runs).
Plus newer:
chat_last_run_args— backs--repeat-last.message_embeddings— vector store forunread ask --semantic.usage_log— every OpenAI call, withphase=tag for cost attribution.
Reports land in reports/ (gitignored). Each cited claim is a
clickable link back to the source message. With --cite-context N the
report file additionally contains <details> fold blocks with N
messages around every citation, so the report is self-auditable
without re-opening Telegram.
Analyses larger than one context window are automatically map-reduced:
filter_model summarizes chunks in parallel, final_model merges. Each
map call is cached independently — adding one new message at the tail
re-costs only one chunk.
unread ask is its own pipeline: keyword retrieval (or embedding cosine)
→ optional rerank → format → single LLM call with citations. No
map-reduce; the candidate pool is bounded by --limit.
Before every network call, analyze and dump compare the local max
msg_id against Telegram's read marker — if nothing new exists, the
command exits without hitting the network.
Troubleshooting
When something breaks, always start with unread doctor — it
surfaces 90% of the common issues with a fix hint inline. If you're
filing a GitHub issue, run unread bug-report and paste the bundle:
it gathers version, platform, doctor output, and your config with
every secret masked.
| Symptom | Fix |
|---|---|
Telegram session expired / asks for code on every run |
unread init --force (re-runs Telethon auth without re-prompting for keys) |
yt-dlp DownloadError (private / region-locked / format change) |
uv tool upgrade unread — yt-dlp tracks YouTube changes; running an outdated wheel breaks first |
ffmpeg not found |
Install per the platform table above; unread doctor confirms detection |
OPENAI_API_KEY missing but you set it elsewhere |
The CLI reads ~/.unread/.env, not ~/.zshrc. Either edit ~/.unread/.env or run unread init to persist via the wizard |
attempt to write a readonly database |
chmod -R 700 ~/.unread/storage — the install dir lost write perms (sudo install, restored backup with wrong owner) |
storage permissions overpermissive (doctor warning) |
Run the chmod 700 … && chmod 600 … line printed by doctor — older installs predate the 0o700 hardening |
Cost reports look truncated / unread stats shows zeros |
unread cache stats to confirm the prompt cache is hitting (hit-rate table at the bottom); if not, verify [pricing] covers your model in ~/.unread/config.toml |
| Cache directory is huge | unread cache ai stats then unread cache ai purge --older-than 30d --vacuum; also unread cache sources purge for cached source text |
Migrating to a new install dir / moved ~/.unread/ |
Set UNREAD_HOME=/new/path, or copy .env / config.toml / storage/ / reports/ into the new dir manually |
Russian locale, English --help |
Currently English help only; full localization is on the roadmap |
| Want to start fresh | Delete ~/.unread/storage/data.sqlite and re-run unread init — credentials persist in data.sqlite::secrets, so this resets cache + analysis runs while keeping or refreshing keys |
For anything else: unread bug-report > report.txt and paste into a
new issue at https://github.com/maxbolgarin/unread/issues.
Development
uv sync --extra dev
uv run pytest -q # all tests (pytest-asyncio auto mode)
uv run ruff check . # lint
uv run ruff format --check . # format check (CI runs this)
Contributor guide — invariants, caching layers, preset format, schema,
and editing hazards — lives in CLAUDE.md. Read it
before changing the pipeline, DB layer, or preset prompts.
Design notes and roadmap live under docs/ and
ROADMAP.md.
Run unread doctor after any pull or env change — it surfaces the
common breakage points (missing ffmpeg, broken Telegram session,
missing pricing entries, schema drift).
License
MIT — see LICENSE.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file unread-0.1.1-py3-none-any.whl.
File metadata
- Download URL: unread-0.1.1-py3-none-any.whl
- Upload date:
- Size: 716.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
708cdf38a9c2afa6e955b559f50c614b10264ba8526214c20c32b1aaf8416491
|
|
| MD5 |
7a98b8d715f2f176a0123fc135d6fa06
|
|
| BLAKE2b-256 |
2d31f4453d2a75e8299230fdf4aee015c1baf364c737359a36f77b1bec3d6ec9
|