
tg-parser


Parse Telegram Desktop JSON exports for LLM processing.

Transform messy chat exports into clean, structured data ready for summarization, analysis, and artifact extraction with Claude or other LLMs.

Features

Implemented ✅ (v1.2.0)

  • 🗂️ All chat types: Personal, groups, supergroups, forum topics, channels
  • 🔍 Powerful filtering: 9 filter types (date, sender, content, topic, attachments, reactions, etc.)
  • ✂️ Smart chunking: 3 strategies (fixed, topic, hybrid) for LLM context limits
  • 🚀 Streaming: ijson-based reader for files >50MB with auto-detection
  • 📝 Multiple formats: Markdown (LLM-optimized), JSON, KB-template, CSV
  • 🔌 MCP integration: 6 tools for Claude Desktop/Code
  • 📊 Statistics: Message counts, top senders, topics breakdown, mention analysis
  • 🎯 tiktoken integration: Accurate token counting (with SimpleTokenCounter fallback)
  • 📄 split-topics command: Split forum chats by topic into separate files
  • Type-safe: pyright strict mode, 413 comprehensive tests
  • 🔧 mcp-config command: Auto-configure Claude Desktop/Code MCP integration
  • 🆕 Config file support: TOML configuration with config command group
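
The tiktoken integration with a fallback, as listed above, can be sketched roughly like this. This is an illustrative assumption, not tg-parser's actual `SimpleTokenCounter`; the ~4-characters-per-token heuristic is a common rule of thumb, not the library's documented behavior:

```python
def count_tokens(text: str) -> int:
    """Count tokens with tiktoken when available, else approximate."""
    try:
        import tiktoken  # optional extra: pip install "tg-parser[all]"
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except Exception:  # tiktoken missing or encoding unavailable
        # Crude heuristic: roughly 4 characters per token for English text
        return max(1, len(text) // 4)
```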

Installation

# From PyPI (recommended)
pip install tg-parser

# With uv
uv tool install tg-parser

# With all extras (MCP, tiktoken, streaming)
pip install "tg-parser[all]"

# From source
git clone https://github.com/mdemyanov/tg-parser.git
cd tg-parser
uv sync --all-extras

Quick Start

1. Export from Telegram Desktop

  1. Open Telegram Desktop
  2. Go to chat → ⋮ menu → Export chat history
  3. Select JSON format, uncheck media if not needed
  4. Export

2. Parse the export

# Basic parsing
tg-parser parse ./ChatExport/result.json -o ./output/

# Last 7 days only
tg-parser parse ./export.json --last-days 7

# Filter by sender
tg-parser parse ./export.json --senders "Иван Петров,Мария"

# Split forum by topics
tg-parser parse ./forum_export.json --split-topics

# Chunk for LLM context limits
tg-parser chunk ./export.json -s hybrid --max-tokens 8000

# Analyze mentions
tg-parser mentions ./export.json --format json

# Large files with streaming
tg-parser parse ./massive_export.json --streaming

# Get statistics
tg-parser stats ./export.json
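
The `--streaming` auto-detection for large files could look roughly like this. The 50 MB threshold comes from the feature list above; the helper name and signature are hypothetical, not tg-parser's real API:

```python
import os

# Hypothetical sketch: exports over ~50 MB would be routed to an
# ijson-based streaming reader instead of loading the whole file.
STREAMING_THRESHOLD = 50 * 1024 * 1024  # 50 MB, per the feature list

def should_stream(path: str, force: bool = False) -> bool:
    """Return True when the export should be read incrementally."""
    return force or os.path.getsize(path) >= STREAMING_THRESHOLD
```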

3. Use with Claude

The output is optimized for LLM processing:

# Chat: Development Team
**Period:** 2025-01-13 — 2025-01-19  
**Participants:** Иван, Мария, Алексей

---

## 2025-01-15

### 10:30 — Иван Петров
Colleagues, we need to discuss the architecture of the new module.

### 10:35 — Мария Сидорова
@Алексей, please prepare the diagram by tomorrow.

CLI Reference

tg-parser parse

Main parsing command with filters.

tg-parser parse <input> [OPTIONS]

# Date filters
--date-from DATE        # Start date (YYYY-MM-DD)
--date-to DATE          # End date
--last-days N           # Last N days
--last-hours N          # Last N hours

# Sender filters
--senders TEXT          # Include senders (comma-separated)
--exclude-senders TEXT  # Exclude senders

# Topic filters (for forum groups)
--topics TEXT           # Include topics
--exclude-topics TEXT   # Exclude topics

# Content filters
--mentions TEXT         # Messages mentioning users
--contains REGEX        # Search pattern
--min-length N          # Minimum text length

# Type filters
--has-attachment        # Only with attachments
--has-reactions         # Only with reactions
--exclude-forwards      # Exclude forwarded
--include-service       # Include service messages

# Output
-o, --output PATH       # Output directory
-f, --format FORMAT     # markdown|json|csv
--split-topics          # Separate file per topic

tg-parser chunk

Split parsed output for LLM context limits.

tg-parser chunk <input> [OPTIONS]

-s, --strategy STRATEGY  # fixed|conversation|topic|daily
--max-tokens N           # Max tokens per chunk (default: 3000)
--time-gap N             # Minutes gap to split (default: 30)
--preserve-threads       # Don't break reply chains

tg-parser stats

Chat statistics overview.

tg-parser stats <input> [OPTIONS]

--format FORMAT          # table|json|markdown
--top-senders N          # Show top N senders
--by-topic               # Group by topic
--by-day                 # Daily breakdown

MCP Server

Use tg-parser directly in Claude Desktop or Claude Code.

Setup

# Auto-configure (recommended)
tg-parser mcp-config --apply

# Or manually add to claude_desktop_config.json:
{
  "mcpServers": {
    "tg-parser": {
      "command": "uvx",
      "args": ["tg-parser", "mcp"]
    }
  }
}

tg-parser mcp-config

Generate or apply MCP configuration for Claude Desktop/Code.

tg-parser mcp-config [OPTIONS]

# Print config to stdout (default)
tg-parser mcp-config

# Apply to Claude Desktop config
tg-parser mcp-config --apply

# Dry run - show what would be applied
tg-parser mcp-config --apply --dry-run

# Apply to Claude Code instead
tg-parser mcp-config --apply --target code

# Use 'uv run' instead of 'uvx'
tg-parser mcp-config --use-uv-run

Options:
  --apply               Apply config to Claude config file
  --dry-run             Show what would be written without applying
  --no-backup           Skip creating backup before modifying
  --target [desktop|code]  Target application (default: desktop)
  --use-uv-run          Use 'uv run' instead of 'uvx' for non-venv installs
  -v, --verbose         Verbose output

Available Tools

| Tool | Description |
|------|-------------|
| `parse_telegram_export` | Parse JSON export with filters |
| `chunk_telegram_export` | Split messages for LLM context |
| `get_chat_statistics` | Get chat statistics (JSON) |
| `list_chat_participants` | List participants with message counts |
| `list_chat_topics` | List forum topics with message counts |
| `list_mentioned_users` | Analyze @mentions frequency |
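
A minimal regex-based sketch of what mention-frequency analysis like `list_mentioned_users` involves (a simplification, not the actual implementation; display-name mentions without `@` are not handled):

```python
import re
from collections import Counter

# Count @username occurrences across message texts. \w+ covers
# letters, digits, and underscores (including Cyrillic in Python).
MENTION_RE = re.compile(r"@(\w+)")

def count_mentions(texts: list[str]) -> Counter[str]:
    counts: Counter[str] = Counter()
    for text in texts:
        counts.update(MENTION_RE.findall(text))
    return counts
```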

Example Usage in Claude

User: Parse my team chat from last week and summarize key decisions

Claude: I'll parse the export and prepare it for analysis.
[Uses parse_telegram_export tool with date_from filter]

Based on the parsed chat, here are the key decisions...

Python API

from tg_parser import parse_chat, ChatFilter
from tg_parser.domain.value_objects import FilterSpecification, DateRange
from datetime import datetime, timedelta

# Simple parsing
chat = parse_chat("./export.json")
print(f"Loaded {len(chat.messages)} messages")

# With filters
filter_spec = FilterSpecification(
    date_range=DateRange(
        start=datetime.now() - timedelta(days=7)
    ),
    senders=frozenset(["Иван Петров"]),
    exclude_service=True,
)
chat = parse_chat("./export.json", filter_spec=filter_spec)

# Access data
for topic in chat.topics.values():
    msgs = chat.messages_by_topic(topic.id)
    print(f"{topic.title}: {len(msgs)} messages")

# Chunking
from tg_parser.application.services.chunker import ConversationChunker

chunker = ConversationChunker(max_tokens=3000)
chunks = chunker.chunk(chat.messages)

Output Formats

Markdown (default)

Clean, human-readable format optimized for LLM comprehension.

JSON

Structured format for programmatic processing:

{
  "meta": {
    "chat_name": "Team Chat",
    "chat_type": "supergroup_forum",
    "statistics": {
      "total_messages": 127,
      "tokens_estimate": 15000
    }
  },
  "messages": [
    {
      "id": 1234,
      "timestamp": "2025-01-15T10:30:00Z",
      "author": "Иван Петров",
      "text": "...",
      "topic": "architecture"
    }
  ]
}
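
The structure above is straightforward to consume programmatically. For example, counting messages per author (field names taken from the sample; the helper itself is hypothetical):

```python
import json

# Load a tg-parser JSON export and tally messages per author.
def messages_per_author(json_text: str) -> dict[str, int]:
    data = json.loads(json_text)
    counts: dict[str, int] = {}
    for msg in data["messages"]:
        counts[msg["author"]] = counts.get(msg["author"], 0) + 1
    return counts
```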

CSV

Tabular format for spreadsheet analysis.

Chunking Strategies

| Strategy | Description | Best For |
|----------|-------------|----------|
| `conversation` | Split by time gaps + size | General use (recommended) |
| `fixed` | Fixed token count | Simple cases |
| `topic` | One chunk per topic | Forum groups |
| `daily` | One chunk per day | Long time periods |
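
The conversation strategy (time gaps + size) can be sketched as below. This is a simplification with assumed field names and a crude token estimate; the real chunker's boundary logic, token accounting, and thread preservation are richer:

```python
from datetime import datetime, timedelta

# Start a new chunk when the time gap exceeds a threshold or the
# chunk would exceed the token budget. Token cost is a rough len//4
# estimate, not real tokenization.
def chunk_conversation(messages: list[dict], max_tokens: int = 3000,
                       time_gap_min: int = 30) -> list[list[dict]]:
    chunks: list[list[dict]] = []
    current: list[dict] = []
    tokens = 0
    prev_ts: datetime | None = None
    for msg in messages:
        cost = max(1, len(msg.get("text", "")) // 4)
        gap = (prev_ts is not None and
               msg["timestamp"] - prev_ts > timedelta(minutes=time_gap_min))
        if current and (gap or tokens + cost > max_tokens):
            chunks.append(current)
            current, tokens = [], 0
        current.append(msg)
        tokens += cost
        prev_ts = msg["timestamp"]
    if current:
        chunks.append(current)
    return chunks
```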

Configuration

tg-parser supports TOML configuration files for setting default options.

Config File Locations (priority order)

  1. --config PATH CLI flag
  2. TG_PARSER_CONFIG environment variable
  3. ./tg-parser.toml (current directory)
  4. ./.tg-parser.toml (current directory, hidden)
  5. ~/tg-parser.toml (home directory)
  6. ~/.tg-parser.toml (home directory, hidden)
  7. ~/.config/tg-parser/config.toml (XDG standard)

Managing Config

# Create example config in current directory
tg-parser config init

# Create in specific location
tg-parser config init -o ~/.tg-parser.toml

# Show current effective config
tg-parser config show -v

# Show all search locations
tg-parser config path

# Use custom config for a command
tg-parser --config myconfig.toml parse export.json

Config File Format

Create ~/.config/tg-parser/config.toml:

[default]
output_format = "markdown"   # markdown, kb, json, csv
output_dir = "~/Documents/tg-exports"

[filtering]
exclude_service = true
exclude_empty = true
exclude_forwards = false
min_message_length = 0

[chunking]
strategy = "fixed"           # fixed, topic, hybrid
max_tokens = 8000

[output.markdown]
include_extraction_guide = false
no_frontmatter = false

[mentions]
min_count = 1
output_format = "table"      # table, json

[stats]
top_senders = 10

CLI arguments always override config file values.
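
That precedence rule can be pictured as a simple merge in which only CLI options the user actually passed win over config file defaults (a hypothetical helper, not tg-parser's code):

```python
# Config file values supply defaults; any CLI option that was
# explicitly passed (i.e. not None) overrides them.
def effective_options(config: dict, cli: dict) -> dict:
    merged = dict(config)
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged
```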

Development

# Clone and setup
git clone https://github.com/mdemyanov/tg-parser.git
cd tg-parser
uv sync --all-extras

# Run tests
uv run pytest

# Type check
uv run pyright

# Lint and format
uv run ruff check --fix
uv run ruff format

# Run CLI in dev mode
uv run tg-parser parse ./test.json

Architecture

Clean Architecture with clear separation:

presentation/  →  application/  →  domain/  ←  infrastructure/
   (CLI, MCP)     (use cases)    (entities)    (adapters)
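
In miniature, the dependency rule above (outer layers point inward; infrastructure adapts to domain ports) might look like this. All names here are hypothetical illustrations, not tg-parser's actual classes:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)          # domain/: entity
class Message:
    id: int
    text: str

class ChatReader(Protocol):      # domain/: port
    def read(self) -> list[Message]: ...

class InMemoryReader:            # infrastructure/: adapter
    def __init__(self, messages: list[Message]) -> None:
        self._messages = messages
    def read(self) -> list[Message]:
        return list(self._messages)

def count_messages(reader: ChatReader) -> int:  # application/: use case
    return len(reader.read())
```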

Documentation

Development Status

Current Version: 1.2.0 (Stable)

| Component | Status | Details |
|-----------|--------|---------|
| Core parsing | ✅ Complete | All chat types, topics, reactions |
| Filtering | ✅ Complete | 9 filter types |
| Chunking | ✅ Complete | 3 strategies (fixed, topic, hybrid) |
| Streaming | ✅ Complete | ijson reader, auto-detection >50MB |
| CLI | ✅ Complete | 7 commands: parse, stats, chunk, mentions, split-topics, mcp-config, config |
| MCP Server | ✅ Complete | 6 tools for Claude integration |
| Writers | ✅ Complete | Markdown, JSON, KB-template, CSV |
| Config | ✅ Complete | TOML config files, config command group |
| Tests | ✅ Complete | 413 tests, pyright strict |
| PyPI | ✅ Published | v1.2.0 available |
| CI/CD | ✅ Automated | GitHub Actions for testing & releases |

Roadmap

  • v1.0.0: ✅ RELEASED - Production stable, PyPI published, CI/CD automated
  • v1.1.0: ✅ RELEASED - CSV output, split-topics command, tiktoken integration
  • v1.2.0: ✅ RELEASED - TOML config file support, config command group

See PRD.md for detailed roadmap.

Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing)
  3. Make changes with tests
  4. Ensure uv run pytest and uv run pyright pass
  5. Submit PR

License

MIT License - see LICENSE for details.

Acknowledgments
