chatpack-py
High-performance Python bindings for chatpack - parse chat exports from Telegram, WhatsApp, Instagram, and Discord with Rust-powered speed.
Features
- Blazing Fast: Rust implementation for maximum performance
- Multiple Platforms: Telegram, WhatsApp, Instagram, Discord
- Memory Efficient: Streaming API for large files
- Pythonic API: Easy to use, well-documented
- Type Hints: Full IDE support with .pyi stubs
- Flexible: Filter, merge, and transform messages
Installation
pip install chatpack
Or build from source:
pip install maturin
maturin develop --release
Quick Start
Simple Parsing
import chatpack
# Parse Telegram export
messages = chatpack.parse_telegram("result.json", merge=True, min_length=5)
# Parse WhatsApp export
messages = chatpack.parse_whatsapp("chat.txt", merge=True)
# Parse Instagram export
messages = chatpack.parse_instagram("messages.json")
# Parse Discord export
messages = chatpack.parse_discord("export.json")
Object-Oriented API
# Create parser instance
parser = chatpack.TelegramParser()
# Parse with filters
messages = parser.parse(
    "result.json",
    merge=True,
    min_length=10,
    date_from="2024-01-01",
    date_to="2024-12-31",
)
# Access message properties
for msg in messages:
    print(f"{msg.sender}: {msg.content}")
    print(f"Timestamp: {msg.timestamp}")
Streaming Large Files
For files that don't fit in memory:
# Stream messages one by one
parser = chatpack.TelegramStreamParser("huge_export.json")
for msg in parser:
    process_message(msg)  # O(1) memory usage
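When per-message processing is expensive (a database insert, for example), it can help to consume the stream in fixed-size batches. The sketch below assumes only that the stream parser behaves like a standard Python iterator, as the API reference states. `batched` is a hypothetical helper, not part of chatpack, and a generator stands in for the parser so the example runs without an export file.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items; only one batch is held in memory."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Stand-in for chatpack.TelegramStreamParser("huge_export.json"):
# any iterator works, so a generator keeps the sketch runnable.
fake_stream = (f"message {i}" for i in range(7))
batch_sizes = [len(batch) for batch in batched(fake_stream, 3)]
print(batch_sizes)  # [3, 3, 1]
```

Memory use is then bounded by the batch size rather than by the size of the export.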
Integration with Pandas
import chatpack
import pandas as pd
# Parse messages
messages = chatpack.parse_telegram("result.json", merge=True)
# Convert to DataFrame
df = pd.DataFrame([m.to_dict() for m in messages])
# Analyze
print(df.groupby('sender')['content'].count())
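For quick counts, pandas is optional. A minimal sketch, assuming `to_dict()` returns dictionaries with `sender` and `content` keys (as the DataFrame example implies); the sample rows are made up for illustration.

```python
from collections import Counter

# Hypothetical rows standing in for [m.to_dict() for m in messages].
rows = [
    {"sender": "Alice", "content": "Hi"},
    {"sender": "Bob", "content": "Hello"},
    {"sender": "Alice", "content": "How are you?"},
]

# Count messages per sender, most active first.
per_sender = Counter(row["sender"] for row in rows)
print(per_sender.most_common())  # [('Alice', 2), ('Bob', 1)]
```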
Filtering Messages
# Create filter configuration
config = chatpack.FilterConfig(
    min_length=10,
    max_length=1000,
    sender="Alice",
    date_from="2024-01-01",
    date_to="2024-12-31",
)
# Apply filters
filtered = chatpack.apply_filters(messages, config)
Merging Consecutive Messages
# Merge messages from same sender within 5 minutes
merged = chatpack.merge_consecutive(messages, time_threshold=300)
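To make the merge rule concrete, here is a pure-Python sketch of one plausible reading: consecutive messages from the same sender are joined when each arrives within `time_threshold` seconds of the previous one. The `Msg` class, the newline separator, and the gap-to-previous-message rule are all assumptions for illustration, not chatpack's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Msg:  # hypothetical stand-in for chatpack.Message
    sender: str
    content: str
    timestamp: datetime


def merge_sketch(messages: List[Msg], time_threshold: int = 300) -> List[Msg]:
    """Join runs of same-sender messages whose gaps are <= time_threshold seconds."""
    merged: List[Msg] = []
    last_seen: Optional[datetime] = None  # timestamp of the previous message
    for msg in messages:
        same_run = (
            bool(merged)
            and merged[-1].sender == msg.sender
            and (msg.timestamp - last_seen).total_seconds() <= time_threshold
        )
        if same_run:
            merged[-1].content += "\n" + msg.content  # assumed separator
        else:
            # Copy the message so the input list is never mutated.
            merged.append(Msg(msg.sender, msg.content, msg.timestamp))
        last_seen = msg.timestamp
    return merged


chat = [
    Msg("Alice", "Hi", datetime(2024, 1, 15, 10, 30, 0)),
    Msg("Alice", "Are you there?", datetime(2024, 1, 15, 10, 32, 0)),
    Msg("Bob", "Yes", datetime(2024, 1, 15, 10, 33, 0)),
    Msg("Alice", "Great", datetime(2024, 1, 15, 11, 0, 0)),
]
result = merge_sketch(chat)
print([(m.sender, m.content) for m in result])
```

Here the first two messages from Alice collapse into one, while her later message stays separate because Bob's reply interrupts the run.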
API Reference
Parsers
Eager Loading
- parse_telegram(path, merge=False, min_length=None, date_from=None, date_to=None)
- parse_whatsapp(path, merge=False, min_length=None, date_from=None, date_to=None)
- parse_instagram(path, merge=False, min_length=None, date_from=None, date_to=None)
- parse_discord(path, merge=False, min_length=None, date_from=None, date_to=None)
Streaming (for large files)
- TelegramStreamParser(path) - Returns iterator
- WhatsAppStreamParser(path) - Returns iterator
- InstagramStreamParser(path) - Returns iterator
- DiscordStreamParser(path) - Returns iterator
Classes
Message
msg = chatpack.Message(
    sender="Alice",
    content="Hello, world!",
    timestamp="2024-01-15T10:30:00Z",
    platform="telegram",
)
# Properties
msg.sender # str
msg.content # str
msg.timestamp # Optional[str] (ISO 8601)
msg.platform # Optional[str]
# Methods
msg.to_dict() # Convert to dictionary
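Because `timestamp` is an optional ISO 8601 string, it needs parsing before any date arithmetic. A minimal sketch using only the standard library; `parse_timestamp` is a hypothetical helper, and the trailing `Z` is rewritten because `datetime.fromisoformat` only accepts it from Python 3.11 onward.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Convert an ISO 8601 string to a datetime, passing None through."""
    if value is None:
        return None
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


ts = parse_timestamp("2024-01-15T10:30:00Z")
print(ts.isoformat())  # 2024-01-15T10:30:00+00:00
```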
FilterConfig
config = chatpack.FilterConfig(
    min_length=5,
    max_length=1000,
    sender="Alice",
    date_from="2024-01-01",
    date_to="2024-12-31",
)
# Builder pattern
config.with_min_length(10)
config.with_sender("Bob")
OutputConfig
config = chatpack.OutputConfig(
    include_timestamps=True,
    include_platform=True,
)
Utility Functions
- merge_consecutive(messages, time_threshold=300) - Merge messages from same sender
- apply_filters(messages, config) - Apply filter configuration
Platform Support
| Platform | Format | Special Features |
|---|---|---|
| Telegram | JSON | Service messages, forwarded messages |
| WhatsApp | TXT | Auto-detects 4 locale date formats |
| Instagram | JSON | Fixes Mojibake encoding (Meta bug) |
| Discord | JSON/CSV/TXT | Attachments, stickers, replies |
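The Mojibake issue listed in the table is commonly described as UTF-8 bytes that were decoded as Latin-1 when Meta wrote the JSON export, so "café" shows up as "cafÃ©". A sketch of the usual repair, shown to illustrate the problem rather than chatpack's internal implementation; `fix_mojibake` is a hypothetical name.

```python
def fix_mojibake(text: str) -> str:
    """Re-encode as Latin-1 and decode as UTF-8; return the input if that fails."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mis-decoded in this way, so leave it untouched.
        return text


broken = "caf\u00c3\u00a9"  # "cafe" with an accented e, after the bad round trip
fixed = fix_mojibake(broken)
print(fixed == "caf\u00e9")  # True
```

The fallback matters because correctly encoded text usually fails the round trip and should pass through unchanged.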
Performance
chatpack-py leverages Rust for parsing, making it significantly faster than pure Python implementations:
- 10-100x faster than regex-based parsers
- Memory efficient streaming for multi-GB files
- Zero-copy where possible with PyO3
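Speedups depend on the export and the machine, so it is worth measuring on your own data. A minimal timing sketch with `time.perf_counter`; `best_time` is a hypothetical helper, and the lambda is a placeholder workload that keeps the example runnable. In practice you would pass something like `lambda: chatpack.parse_telegram("result.json")` and compare it against your existing parser.

```python
import time
from typing import Callable


def best_time(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Placeholder workload; swap in the parser call you want to measure.
elapsed = best_time(lambda: sum(range(100_000)))
print(f"best of 5: {elapsed:.6f} s")
```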
Development
Setup
# Clone repository
git clone https://github.com/berektassuly/chatpack-py
cd chatpack-py
# Install development dependencies
pip install maturin pytest
# Build in development mode
maturin develop
# Run tests
pytest
Project Structure
chatpack-py/
├── Cargo.toml              # Rust dependencies
├── pyproject.toml          # Python package metadata
├── src/
│   ├── lib.rs              # PyO3 module entry point
│   ├── types.rs            # Python type wrappers
│   ├── parsers.rs          # Parser implementations
│   ├── streaming.rs        # Streaming iterators
│   └── conversion.rs       # Rust/Python conversion
├── python/
│   └── chatpack/
│       ├── __init__.py
│       └── chatpack.pyi    # Type stubs
└── tests/
    ├── test_basic.py
    └── test_parsers.py
Building Wheels
# Build for current platform
maturin build --release
# Build for multiple platforms (requires Docker)
maturin build --release --manylinux 2014
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
MIT License - see LICENSE for details.
Credits
Built on top of the excellent chatpack Rust library by Berektassuly.