High-performance Python bindings for chatpack - parse chat exports from Telegram, WhatsApp, Instagram, and Discord

chatpack-py 🚀

High-performance Python bindings for chatpack - parse chat exports from Telegram, WhatsApp, Instagram, and Discord with Rust-powered speed.

Features

  • ⚡ Blazing Fast: Rust implementation for maximum performance
  • 🔄 Multiple Platforms: Telegram, WhatsApp, Instagram, Discord
  • 💾 Memory Efficient: Streaming API for large files
  • 🐍 Pythonic API: Easy to use, well-documented
  • 🎯 Type Hints: Full IDE support with .pyi stubs
  • 🔧 Flexible: Filter, merge, and transform messages

Installation

pip install chatpack

Or build from source:

pip install maturin
maturin develop --release

Quick Start

Simple Parsing

import chatpack

# Parse Telegram export
messages = chatpack.parse_telegram("result.json", merge=True, min_length=5)

# Parse WhatsApp export
messages = chatpack.parse_whatsapp("chat.txt", merge=True)

# Parse Instagram export
messages = chatpack.parse_instagram("messages.json")

# Parse Discord export
messages = chatpack.parse_discord("export.json")

Object-Oriented API

# Create parser instance
parser = chatpack.TelegramParser()

# Parse with filters
messages = parser.parse(
    "result.json",
    merge=True,
    min_length=10,
    date_from="2024-01-01",
    date_to="2024-12-31"
)

# Access message properties
for msg in messages:
    print(f"{msg.sender}: {msg.content}")
    print(f"Timestamp: {msg.timestamp}")

Streaming Large Files

For files that don't fit in memory:

# Stream messages one by one
parser = chatpack.TelegramStreamParser("huge_export.json")

for msg in parser:
    process_message(msg)  # process_message is your own handler; memory use stays O(1)
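
When each message feeds a database write or an API call, it can help to consume the stream in fixed-size batches so that memory stays bounded while per-call overhead is amortized. The sketch below works on any iterator. `batched` is a hypothetical helper, not part of chatpack, and the `range` in the usage lines only stands in for a stream parser such as `TelegramStreamParser`.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items without materializing the whole input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in for a streaming parser: any iterable of messages behaves the same way.
for batch in batched(range(7), 3):
    print(batch)
```

Only one batch is held in memory at a time, so peak memory depends on `size` rather than on the size of the export.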

Integration with Pandas

import chatpack
import pandas as pd

# Parse messages
messages = chatpack.parse_telegram("result.json", merge=True)

# Convert to DataFrame
df = pd.DataFrame([m.to_dict() for m in messages])

# Analyze
print(df.groupby('sender')['content'].count())
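
Where pandas is not available, the same per-sender count can be sketched with the standard library alone. The rows below are hand-written stand-ins for the output of `m.to_dict()`; the key names `sender` and `content` are taken from the groupby call above, and everything else is illustrative.

```python
from collections import Counter

# Hand-written stand-ins for [m.to_dict() for m in messages].
rows = [
    {"sender": "Alice", "content": "Hi"},
    {"sender": "Bob", "content": "Hello"},
    {"sender": "Alice", "content": "How are you?"},
]

# Count messages per sender, most active first.
counts = Counter(row["sender"] for row in rows)
for sender, n in counts.most_common():
    print(f"{sender}: {n}")
```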

Filtering Messages

# Create filter configuration
config = chatpack.FilterConfig(
    min_length=10,
    max_length=1000,
    sender="Alice",
    date_from="2024-01-01",
    date_to="2024-12-31"
)

# Apply filters
filtered = chatpack.apply_filters(messages, config)
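
To make the filter fields concrete, here is a pure-Python sketch of how such a configuration could be evaluated. It is not chatpack's implementation. The assumptions are: lengths are counted in characters, date bounds are inclusive and compared on the date part of the ISO 8601 timestamp, and messages without a timestamp fail any date filter. `Msg` and `keep` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Msg:
    """Hypothetical stand-in for chatpack.Message."""
    sender: str
    content: str
    timestamp: Optional[str] = None  # ISO 8601, e.g. "2024-01-15T10:30:00Z"


def keep(msg: Msg, min_length: Optional[int] = None, max_length: Optional[int] = None,
         sender: Optional[str] = None, date_from: Optional[str] = None,
         date_to: Optional[str] = None) -> bool:
    """Return True if msg passes every filter that is set (assumed semantics)."""
    length = len(msg.content)
    if min_length is not None and length < min_length:
        return False
    if max_length is not None and length > max_length:
        return False
    if sender is not None and msg.sender != sender:
        return False
    if date_from is not None or date_to is not None:
        if msg.timestamp is None:
            return False  # assumption: undated messages fail date filters
        day = date.fromisoformat(msg.timestamp[:10])  # date part only
        if date_from is not None and day < date.fromisoformat(date_from):
            return False
        if date_to is not None and day > date.fromisoformat(date_to):
            return False
    return True


msgs = [
    Msg("Alice", "Hello there, world", "2024-03-01T09:00:00Z"),
    Msg("Alice", "ok", "2024-03-01T09:01:00Z"),
    Msg("Bob", "A long enough message", "2024-03-02T10:00:00Z"),
    Msg("Alice", "Happy new year everyone", "2025-01-01T00:00:00Z"),
]
kept = [m for m in msgs
        if keep(m, min_length=10, sender="Alice",
                date_from="2024-01-01", date_to="2024-12-31")]
print([m.content for m in kept])
```

Only the first message survives: the second is too short, the third has a different sender, and the fourth falls outside the date range.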

Merging Consecutive Messages

# Merge messages from same sender within 5 minutes
merged = chatpack.merge_consecutive(messages, time_threshold=300)
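
The merging rule can be illustrated with a small pure-Python sketch. This is not chatpack's implementation. It assumes that the gap is measured between each message and the one immediately before it, that merged contents are joined with a newline, and that the merged message keeps the timestamp of the first message in the run. `Msg` and `merge_consecutive_sketch` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Msg:
    """Hypothetical stand-in for chatpack.Message."""
    sender: str
    content: str
    timestamp: str  # ISO 8601 with explicit offset, e.g. "2024-01-15T10:30:00+00:00"


def merge_consecutive_sketch(messages: List[Msg], time_threshold: int = 300) -> List[Msg]:
    """Merge runs of messages from one sender whose gaps are at most time_threshold seconds."""
    merged: List[Msg] = []
    last_time = None
    for msg in messages:
        current = datetime.fromisoformat(msg.timestamp)
        same_sender = bool(merged) and merged[-1].sender == msg.sender
        if same_sender and (current - last_time).total_seconds() <= time_threshold:
            merged[-1].content += "\n" + msg.content  # extend the current run
        else:
            # Copy the message so the input list is left untouched.
            merged.append(Msg(msg.sender, msg.content, msg.timestamp))
        last_time = current
    return merged


chat = [
    Msg("Alice", "Hi", "2024-01-15T10:30:00+00:00"),
    Msg("Alice", "Are you there?", "2024-01-15T10:32:00+00:00"),
    Msg("Bob", "Yes", "2024-01-15T10:33:00+00:00"),
    Msg("Alice", "Great", "2024-01-15T10:50:00+00:00"),
]
for m in merge_consecutive_sketch(chat):
    print(m.sender, repr(m.content))
```

Alice's first two messages are two minutes apart and collapse into one; her last message starts a new entry because Bob's reply sits in between.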

API Reference

Parsers

Eager Loading

  • parse_telegram(path, merge=False, min_length=None, date_from=None, date_to=None)
  • parse_whatsapp(path, merge=False, min_length=None, date_from=None, date_to=None)
  • parse_instagram(path, merge=False, min_length=None, date_from=None, date_to=None)
  • parse_discord(path, merge=False, min_length=None, date_from=None, date_to=None)

Streaming (for large files)

  • TelegramStreamParser(path) - Returns iterator
  • WhatsAppStreamParser(path) - Returns iterator
  • InstagramStreamParser(path) - Returns iterator
  • DiscordStreamParser(path) - Returns iterator

Classes

Message

msg = chatpack.Message(
    sender="Alice",
    content="Hello, world!",
    timestamp="2024-01-15T10:30:00Z",
    platform="telegram"
)

# Properties
msg.sender      # str
msg.content     # str
msg.timestamp   # Optional[str] (ISO 8601)
msg.platform    # Optional[str]

# Methods
msg.to_dict()   # Convert to dictionary
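
Because `timestamp` is an ISO 8601 string, date arithmetic needs a conversion step first. The sketch below uses only the standard library; `parse_timestamp` is a hypothetical helper, not a chatpack function. It rewrites a trailing `Z` because `datetime.fromisoformat` only accepts that suffix from Python 3.11 onward.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string, accepting a trailing 'Z' on Python 3.7+."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"  # 'Z' means UTC
    return datetime.fromisoformat(value)


ts = parse_timestamp("2024-01-15T10:30:00Z")
print(ts.isoformat())
```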

FilterConfig

config = chatpack.FilterConfig(
    min_length=5,
    max_length=1000,
    sender="Alice",
    date_from="2024-01-01",
    date_to="2024-12-31"
)

# Builder pattern
config.with_min_length(10)
config.with_sender("Bob")

OutputConfig

config = chatpack.OutputConfig(
    include_timestamps=True,
    include_platform=True
)

Utility Functions

  • merge_consecutive(messages, time_threshold=300) - Merge consecutive messages from the same sender sent within time_threshold seconds of each other
  • apply_filters(messages, config) - Apply filter configuration

Platform Support

Platform    Format          Special Features
Telegram    JSON            Service messages, forwarded messages
WhatsApp    TXT             Auto-detects 4 locale date formats
Instagram   JSON            Fixes Mojibake encoding (Meta bug)
Discord     JSON/CSV/TXT    Attachments, stickers, replies
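
For readers unfamiliar with the Instagram issue: Meta's JSON exports are commonly reported to contain UTF-8 bytes written out as if they were Latin-1 characters, so non-ASCII text arrives garbled. The sketch below shows the usual repair, re-encoding as Latin-1 and decoding as UTF-8. It illustrates the idea only and is not necessarily how chatpack implements the fix; `fix_mojibake` is a hypothetical name.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mis-decoded as Latin-1; return the input unchanged otherwise."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 or the bytes are not
        # valid UTF-8, so it was not damaged in this particular way.
        return text


# Simulate the damage, then repair it.
broken = "café".encode("utf-8").decode("latin-1")
print(repr(broken), "->", repr(fix_mojibake(broken)))
```

Plain ASCII and already-correct text pass through unchanged, so the function is safe to apply to every message.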

Performance

chatpack-py leverages Rust for parsing, making it significantly faster than pure Python implementations:

  • 10-100x faster than regex-based parsers
  • Memory efficient streaming for multi-GB files
  • Zero-copy where possible with PyO3
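
Speed-ups depend heavily on the export and the machine, so it is worth measuring on your own data. Below is a minimal timing harness, a sketch rather than a rigorous benchmark. `messages_per_second` is a hypothetical helper, and the `range` workload is a stand-in so that the example runs without chatpack installed.

```python
import time
from typing import Callable, Iterable


def messages_per_second(parse: Callable[[], Iterable], repeats: int = 3) -> float:
    """Return the best observed throughput of `parse` over `repeats` runs."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        count = sum(1 for _item in parse())  # consume the result, counting items
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, count / elapsed)
    return best


# Stand-in workload. With chatpack installed, pass for example
# lambda: chatpack.parse_telegram("result.json") instead.
rate = messages_per_second(lambda: range(100_000))
print(f"{rate:,.0f} messages/s")
```

Taking the best of several runs reduces noise from disk caching and other processes; comparing two parsers on the same file with this harness gives a rough ratio.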

Development

Setup

# Clone repository
git clone https://github.com/berektassuly/chatpack-py
cd chatpack-py

# Install development dependencies
pip install maturin pytest

# Build in development mode
maturin develop

# Run tests
pytest

Project Structure

chatpack-py/
├── Cargo.toml          # Rust dependencies
├── pyproject.toml      # Python package metadata
├── src/
│   ├── lib.rs          # PyO3 module entry point
│   ├── types.rs        # Python type wrappers
│   ├── parsers.rs      # Parser implementations
│   ├── streaming.rs    # Streaming iterators
│   └── conversion.rs   # Rust ↔ Python conversion
├── python/
│   └── chatpack/
│       ├── __init__.py
│       └── chatpack.pyi  # Type stubs
└── tests/
    ├── test_basic.py
    └── test_parsers.py

Building Wheels

# Build for current platform
maturin build --release

# Build for multiple platforms (requires Docker)
maturin build --release --manylinux 2014

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

MIT License - see LICENSE for details.

Credits

Built on top of the excellent chatpack Rust library by Berektassuly.
