A custom text compression library using tokenization and Huffman encoding.

JAS - Custom Text Compression Library

JAS Compression is a custom text compression library that uses tokenization, specialized preprocessing for different text-based formats, and deterministic Huffman encoding to compress and decompress text files. The project is designed to handle plain text, JSON, CSV, XML, and YAML formats.
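To illustrate what format-specific preprocessing could look like for JSON, here is a minimal sketch using only the standard library. The function name `normalize_json` is hypothetical and is not taken from the JAS source. The sketch assumes that normalization means re-serializing with sorted keys and no optional whitespace, which may differ from what the library actually does.

```python
import json


def normalize_json(text: str) -> str:
    """Re-serialize JSON with sorted keys and no optional whitespace.

    Hypothetical helper for illustration only; not part of the JAS API.
    """
    data = json.loads(text)
    return json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


a = '{"b": 1, "a": [1, 2,   3]}'
b = '{\n  "a": [1,2,3],\n  "b": 1\n}'

print(normalize_json(a))                       # {"a":[1,2,3],"b":1}
print(normalize_json(a) == normalize_json(b))  # True
```

Normalizing this way gives equivalent documents the same token stream, but it discards the original key order and whitespace, so a round trip would not be byte-identical.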

Features

  • Tokenization: Breaks text into tokens (words, punctuation, whitespace, etc.).
  • Special Phrase Detection: Identifies and replaces frequently occurring special phrases to improve compression.
  • Deterministic Huffman Encoding: Uses a deterministic Huffman tree for consistent encoding and decoding.
  • Format-Specific Preprocessing: Supports normalization for JSON, CSV, XML, and YAML files.
  • Command-Line Interface: Provides a CLI for compression and decompression with verbose logging and progress bars.
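The tokenization and deterministic Huffman steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `tokenizer.py` or `huffman.py`: the regular expression, the function names, and the tie-breaking rule (frequency first, then the smallest token in each subtree) are all hypothetical. Bits are kept as a string of `0` and `1` characters for readability; a real implementation would pack them into bytes.

```python
import heapq
import re
from collections import Counter

# Assumed token classes: runs of word characters, runs of whitespace,
# or a single punctuation character. Every character matches exactly one.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    """Split text into tokens whose concatenation reproduces the input."""
    return TOKEN_RE.findall(text)


def build_codes(tokens):
    """Build a Huffman code table that depends only on token counts.

    Heap entries are (frequency, smallest token in subtree, node). The
    second field is unique per subtree, so ties are always broken the
    same way and the node itself is never compared.
    """
    freq = Counter(tokens)
    heap = [(n, tok, tok) for tok, n in freq.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][1]: "0"}  # a lone symbol still needs one bit
    while len(heap) > 1:
        n1, k1, left = heapq.heappop(heap)
        n2, k2, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, min(k1, k2), (left, right)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):  # internal node: (left, right)
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:  # leaf: the token itself
            codes[node] = prefix
    return codes


def encode(tokens, codes):
    """Concatenate the code of each token into one bit string."""
    return "".join(codes[t] for t in tokens)


def decode(bits, codes):
    """Walk the bit string, emitting a token whenever a full code is read."""
    lookup = {code: tok for tok, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)


text = "to be, or not to be"
tokens = tokenize(text)
codes = build_codes(tokens)
bits = encode(tokens, codes)
print(decode(bits, codes) == text)  # True
```

Because the tie-break key is derived from the tokens rather than from insertion order, the compressor and decompressor can rebuild the same tree from the same frequency table.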

Installation

Install the package from PyPI with pip:

pip install jas-compression

Or, for the latest development version, clone the repository and install it locally:

git clone https://github.com/yourusername/jas-compression.git
cd jas-compression
pip install .

Usage

Compression

To compress a text file:

python -m jas.cli compress input.txt output.jas --verbose

Decompression

To decompress a .jas file:

python -m jas.cli decompress output.jas result.txt --verbose
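A quick way to check a round trip is to run both commands and compare the result with the original. The sketch below wraps the two CLI invocations shown above with `subprocess` and compares files byte for byte. The helper names `files_match` and `round_trip` are hypothetical, and `round_trip` assumes the package is installed. If format-specific normalization rewrites the input, a byte-for-byte comparison could fail even when the data is equivalent; this README does not say either way.

```python
import filecmp
import subprocess
import sys
import tempfile
from pathlib import Path


def files_match(a, b):
    """Return True if the two files have identical contents."""
    return filecmp.cmp(a, b, shallow=False)


def round_trip(source, workdir):
    """Compress then decompress `source` with the documented CLI commands.

    Requires jas-compression to be installed; not called in the demo below.
    """
    packed = Path(workdir) / "output.jas"
    restored = Path(workdir) / "result.txt"
    subprocess.run(
        [sys.executable, "-m", "jas.cli", "compress", str(source), str(packed), "--verbose"],
        check=True,
    )
    subprocess.run(
        [sys.executable, "-m", "jas.cli", "decompress", str(packed), str(restored), "--verbose"],
        check=True,
    )
    return files_match(source, restored)


# Demo of the comparison helper alone, using throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "input.txt"
    same = Path(tmp) / "same.txt"
    different = Path(tmp) / "different.txt"
    original.write_text("hello, world\n")
    same.write_text("hello, world\n")
    different.write_text("hello, there, world\n")
    print(files_match(original, same))       # True
    print(files_match(original, different))  # False
```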

Project Structure

jas-compression/
├── jas/
│   ├── __init__.py
│   ├── compressor.py
│   ├── decompressor.py
│   ├── cli.py
│   ├── huffman.py
│   ├── tokenizer.py
│   ├── structured.py
│   ├── utils.py
│   └── bitstream.py
├── README.md
├── setup.py
├── MANIFEST.in
├── LICENSE
└── requirements.txt

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub. Make sure to follow the existing code style and include tests for any new features.

License

This project is licensed under the MIT License.
