A custom text compression library using tokenization and Huffman encoding.
JAS - Custom Text Compression Library
JAS Compression is a custom text compression library that uses tokenization, specialized preprocessing for different text-based formats, and deterministic Huffman encoding to compress and decompress text files. The project is designed to handle plain text, JSON, CSV, XML, and YAML formats.
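To give a feel for what format-specific preprocessing can look like, here is a minimal sketch for JSON using only the Python standard library. It is not taken from the JAS source: the function name `normalize_json` is hypothetical, and the choice to sort keys and strip optional whitespace is an assumption about one plausible normalization, not a description of what JAS actually does.

```python
import json


def normalize_json(text):
    """Re-serialize JSON text in a canonical form (hypothetical helper).

    Assumption: canonical means sorted object keys and no optional
    whitespace. The original layout and key order are not preserved.
    """
    data = json.loads(text)
    return json.dumps(
        data,
        sort_keys=True,          # stable key order across inputs
        separators=(",", ":"),   # drop the spaces json.dumps adds by default
        ensure_ascii=False,      # keep non-ASCII characters as-is
    )


if __name__ == "__main__":
    print(normalize_json('{ "b": 1,\n  "a": [1, 2] }'))
```

Normalizing first makes equivalent documents produce identical token streams, which tends to help a frequency-based coder. The trade-off is that a round trip returns the normalized text rather than the original bytes.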
Features
- Tokenization: Breaks text into tokens (words, punctuation, whitespace, etc.).
- Special Phrase Detection: Identifies and replaces frequently occurring special phrases to improve compression.
- Deterministic Huffman Encoding: Uses a deterministic Huffman tree for consistent encoding and decoding.
- Format-Specific Preprocessing: Supports normalization for JSON, CSV, XML, and YAML files.
- Command-Line Interface: Provides a CLI for compression and decompression with verbose logging and progress bars.
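The tokenization and deterministic Huffman steps listed above can be sketched in a few lines of Python. This is an illustration of the general technique, not the code in `tokenizer.py` or `huffman.py`: the token pattern, the function names, and the tie-breaking rule (weight first, then the smallest token in each subtree) are all assumptions chosen so that the same token counts always yield the same code table.

```python
import heapq
import re
from collections import Counter

# Hypothetical token pattern: a run of word characters, a run of whitespace,
# or any single other character. Every input character lands in one token.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    """Split text into tokens whose concatenation is the original text."""
    return TOKEN_RE.findall(text)


def build_codes(tokens):
    """Return {token: bitstring} from a Huffman tree with fixed tie-breaking."""
    freq = Counter(tokens)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct token still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, smallest token in subtree, subtree).
    # The second field is unique per subtree, so ties on weight are always
    # resolved the same way and the subtrees themselves are never compared.
    heap = [(weight, tok, tok) for tok, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, k1, t1 = heapq.heappop(heap)
        w2, k2, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, min(k1, k2), (t1, t2)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):      # internal node: (left, right)
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:                            # leaf: the token itself
            codes[node] = prefix
    return codes


def encode(tokens, codes):
    """Concatenate the bitstrings of all tokens."""
    return "".join(codes[tok] for tok in tokens)


def decode(bits, codes):
    """Invert encode(); relies on Huffman codes being prefix-free."""
    reverse = {code: tok for tok, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    text = "to be, or not to be"
    tokens = tokenize(text)
    codes = build_codes(tokens)
    assert decode(encode(tokens, codes), codes) == text
```

Determinism matters because the decoder must rebuild exactly the same tree as the encoder. With a fixed tie-breaking rule, storing the token frequencies is enough to reproduce the code table on the other side. The sketch uses strings of `"0"` and `"1"` for readability; a real implementation would pack them into bytes.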
Installation
You can install the package via PyPI:
pip install jas-compression
Or, for the latest development version, clone the repository and install it locally:
git clone https://github.com/yourusername/jas-compression.git
cd jas-compression
pip install .
Usage
Compression
To compress a text file:
python -m jas.cli compress input.txt output.jas --verbose
Decompression
To decompress a .jas file:
python -m jas.cli decompress output.jas result.txt --verbose
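To check a round trip from a script, the two commands above can be wrapped with `subprocess`. The sketch below uses only the invocation shown in this README (`python -m jas.cli compress|decompress <input> <output> [--verbose]`); the helper names `jas_command` and `round_trip` are hypothetical and not part of the package.

```python
import subprocess
import sys


def jas_command(mode, src, dst, verbose=False):
    """Build the argument list for the CLI call shown in this README."""
    if mode not in ("compress", "decompress"):
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = [sys.executable, "-m", "jas.cli", mode, src, dst]
    if verbose:
        cmd.append("--verbose")
    return cmd


def round_trip(src, packed, restored):
    """Compress src, decompress it again, and compare the bytes.

    Requires jas-compression to be installed in the current interpreter.
    """
    subprocess.run(jas_command("compress", src, packed), check=True)
    subprocess.run(jas_command("decompress", packed, restored), check=True)
    with open(src, "rb") as original, open(restored, "rb") as result:
        return original.read() == result.read()


if __name__ == "__main__":
    print(round_trip("input.txt", "output.jas", "result.txt"))
```

For structured formats such as JSON or YAML, a byte-for-byte comparison may fail if preprocessing normalizes the input; in that case, comparing the parsed data is likely the more meaningful check.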
Project Structure
jas-compression/
├── jas/
│ ├── __init__.py
│ ├── compressor.py
│ ├── decompressor.py
│ ├── cli.py
│ ├── huffman.py
│ ├── tokenizer.py
│ ├── structured.py
│ ├── utils.py
│ └── bitstream.py
├── README.md
├── setup.py
├── MANIFEST.in
├── LICENSE
└── requirements.txt
Contributing
Contributions are welcome! Please open an issue or submit a pull request on GitHub. Make sure to follow the existing code style and include tests for any new features.
License
This project is licensed under the MIT License.
File details
Details for the file jas_compression-1.0.0.tar.gz.
File metadata
- Download URL: jas_compression-1.0.0.tar.gz
- Upload date:
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1ebd4e1e30a895eeb88821e16581a10675959c2a14e5bc374944a87eb83ba2a4 |
| MD5 | f44f297a49740479ab2e569d81938a47 |
| BLAKE2b-256 | 893a0be148e552cdec6fa274660c8c0f8e8001cd2239c6a6cd665110fe3401e2 |
File details
Details for the file jas_compression-1.0.0-py3-none-any.whl.
File metadata
- Download URL: jas_compression-1.0.0-py3-none-any.whl
- Upload date:
- Size: 9.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf9d1b965ebe80855798fd17462bb8c811ef877783836845635c7ed79d0d94be |
| MD5 | 182b82e9b6dabc74675b5d68df338d48 |
| BLAKE2b-256 | 112ea9a0c293acec225764951792d6afcbd145dbe6f9f2264bba5ead33d87340 |