High-performance lzstring compression library compatible with JavaScript implementation
lean4url
A high-performance lzstring compression library that is fully compatible with the JavaScript implementation.
Features
✅ Fully Compatible - 100% compatible with pieroxy/lz-string JavaScript implementation
✅ Unicode Support - Correctly handles all Unicode characters, including emoji and special symbols
✅ URL Friendly - Built-in URL encoding/decoding functionality
✅ High Performance - Optimized algorithm implementation
✅ Type Safe - Complete type annotation support
✅ Thoroughly Tested - Includes comparative tests with JavaScript version
Background
Existing Python lzstring packages have issues with Unicode character handling. For example, for the character "𝔓":
- Existing package output: sirQ
- JavaScript original output: qwbmRdo=
- lean4url output: qwbmRdo= ✅
lean4url solves this problem by correctly simulating JavaScript's UTF-16 encoding behavior.
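To make the UTF-16 point concrete, here is a minimal sketch using only the Python standard library. It is not lean4url's internal code; the helper names `to_utf16_units` and `from_utf16_units` are hypothetical and exist only for illustration. It shows that a character outside the Basic Multilingual Plane is one code point in Python but two 16-bit code units in JavaScript's string model.

```python
import struct


def to_utf16_units(text: str) -> list:
    """Return the UTF-16 code units of text, as JavaScript's charCodeAt would see them."""
    raw = text.encode("utf-16-le")  # two bytes per code unit, no byte order mark
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


def from_utf16_units(units: list) -> str:
    """Rebuild a Python string from a list of UTF-16 code units."""
    raw = struct.pack(f"<{len(units)}H", *units)
    return raw.decode("utf-16-le")


fraktur_p = "\U0001D513"  # the character "𝔓" written as an escape
print(len(fraktur_p))  # Python counts code points
print([hex(u) for u in to_utf16_units(fraktur_p)])  # JavaScript counts code units
```

A compressor that iterates over Python code points sees one symbol here, while the JavaScript original sees two, which is enough to make the outputs diverge.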
Installation
pip install lean4url
Quick Start
Basic Compression/Decompression
from lean4url import LZString
# Create instance
lz = LZString()
# Compress string
original = "Hello, 世界! 🌍"
compressed = lz.compress_to_base64(original)
print(f"Compressed: {compressed}")
# Decompress string
decompressed = lz.decompress_from_base64(compressed)
print(f"Decompressed: {decompressed}")
# Output: Hello, 世界! 🌍
URL Encoding/Decoding
from lean4url import encode_url, decode_url
# Encode data to URL
data = "This is data to be encoded"
url = encode_url(data, base_url="https://example.com/share")
print(f"Encoded URL: {url}")
# Output: https://example.com/share/#codez=BIUwNmD2A0AEDukBOYAmBMYAZhAY...
# Decode data from URL
result = decode_url(url)
print(f"Decoded result: {result['codez']}")
# Output: This is data to be encoded
URL Encoding with Parameters
from lean4url import encode_url, decode_url
# Add extra parameters when encoding
code = "function hello() { return 'world'; }"
url = encode_url(
    code,
    base_url="https://playground.example.com",
    lang="javascript",
    theme="dark",
    url="https://docs.example.com"  # This parameter will be URL encoded
)
print(f"Complete URL: {url}")
# Output: https://playground.example.com/#codez=BIUwNmD2A0A...&lang=javascript&theme=dark&url=https%3A//docs.example.com
# Decode URL to get all parameters
params = decode_url(url)
print(f"Code: {params['codez']}")
print(f"Language: {params['lang']}")
print(f"Theme: {params['theme']}")
print(f"Documentation link: {params['url']}")
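The URL shape shown in the outputs above (a `#` fragment holding `codez` plus extra parameters) can be approximated with the standard library alone. The following is a sketch under assumptions, not lean4url's implementation: `build_fragment_url` and `parse_fragment_url` are hypothetical helpers, and the payload `PLACEHOLDER` stands in for a real compressed string.

```python
from urllib.parse import quote, unquote, urlsplit


def build_fragment_url(base_url: str, payload: str, **params) -> str:
    """Place the payload and extra parameters in the URL fragment (hypothetical helper)."""
    pairs = [("codez", payload)] + list(params.items())
    # quote() keeps "/" by default, so "https://" becomes "https%3A//".
    fragment = "&".join(f"{key}={quote(str(value))}" for key, value in pairs)
    return f"{base_url.rstrip('/')}/#{fragment}"


def parse_fragment_url(url: str) -> dict:
    """Split the fragment back into a dict of percent-decoded values (hypothetical helper)."""
    result = {}
    for item in urlsplit(url).fragment.split("&"):
        if not item:
            continue  # an empty fragment yields no parameters
        key, _, value = item.partition("=")
        result[key] = unquote(value)
    return result


url = build_fragment_url(
    "https://playground.example.com",
    "PLACEHOLDER",  # not a real compressed payload
    lang="javascript",
    url="https://docs.example.com",
)
print(url)
print(parse_fragment_url(url))
```

Keeping the data in the fragment means it is not sent to the server with the request, which is a common reason for playground-style share links to use `#` rather than `?`.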
API Reference
LZString Class
class LZString:
    def compress_to_base64(self, input_str: str) -> str:
        """Compress string to Base64 format"""

    def decompress_from_base64(self, input_str: str) -> str:
        """Decompress string from Base64 format"""

    def compress_to_utf16(self, input_str: str) -> str:
        """Compress string to UTF16 format"""

    def decompress_from_utf16(self, input_str: str) -> str:
        """Decompress string from UTF16 format"""
URL Utility Functions
def encode_url(data: str, base_url: str = None, **kwargs) -> str:
    """
    Encode input string and build complete URL.

    Args:
        data: Data to be encoded
        base_url: URL prefix
        **kwargs: Additional URL parameters

    Returns:
        Built complete URL
    """

def decode_url(url: str) -> dict:
    """
    Decode original data from URL.

    Args:
        url: Complete URL

    Returns:
        Dictionary containing all parameters, with codez decoded
    """
Development
Environment Setup
# Clone repository
git clone https://github.com/rexwzh/lean4url.git
cd lean4url
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Install development dependencies
pip install -e ".[dev]"
Running Tests
# Start JavaScript test service
cd tests/js_service
npm install
node server.js &
cd ../..
# Run Python tests
pytest
# Run tests with coverage
pytest --cov=lean4url --cov-report=html
Code Formatting
# Format code
black src tests
isort src tests
# Type checking
mypy src
# Code checking
flake8 src tests
Algorithm Principles
lean4url is based on a variant of the LZ78 compression algorithm. Its core ideas are:
- Dictionary Building - Dynamically build character sequence dictionary
- Sequence Matching - Find longest matching sequences
- UTF-16 Compatibility - Simulate JavaScript's UTF-16 surrogate pair behavior
- Base64 Encoding - Encode compression results in URL-safe format
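The dictionary-building and sequence-matching ideas listed above can be illustrated with a textbook LZ78 coder. This is a simplified sketch, not the bit-packed format that lz-string and lean4url produce; the function names `lz78_encode` and `lz78_decode` are illustrative only.

```python
def lz78_encode(text: str) -> list:
    """Encode text as (dictionary index, next symbol) pairs; index 0 means the empty phrase."""
    dictionary = {}  # phrase -> 1-based index
    output = []
    phrase = ""
    for symbol in text:
        candidate = phrase + symbol
        if candidate in dictionary:
            phrase = candidate  # keep extending the longest known match
        else:
            output.append((dictionary.get(phrase, 0), symbol))
            dictionary[candidate] = len(dictionary) + 1  # learn the new phrase
            phrase = ""
    if phrase:
        output.append((dictionary[phrase], ""))  # flush a trailing match
    return output


def lz78_decode(pairs: list) -> str:
    """Rebuild the text by replaying the same dictionary growth."""
    phrases = [""]
    pieces = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol
        pieces.append(phrase)
        phrases.append(phrase)
    return "".join(pieces)


pairs = lz78_encode("abababab")
print(pairs)
print(lz78_decode(pairs))
```

For JavaScript compatibility the symbols fed to such a coder would have to be UTF-16 code units rather than Python code points, which is the point of the UTF-16 item in the list.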
Unicode Handling
The key difference from existing Python packages is in Unicode character handling:
- JavaScript: Uses UTF-16 surrogate pairs, "𝔓" → [0xD835, 0xDD13]
- Existing Python packages: Use Unicode code points, "𝔓" → [0x1D513]
- lean4url: Simulates JavaScript behavior, ensuring compatibility
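The mapping from a code point to its surrogate pair follows the standard UTF-16 rule, and a short sketch makes the arithmetic explicit. The function `surrogate_pair` is a hypothetical helper written for this illustration, not part of lean4url's API.

```python
def surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000  # 20-bit value
    high = 0xD800 + (offset >> 10)  # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


high, low = surrogate_pair(ord("\U0001D513"))  # "𝔓" is U+1D513
print(hex(high), hex(low))
```

Characters at or below U+FFFF map to a single code unit, so only supplementary-plane characters such as "𝔓" or emoji expose the difference between the two string models.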
License
MIT License - See the LICENSE file for details.
Contributing
Issues and Pull Requests are welcome!
Changelog
v1.0.0
- Initial version release
- Complete lzstring algorithm implementation
- JavaScript compatibility
- URL encoding/decoding functionality
- Complete test suite