Enhanced fork of contractions library - Expands English contractions with improved performance and new features
sane-contractions
A fast and comprehensive Python library for expanding English contractions and slang.
This is an enhanced fork of the original contractions library by Pascal van Kooten, with significant improvements in performance, testing, type safety, and maintainability.
Features
- ⚡ Fast: 50x faster than version 0.0.18 (uses efficient Aho-Corasick algorithm)
- 📚 Comprehensive: Handles standard contractions, slang, and custom additions
- 🎯 Smart: Preserves case and handles ambiguous contractions intelligently
- 🔧 Flexible: Easy to add custom contractions on the fly
- 🐍 Modern: Supports Python 3.10+
Installation
Using pip
pip install sane-contractions
Using uv (Recommended - Much Faster!)
uv pip install sane-contractions
uv is a drop-in replacement for common pip commands and, according to its own documentation, is 10-100x faster than pip.
Quick Start
import contractions
contractions.fix("you're happy now")
# "you are happy now"
contractions.fix("I'm sure you'll love it!")
# "I am sure you will love it!"
Usage
Basic Contraction Expansion
import contractions
text = "I'm sure you're going to love what we've done"
expanded = contractions.fix(text)
print(expanded)
# "I am sure you are going to love what we have done"
Controlling Slang Expansion
contractions.fix("yall're gonna love this", slang=True)
# "you all are going to love this"
contractions.fix("yall're gonna love this", slang=False)
# "yall are going to love this"
contractions.fix("yall're gonna love this", leftovers=False)
# "yall are gonna love this"
Case Preservation
The library intelligently preserves the case pattern of the original contraction:
contractions.fix("you're happy") # "you are happy"
contractions.fix("You're happy") # "You are happy"
contractions.fix("YOU'RE HAPPY") # "YOU ARE HAPPY"
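The three cases above can be illustrated with a small stand-alone sketch. The helper name match_case is hypothetical and is not part of this library's API; the code shows one plausible way to carry a case pattern over to an expansion, not how the library does it internally.

```python
def match_case(original: str, expansion: str) -> str:
    """Apply the case pattern of `original` to `expansion` (hypothetical helper)."""
    # All-caps input (more than one character) gives an all-caps expansion.
    if len(original) > 1 and original.isupper():
        return expansion.upper()
    # Capitalized input gives a capitalized first letter only.
    if original[:1].isupper():
        return expansion[:1].upper() + expansion[1:]
    # Anything else is returned unchanged.
    return expansion


print(match_case("you're", "you are"))  # you are
print(match_case("You're", "you are"))  # You are
print(match_case("YOU'RE", "you are"))  # YOU ARE
```

The length check keeps a one-letter word such as "I" from forcing an all-caps expansion.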
Adding Custom Contractions
Add a single contraction:
contractions.add('myword', 'my word')
contractions.fix('myword is great')
# "my word is great"
Add multiple contractions at once:
custom_contractions = {
"ain't": "are not",
"gonna": "going to",
"wanna": "want to",
"customterm": "custom expansion"
}
contractions.add_dict(custom_contractions)
contractions.fix("ain't gonna happen")
# "are not going to happen"
Load contractions from a JSON file:
# custom_contractions.json contains: {"myterm": "my expansion", "another": "another word"}
contractions.load_json("custom_contractions.json")
contractions.fix("myterm is great")
# "my expansion is great"
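The JSON file is a flat object that maps each contraction to its expansion. The sketch below writes such a file and checks its shape using only the standard library. The helper read_contraction_file is hypothetical and only illustrates what a valid file looks like; it is not the library's loader.

```python
import json
import tempfile
from pathlib import Path


def read_contraction_file(path: Path) -> dict[str, str]:
    """Hypothetical helper: load a contraction -> expansion mapping and validate it."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for key, value in data.items():
        if not isinstance(value, str):
            raise ValueError(f"expansion for {key!r} must be a string")
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "custom_contractions.json"
    # Same content as the example file described above.
    path.write_text(
        json.dumps({"myterm": "my expansion", "another": "another word"}),
        encoding="utf-8",
    )
    mapping = read_contraction_file(path)

print(mapping)
```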
Preview Contractions Before Fixing
The preview() function lets you see all contractions in a text before expanding them:
text = "I'd love to see what you're thinking"
preview = contractions.preview(text, flank=10)
for item in preview:
print(f"Found '{item['match']}' at position {item['start']}")
print(f"Context: {item['viewing_window']}")
# Output:
# Found 'I'd' at position 0
# Context: I'd love to
# Found 'you're' at position 21
# Context: what you're thinkin
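To make the role of flank concrete, here is a stand-alone sketch that finds a fixed list of terms with a regular expression and slices flank characters of context on each side. The function preview_matches and its terms argument are assumptions made for illustration. The library's own matching and window trimming may differ, so the exact context strings need not match the output above.

```python
import re


def preview_matches(text: str, terms: list[str], flank: int) -> list[dict]:
    """Hypothetical stand-in for a preview step: locate terms and slice context."""
    # Longest terms first so the alternation prefers the longest match.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    results = []
    for m in pattern.finditer(text):
        start, end = m.span()
        results.append({
            "match": m.group(0),
            "start": start,
            # Clamp the left edge at 0; slicing clamps the right edge itself.
            "viewing_window": text[max(0, start - flank):end + flank],
        })
    return results


text = "I'd love to see what you're thinking"
results = preview_matches(text, ["I'd", "you're"], flank=10)
for item in results:
    print(item["match"], item["start"], repr(item["viewing_window"]))
```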
API Reference
fix(text, leftovers=True, slang=True)
Expands contractions in the given text.
Parameters:
- text (str): The text to process
- leftovers (bool): Whether to expand leftover contractions (default: True)
- slang (bool): Whether to expand slang terms (default: True)
Returns: str - Text with contractions expanded
add(key, value)
Adds a single custom contraction.
Parameters:
- key (str): The contraction to match
- value (str): The expansion
add_dict(dictionary)
Adds multiple custom contractions at once.
Parameters:
- dictionary (dict): Dictionary mapping contractions to their expansions
load_json(filepath)
Loads custom contractions from a JSON file.
Parameters:
- filepath (str): Path to JSON file containing contraction mappings
Raises:
- FileNotFoundError: If the file doesn't exist
- json.JSONDecodeError: If the file contains invalid JSON
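Both exceptions are the ones Python's own open() and json.load() raise, so callers can handle them in the usual way. The sketch below shows one possible handling pattern with a hypothetical wrapper, load_mapping_or_empty, built on the standard library only. It stands in for a call to load_json and is not part of this package.

```python
import json
import tempfile
from pathlib import Path


def load_mapping_or_empty(filepath: str) -> dict:
    """Hypothetical wrapper: report load errors and fall back to an empty mapping."""
    try:
        with open(filepath, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        print(f"{filepath}: file not found, no custom contractions loaded")
    except json.JSONDecodeError as exc:
        print(f"{filepath}: invalid JSON at line {exc.lineno}, column {exc.colno}")
    return {}


with tempfile.TemporaryDirectory() as tmp:
    # Case 1: the file does not exist.
    missing = load_mapping_or_empty(str(Path(tmp) / "missing.json"))
    # Case 2: the file exists but holds truncated JSON.
    broken_path = Path(tmp) / "broken.json"
    broken_path.write_text('{"myterm": ', encoding="utf-8")
    broken = load_mapping_or_empty(str(broken_path))
```

Whether to fall back silently or re-raise depends on the application; a pipeline that requires the custom terms should probably let the exception propagate.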
preview(text, flank)
Preview contractions in text before expanding.
Parameters:
- text (str): The text to analyze
- flank (int): Number of characters to show before/after each match
Returns: list[dict] - List of matches with context information
Examples
Standard Contractions
you're -> you are
I'm -> I am
we'll -> we will
it's -> it is
they've -> they have
Slang Terms
gonna -> going to
wanna -> want to
gotta -> got to
yall -> you all
ain't -> are not
Month Abbreviations
jan. -> january
feb. -> february
mar. -> march
Ambiguous Cases
For ambiguous contractions, the library uses the most common expansion:
he's -> he is (not "he has")
Performance
The library uses the Aho-Corasick algorithm for efficient string matching, achieving:
- ~256K ops/sec for short texts
- ~17K ops/sec for medium texts with no contractions
- ~13K ops/sec for slang-heavy texts
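For readers unfamiliar with Aho-Corasick, the following is a minimal from-scratch sketch of the algorithm: a trie of patterns plus failure links, so the text is scanned once regardless of how many patterns there are. It is written for illustration only and is not the implementation this package or its textsearch dependency uses; the function names are assumptions.

```python
from collections import deque


def build_automaton(patterns: list[str]):
    """Build trie transitions, failure links and outputs for the patterns."""
    goto: list[dict[str, int]] = [{}]  # goto[state][char] -> next state
    fail: list[int] = [0]              # state to fall back to on a mismatch
    out: list[list[str]] = [[]]        # patterns that end at each state

    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Breadth-first pass: each failure link points at the longest proper
    # suffix of the node's path that is also a prefix of some pattern.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text: str, automaton) -> list[tuple[int, str]]:
    """Return (start_index, pattern) for every occurrence, in one pass."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for index, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((index - len(pattern) + 1, pattern))
    return hits


automaton = build_automaton(["you're", "gonna", "'re"])
hits = find_all("you're gonna", automaton)
print(hits)  # [(0, "you're"), (3, "'re"), (7, 'gonna')]
```

Note that the raw algorithm reports overlapping matches such as 're inside you're. A contraction expander would still need a rule for choosing among them, for example preferring the longest match.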
Run performance benchmarks:
# Make sure package is installed in development mode
pip install -e .
python tests/test_performance.py
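An ops/sec figure like the ones above is simply calls divided by elapsed time. The sketch below shows a rough way to take that measurement with the standard library. The helper ops_per_second and the stand_in function are hypothetical; swap stand_in for contractions.fix to measure the installed package. This is not the project's benchmark script.

```python
import time


def ops_per_second(func, text: str, repeats: int = 10_000) -> float:
    """Call func(text) `repeats` times and return calls per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(text)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return repeats / max(elapsed, 1e-9)


def stand_in(text: str) -> str:
    """Placeholder workload so the sketch runs without the package installed."""
    return text.replace("you're", "you are")


rate = ops_per_second(stand_in, "you're happy now")
print(f"{rate:,.0f} ops/sec")
```

Results vary with hardware, Python version and text length, so compare runs only on the same machine.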
Requirements
- Python 3.10 or higher
- textsearch >= 0.0.21
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
git clone https://github.com/devjerry0/sane-contractions
cd sane-contractions
pip install -e .
pip install pytest pytest-cov ruff mypy
Running Tests
# Run tests
pytest tests/ -v
# Run tests with coverage
pytest tests/ --cov=contractions --cov-report=term-missing
Code Quality
ruff check .
mypy contractions/__init__.py tests/
What's Different from the Original?
This fork includes several enhancements over the original contractions library:
🆕 New Features
- add_dict() - Bulk add custom contractions from a dictionary
- load_json() - Load contractions from JSON files
- Type hints - Full type coverage with mypy validation
- Better structure - Modular code organization (core, api modules)
🚀 Performance Improvements
- Optimized dictionary operations using the |= operator
- Reduced function call overhead
- Improved list comprehensions
- Cached computations
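As a quick illustration of the |= operator mentioned above, this sketch merges one mapping into another in place. The dictionaries are made-up examples, and the snippet shows standard Python behavior (available since Python 3.9), not code taken from this package.

```python
base = {"you're": "you are", "gonna": "going to"}
custom = {"gonna": "about to", "customterm": "custom expansion"}

# `|` builds a new dict and leaves both operands untouched.
combined = base | custom

# `|=` updates `base` in place; on duplicate keys the right-hand value wins.
base |= custom

print(base)
# {"you're": 'you are', 'gonna': 'about to', 'customterm': 'custom expansion'}
```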
🧪 Enhanced Testing
- 100% test coverage (up from ~60%)
- 16 comprehensive tests including edge cases
- Error handling tests
- Performance benchmarking suite
📦 Modern Tooling
- Python 3.10+ support (modern type hints)
- Ruff for fast linting
- Pre-commit hooks
- GitHub Actions CI/CD
- Automated PyPI publishing
📚 Better Documentation
- Comprehensive README with examples
- API reference documentation
- Deployment guide
- Contributing guidelines
Why "sane-contractions"?
The original library is excellent but has been unmaintained since 2021. This fork provides:
- Active maintenance
- Modern Python practices
- Community contributions
- Regular updates
License
MIT License - see LICENSE file for details.
Credits
Original Author: Pascal van Kooten (@kootenpv)
Fork Maintainer: Jeremy Bruns
Original Repository: https://github.com/kootenpv/contractions
This project would not exist without Pascal's excellent foundation. All credit for the core concept and initial implementation goes to the original author.