Enhanced fork of contractions library - Expands English contractions with improved performance and new features

sane-contractions

A fast and comprehensive Python library for expanding English contractions.

Features

  • Fast: ~112K ops/sec for typical text expansion (Aho-Corasick algorithm)
  • 📚 Comprehensive: Handles standard contractions, slang, and custom additions
  • 🎯 Smart: Preserves case and handles ambiguous contractions intelligently
  • 🔧 Flexible: Easy to add custom contractions on the fly
  • 🐍 Modern: Supports Python 3.10+

Installation

Using pip

pip install sane-contractions

Using uv (Recommended - Much Faster!)

uv pip install sane-contractions

Quick Start

import contractions

contractions.expand("you're happy now")
# "you are happy now"

contractions.expand("I'm sure you'll love it!")
# "I am sure you will love it!"

# Shorthand aliases
contractions.e("you're")  # "you are"
contractions.p("you're", 5)  # preview with context

Usage

Basic Contraction Expansion

import contractions

text = "I'm sure you're going to love what we've done"
expanded = contractions.expand(text)
print(expanded)
# "I am sure you are going to love what we have done"

Controlling Slang Expansion

contractions.expand("yall're gonna love this", slang=True)
# "you all are going to love this"

contractions.expand("yall're gonna love this", slang=False)
# "yall are going to love this"

contractions.expand("yall're gonna love this", leftovers=False)
# "yall are gonna love this"

Case Preservation

The library preserves the case pattern of the original contraction (lowercase, capitalized, or all caps):

contractions.expand("you're happy")    # "you are happy"
contractions.expand("You're happy")    # "You are happy"
contractions.expand("YOU'RE HAPPY")    # "YOU ARE HAPPY"
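
The three cases above can be reproduced with a small standalone helper. This is only a sketch of one way to carry a case pattern over to an expansion, not the library's actual implementation; the name match_case is hypothetical.

```python
def match_case(original: str, expansion: str) -> str:
    """Apply the case pattern of `original` to `expansion` (illustrative only)."""
    letters = [c for c in original if c.isalpha()]
    # All-caps input (more than one letter) gives an all-caps expansion.
    if len(letters) > 1 and all(c.isupper() for c in letters):
        return expansion.upper()
    # Capitalized input gives a capitalized expansion.
    if original[:1].isupper():
        return expansion[:1].upper() + expansion[1:]
    # Otherwise keep the expansion as stored.
    return expansion


print(match_case("you're", "you are"))  # you are
print(match_case("You're", "you are"))  # You are
print(match_case("YOU'RE", "you are"))  # YOU ARE
```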

Adding Custom Contractions

Add a single contraction:

contractions.add('myword', 'my word')
contractions.expand('myword is great')
# "my word is great"

Add multiple contractions at once:

custom_contractions = {
    "ain't": "are not",
    "gonna": "going to",
    "wanna": "want to",
    "customterm": "custom expansion"
}
contractions.add_dict(custom_contractions)

contractions.expand("ain't gonna happen")
# "are not going to happen"

Load contractions from a JSON file:

# custom_contractions.json contains: {"myterm": "my expansion", "another": "another word"}
contractions.load_file("custom_contractions.json")

contractions.expand("myterm is great")
# "my expansion is great"

Load all JSON files from a folder:

# Load all *.json files from a directory (ignores non-JSON files)
contractions.load_folder("./my_contractions/")

contractions.expand("myterm is great")
# "my expansion is great"

Preview Contractions Before Expanding

The preview() function lets you see all contractions in a text before expanding them:

text = "I'd love to see what you're thinking"
preview = contractions.preview(text, context_chars=10)

for item in preview:
    print(f"Found '{item['match']}' at position {item['start']}")
    print(f"Context: {item['viewing_window']}")

# Output:
# Found 'I'd' at position 0
# Context: I'd love to
# Found 'you're' at position 21  
# Context: what you're thinkin
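
To make the idea of a context window concrete, here is a standalone sketch that finds matches with a regular expression and slices context_chars characters on each side, clamped to the text bounds. It assumes a tiny hypothetical contraction map (SAMPLE) and uses regex matching in place of the library's matcher, so the exact window the library returns may differ. The result keys mirror the ones used in the example above.

```python
import re

# Hypothetical two-entry map for illustration; the real library ships far more entries.
SAMPLE = {"i'd": "I would", "you're": "you are"}


def preview_sketch(text: str, context_chars: int) -> list[dict]:
    """Return each match with its start offset and a surrounding slice of text."""
    alternatives = "|".join(re.escape(key) for key in SAMPLE)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    results = []
    for m in pattern.finditer(text):
        lo = max(0, m.start() - context_chars)         # clamp at start of text
        hi = min(len(text), m.end() + context_chars)   # clamp at end of text
        results.append({
            "match": m.group(0),
            "start": m.start(),
            "viewing_window": text[lo:hi],
        })
    return results


for item in preview_sketch("I'd love to see what you're thinking", context_chars=10):
    print(item["start"], repr(item["match"]), repr(item["viewing_window"]))
```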

API Reference

expand(text, leftovers=True, slang=True)

Expands contractions in the given text.

Parameters:

  • text (str): The text to process
  • leftovers (bool): Whether to expand leftover contractions (default: True)
  • slang (bool): Whether to expand slang terms (default: True)

Returns: str - Text with contractions expanded

add(key, value)

Adds a single custom contraction.

Parameters:

  • key (str): The contraction to match
  • value (str): The expansion

add_dict(dictionary)

Adds multiple custom contractions at once.

Parameters:

  • dictionary (dict): Dictionary mapping contractions to their expansions

load_file(filepath)

Loads custom contractions from a JSON file.

Parameters:

  • filepath (str): Path to JSON file containing contraction mappings

Raises:

  • FileNotFoundError: If the file doesn't exist
  • json.JSONDecodeError: If the file contains invalid JSON

load_folder(folderpath)

Loads custom contractions from all JSON files in a directory. Non-JSON files are automatically ignored.

Parameters:

  • folderpath (str): Path to directory containing JSON files

Raises:

  • FileNotFoundError: If the folder doesn't exist
  • NotADirectoryError: If the path is a file, not a directory
  • ValueError: If no JSON files are found in the folder
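
The error contract above can be illustrated with a standard-library sketch. The function load_folder_sketch is hypothetical and only mirrors the documented behavior (ignore non-JSON files, raise the three listed exceptions); it returns the merged mapping for demonstration, which is an assumption and may not match what load_folder() itself returns.

```python
import json
import tempfile
from pathlib import Path


def load_folder_sketch(folderpath: str) -> dict[str, str]:
    """Merge all *.json mappings in a folder, following the documented error cases."""
    folder = Path(folderpath)
    if not folder.exists():
        raise FileNotFoundError(f"No such folder: {folderpath}")
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a directory: {folderpath}")
    json_files = sorted(folder.glob("*.json"))  # non-JSON files are skipped
    if not json_files:
        raise ValueError(f"No JSON files found in: {folderpath}")
    merged: dict[str, str] = {}
    for path in json_files:
        merged.update(json.loads(path.read_text(encoding="utf-8")))
    return merged


# Demo: one JSON file plus one unrelated file that should be ignored.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.json").write_text(json.dumps({"myterm": "my expansion"}), encoding="utf-8")
    Path(tmp, "notes.txt").write_text("ignored", encoding="utf-8")
    demo = load_folder_sketch(tmp)

print(demo)  # {'myterm': 'my expansion'}
```

With the real library, the same exceptions can be caught around contractions.load_folder() to report a missing or empty folder to the user.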

preview(text, context_chars)

Previews contractions in text before expanding them.

Parameters:

  • text (str): The text to analyze
  • context_chars (int): Number of characters to show before/after each match

Returns: list[dict] - List of matches with context information

e(text, leftovers=True, slang=True)

Shorthand alias for expand().

p(text, context_chars)

Shorthand alias for preview().

Examples

Standard Contractions

you're  -> you are
I'm     -> I am
we'll   -> we will
it's    -> it is
they've -> they have

Slang Terms

gonna   -> going to
wanna   -> want to
gotta   -> got to
yall    -> you all
ain't   -> are not

Month Abbreviations

jan. -> january
feb. -> february
mar. -> march

Ambiguous Cases

For ambiguous contractions, the library uses the most common expansion:

he's -> he is  (not "he has")

Performance

The library uses the Aho-Corasick algorithm for efficient string matching, achieving:

  • ~112K ops/sec for typical text expansion (short texts with contractions)
  • ~251K ops/sec for preview operations (contraction detection)
  • ~17K ops/sec for medium texts with no contractions
  • ~13K ops/sec for slang-heavy texts
  • ~278K ops/sec for adding custom contractions

Benchmarked on Apple M3 Max, Python 3.13.
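
For a rough ops/sec figure on your own machine, a timing helper along these lines may be enough. It is a sketch built on the standard timeit module, not the project's benchmark suite; ops_per_second is a hypothetical name, and str.replace stands in for the function under test so the snippet runs without the library installed.

```python
import timeit


def ops_per_second(func, *args, number: int = 10_000, repeat: int = 3) -> float:
    """Call func(*args) `number` times per run and report the best run as ops/sec."""
    timings = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
    return number / min(timings)


# Stand-in workload; swap in contractions.expand and a sample text to measure the library.
rate = ops_per_second(str.replace, "you're happy now", "you're", "you are")
print(f"{rate:,.0f} ops/sec")
```

Taking the best of several repeats reduces noise from other processes; results will still vary with hardware and Python version.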

Run performance benchmarks yourself:

# Create virtual environment and install
uv venv && source .venv/bin/activate
uv pip install -e .

# Run benchmarks
python tests/test_performance.py

Requirements

  • Python 3.10 or higher
  • textsearch >= 0.0.21

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/devjerry0/sane-contractions
cd sane-contractions
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"

Running Tests

pytest tests/ --cov=contractions --cov-report=term-missing

Code Quality

ruff check .
mypy contractions/ tests/

What's Different from the Original?

This fork includes several enhancements over the original contractions library:

🆕 New Features

  • add_dict() - Bulk add custom contractions from a dictionary
  • load_file() - Load contractions from JSON files
  • load_folder() - Load all JSON files from a directory
  • Type hints - Full type coverage with mypy validation
  • Better structure - Modular code organization with single-responsibility modules
  • Facade API - Clean, simple public API with shorthand aliases (e(), p())

🚀 Performance Improvements

  • Lazy-loaded TextSearch instances (30x faster imports)
  • Optimized dictionary operations and comprehensions
  • Eliminated redundant code paths
  • Reduced function call overhead
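
The lazy-loading idea can be sketched as follows: the matcher is built on first use rather than at import time, then cached. This is an illustration using functools.lru_cache, not the project's actual code; get_matcher and BUILD_COUNT are hypothetical names, and a plain dict stands in for a TextSearch instance.

```python
from functools import lru_cache

BUILD_COUNT = 0  # counts how many times the expensive build step runs


@lru_cache(maxsize=None)
def get_matcher() -> dict[str, str]:
    """Build the (stand-in) matcher on first call and reuse it afterwards."""
    global BUILD_COUNT
    BUILD_COUNT += 1
    return {"you're": "you are", "i'm": "I am"}


# Nothing is built until the first call; the second call reuses the cached object.
first = get_matcher()
second = get_matcher()
print(BUILD_COUNT, first is second)  # 1 True
```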

🧪 Testing

  • 100% test coverage enforced via CI/CD
  • Comprehensive tests including edge cases
  • Input validation and error handling tests
  • Performance benchmarking suite

📦 Modern Tooling

  • Python 3.10+ support (modern type hints with list[dict], etc.)
  • Ruff for fast linting (replaces black, flake8, isort)
  • Mypy for strict type checking
  • GitHub Actions CI/CD with concurrency control
  • Automated PyPI publishing via Git tags
  • uv support for fast dependency management

📚 Better Documentation

  • Comprehensive README with real benchmark results
  • Complete API reference with examples
  • Clear contributing guidelines

Why "sane-contractions"?

This is an enhanced fork of the original contractions library by Pascal van Kooten, with improvements in performance, testing, type safety, and maintainability.

The original library is excellent but has been unmaintained since 2021. This fork provides:

  • Active maintenance
  • Modern Python practices
  • Community contributions
  • Regular updates

License

MIT License - see LICENSE file for details.

Credits

Original Author: Pascal van Kooten (@kootenpv)
Fork Maintainer: Jeremy Bruns
Original Repository: https://github.com/kootenpv/contractions

This project would not exist without Pascal's excellent foundation. All credit for the core concept and initial implementation goes to the original author.
