
Enhanced fork of the contractions library - expands English contractions with improved performance and new features


sane-contractions


A fast and comprehensive Python library for expanding English contractions and slang.

This is an enhanced fork of the original contractions library by Pascal van Kooten, with significant improvements in performance, testing, type safety, and maintainability.

Features

  • Fast: 50x faster than version 0.0.18 of the original library (uses the efficient Aho-Corasick algorithm)
  • 📚 Comprehensive: Handles standard contractions, slang, and custom additions
  • 🎯 Smart: Preserves case and handles ambiguous contractions intelligently
  • 🔧 Flexible: Easy to add custom contractions on the fly
  • 🐍 Modern: Supports Python 3.10+

Installation

Using pip

pip install sane-contractions

Using uv (Recommended - Much Faster!)

uv pip install sane-contractions

uv is typically 10-100x faster than pip, and its uv pip interface is a drop-in replacement for common pip commands.

Quick Start

import contractions

contractions.fix("you're happy now")
# "you are happy now"

contractions.fix("I'm sure you'll love it!")
# "I am sure you will love it!"

Usage

Basic Contraction Expansion

import contractions

text = "I'm sure you're going to love what we've done"
expanded = contractions.fix(text)
print(expanded)
# "I am sure you are going to love what we have done"

Controlling Slang Expansion

contractions.fix("yall're gonna love this", slang=True)
# "you all are going to love this"

contractions.fix("yall're gonna love this", slang=False)
# "yall are going to love this"

contractions.fix("yall're gonna love this", leftovers=False)
# "yall are gonna love this"

Case Preservation

The library intelligently preserves the case pattern of the original contraction:

contractions.fix("you're happy")    # "you are happy"
contractions.fix("You're happy")    # "You are happy"
contractions.fix("YOU'RE HAPPY")    # "YOU ARE HAPPY"
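
The three cases above can be illustrated with a small standalone sketch. This is not the library's code: match_case is a hypothetical helper, and the rule it applies (all caps, leading capital, otherwise unchanged) is an assumption inferred from the examples.

```python
def match_case(original: str, expansion: str) -> str:
    """Apply the case pattern of `original` to `expansion` (illustrative only)."""
    letters = [ch for ch in original if ch.isalpha()]
    # Two or more letters, all uppercase: treat the whole match as upper case.
    if len(letters) > 1 and all(ch.isupper() for ch in letters):
        return expansion.upper()
    # Leading capital only: capitalise the first character of the expansion.
    if letters and letters[0].isupper() and expansion:
        return expansion[0].upper() + expansion[1:]
    # Otherwise leave the expansion as written.
    return expansion


for word in ("you're", "You're", "YOU'RE"):
    print(word, "->", match_case(word, "you are"))
```

A single capital letter such as "I" is deliberately not treated as all caps here, so "I'm" would map to "I am" rather than "I AM".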

Adding Custom Contractions

Add a single contraction:

contractions.add('myword', 'my word')
contractions.fix('myword is great')
# "my word is great"

Add multiple contractions at once:

custom_contractions = {
    "ain't": "are not",
    "gonna": "going to",
    "wanna": "want to",
    "customterm": "custom expansion"
}
contractions.add_dict(custom_contractions)

contractions.fix("ain't gonna happen")
# "are not going to happen"

Load contractions from a JSON file:

# custom_contractions.json contains: {"myterm": "my expansion", "another": "another word"}
contractions.load_json("custom_contractions.json")

contractions.fix("myterm is great")
# "my expansion is great"
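
Before handing a file to load_json(), it can help to check that it holds a flat mapping of strings to strings. The sketch below uses only the standard library; read_contraction_map is a hypothetical helper, and the shape it enforces is an assumption based on the example file above, not a documented requirement of the library.

```python
import json
import tempfile
from pathlib import Path


def read_contraction_map(path: str) -> dict[str, str]:
    """Read a JSON file and check it is a flat string-to-string object."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for key, value in data.items():
        if not isinstance(value, str):
            raise ValueError(f"expansion for {key!r} must be a string")
    return data


# Write a small mapping to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "custom_contractions.json"
    path.write_text(
        json.dumps({"myterm": "my expansion", "another": "another word"}),
        encoding="utf-8",
    )
    mapping = read_contraction_map(str(path))

print(mapping)
```

A mapping validated this way could then be passed to contractions.add_dict(mapping).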

Preview Contractions Before Fixing

The preview() function lets you see all contractions in a text before expanding them:

text = "I'd love to see what you're thinking"
preview = contractions.preview(text, flank=10)

for item in preview:
    print(f"Found '{item['match']}' at position {item['start']}")
    print(f"Context: {item['viewing_window']}")

# Output:
# Found 'I'd' at position 0
# Context: I'd love to
# Found 'you're' at position 21
# Context: what you're thinkin
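
To make the role of flank concrete, here is one way a context window could be computed. This is a sketch under assumptions: context_window is a hypothetical helper, and the choice to extend flank characters on both sides of the match and clamp at the text boundaries is assumed, so the library's actual window may differ.

```python
def context_window(text: str, start: int, end: int, flank: int) -> str:
    """Return the match plus up to `flank` characters on either side."""
    left = max(0, start - flank)          # clamp at the start of the text
    right = min(len(text), end + flank)   # clamp at the end of the text
    return text[left:right]


text = "I'd love to see what you're thinking"
start = text.index("you're")
window = context_window(text, start, start + len("you're"), flank=10)
print(start, repr(window))
```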

API Reference

fix(text, leftovers=True, slang=True)

Expands contractions in the given text.

Parameters:

  • text (str): The text to process
  • leftovers (bool): Whether to expand leftover contractions (default: True)
  • slang (bool): Whether to expand slang terms (default: True)

Returns: str - Text with contractions expanded

add(key, value)

Adds a single custom contraction.

Parameters:

  • key (str): The contraction to match
  • value (str): The expansion

add_dict(dictionary)

Adds multiple custom contractions at once.

Parameters:

  • dictionary (dict): Dictionary mapping contractions to their expansions

load_json(filepath)

Loads custom contractions from a JSON file.

Parameters:

  • filepath (str): Path to JSON file containing contraction mappings

Raises:

  • FileNotFoundError: If the file doesn't exist
  • json.JSONDecodeError: If the file contains invalid JSON
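
A caller that treats the custom file as optional might catch the two documented exceptions. In this sketch load_or_skip and stand_in_loader are hypothetical names; the stand-in only mimics the documented error behaviour so the example runs without the package installed.

```python
import json
from typing import Callable


def load_or_skip(loader: Callable[[str], object], filepath: str) -> bool:
    """Call loader(filepath); return False instead of raising on the documented errors."""
    try:
        loader(filepath)
    except FileNotFoundError:
        print(f"skipped: {filepath} does not exist")
        return False
    except json.JSONDecodeError as exc:
        print(f"skipped: {filepath} is not valid JSON ({exc.msg})")
        return False
    return True


def stand_in_loader(filepath: str) -> None:
    """Stand-in for contractions.load_json (assumption: same exceptions)."""
    with open(filepath, encoding="utf-8") as handle:
        json.load(handle)


result = load_or_skip(stand_in_loader, "no_such_custom_contractions_file.json")
print(result)
```

In real use, contractions.load_json would be passed as the loader.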

preview(text, flank)

Preview contractions in text before expanding.

Parameters:

  • text (str): The text to analyze
  • flank (int): Number of characters to show before/after each match

Returns: list[dict] - One dictionary per match, with keys including match, start, and viewing_window

Examples

Standard Contractions

you're  -> you are
I'm     -> I am
we'll   -> we will
it's    -> it is
they've -> they have

Slang Terms

gonna   -> going to
wanna   -> want to
gotta   -> got to
yall    -> you all
ain't   -> are not

Month Abbreviations

jan. -> january
feb. -> february
mar. -> march

Ambiguous Cases

For ambiguous contractions, the library uses the most common expansion:

he's -> he is  (not "he has")

Performance

The library uses the Aho-Corasick algorithm for efficient string matching, achieving:

  • ~256K ops/sec for short texts
  • ~17K ops/sec for medium texts with no contractions
  • ~13K ops/sec for slang-heavy texts
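
For readers unfamiliar with Aho-Corasick, the following is a minimal, self-contained sketch of the algorithm. It is not the library's implementation (matching is delegated to the textsearch dependency), and it makes simplifying assumptions: it is case-sensitive, ignores word boundaries, and only reports matches rather than replacing them.

```python
from collections import deque


def build_automaton(words):
    """Build goto/fail/output tables for the given keywords."""
    goto = [{}]     # goto[state][char] -> next state
    fail = [0]      # fail[state] -> fallback state on a mismatch
    output = [[]]   # output[state] -> keywords that end at this state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)

    # Breadth-first pass: children of the root keep fail = 0,
    # deeper states inherit the longest proper suffix that is also a prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            output[nxt] += output[fail[nxt]]
    return goto, fail, output


def find_all(text, automaton):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for index, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we reach the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            matches.append((index - len(word) + 1, word))
    return matches


automaton = build_automaton(["you're", "i'm", "we've"])
matches = find_all("i'm sure you're going to love what we've done", automaton)
print(matches)
```

The text is scanned once regardless of how many keywords are in the automaton, which is why this approach scales well to large contraction dictionaries.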

Run performance benchmarks:

# Make sure the package is installed in development mode
pip install -e .

python tests/test_performance.py
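
For a quick ad hoc measurement outside the test suite, an ops/sec figure can be approximated with timeit. This is a rough sketch, not the repository's benchmark script; ops_per_second is a hypothetical helper, and str.lower is used only as a stand-in so the example runs without the package installed.

```python
import timeit
from typing import Callable


def ops_per_second(func: Callable[[str], str], text: str, number: int = 10_000) -> float:
    """Time `number` calls of func(text) and return calls per second."""
    elapsed = timeit.timeit(lambda: func(text), number=number)
    return number / elapsed


# str.lower is a stand-in; pass contractions.fix to measure the library itself.
rate = ops_per_second(str.lower, "I'm sure you'll love it!")
print(f"{rate:,.0f} ops/sec")
```

Results vary with hardware and text length, so figures are only comparable when measured on the same machine.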

Requirements

  • Python 3.10 or higher
  • textsearch >= 0.0.21

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/devjerry0/sane-contractions
cd sane-contractions
pip install -e .
pip install pytest pytest-cov ruff mypy

Running Tests

# Run tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ --cov=contractions --cov-report=term-missing

Code Quality

ruff check .
mypy contractions/__init__.py tests/

What's Different from the Original?

This fork includes several enhancements over the original contractions library:

🆕 New Features

  • add_dict() - Bulk add custom contractions from a dictionary
  • load_json() - Load contractions from JSON files
  • Type hints - Full type coverage with mypy validation
  • Better structure - Modular code organization (core, api modules)

🚀 Performance Improvements

  • Optimized dictionary operations using |= operator
  • Reduced function call overhead
  • Improved list comprehensions
  • Cached computations
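
As a reminder of what the |= operator does, the sketch below merges two small dictionaries. The entries are made up for illustration and are not taken from the library's data files.

```python
base = {"you're": "you are", "i'm": "i am"}
extra = {"gonna": "going to", "i'm": "I am"}

merged = base | extra   # builds a new dict; right-hand values win on duplicate keys
base |= extra           # updates base in place with the same precedence

print(merged == base)
print(base["i'm"])
```

Both operators require Python 3.9 or later, which is covered by the 3.10+ requirement.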

🧪 Enhanced Testing

  • 100% test coverage (up from ~60%)
  • 16 comprehensive tests including edge cases
  • Error handling tests
  • Performance benchmarking suite

📦 Modern Tooling

  • Python 3.10+ support (modern type hints)
  • Ruff for fast linting
  • Pre-commit hooks
  • GitHub Actions CI/CD
  • Automated PyPI publishing

📚 Better Documentation

  • Comprehensive README with examples
  • API reference documentation
  • Deployment guide
  • Contributing guidelines

Why "sane-contractions"?

The original library is excellent but has been unmaintained since 2021. This fork provides:

  • Active maintenance
  • Modern Python practices
  • Community contributions
  • Regular updates

License

MIT License - see LICENSE file for details.

Credits

Original Author: Pascal van Kooten (@kootenpv)
Fork Maintainer: Jeremy Bruns
Original Repository: https://github.com/kootenpv/contractions

This project would not exist without Pascal's excellent foundation. All credit for the core concept and initial implementation goes to the original author.
