
A fast and efficient library for fixing contractions in text with reverse functionality and batch processing support


Contraction Fix


A fast and efficient Python library for fixing contractions in English text. Expand contractions like "can't" → "cannot" or contract expanded forms like "cannot" → "can't" with high performance and accuracy.

Features

  • Bidirectional processing: Expand contractions or contract expanded forms
  • High performance: Optimized with precompiled regex patterns and LRU caching
  • Batch processing: Efficiently process multiple texts at once
  • Smart detection: Distinguishes between contractions and possessives
  • Configurable dictionaries: Support for standard, informal, and internet slang
  • Thread-safe: Safe for concurrent usage
  • Extensible: Add or remove custom contractions
  • Preview mode: See what changes will be made before applying them

Installation

pip install contraction-fix

Quick Start

Expanding Contractions

from contraction_fix import fix

text = "I can't believe it's not butter!"
result = fix(text)
print(result)  # "I cannot believe it is not butter!"

Contracting Expanded Forms

from contraction_fix import contract

text = "I cannot believe it is not butter!"
result = contract(text)
print(result)  # "I can't believe it's not butter!"

Core Functionality

Single Text Processing

from contraction_fix import fix, contract

# Expand contractions
expanded = fix("I'd like to see y'all tomorrow")
print(expanded)  # "I would like to see you all tomorrow"

# Contract expanded forms
contracted = contract("I would like to see you all tomorrow")
print(contracted)  # "I'd like to see y'all tomorrow"
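Contraction is the expansion mapping run in reverse. One way to picture it is to invert an expansion dictionary. Because two contractions can share one expansion, the inverse needs a tie-breaking rule. The sketch below illustrates this idea under stated assumptions and is not the library's code. The five-entry dictionary, the first-entry-wins rule, and case-sensitive matching are all hypothetical choices.

```python
import re
from typing import Iterable

# Hypothetical expansion dictionary (not the library's real data).
EXPANSIONS = {
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "'tis": "it is",  # shares its expansion with "it's"
    "y'all": "you all",
}


def invert(expansions: dict[str, str]) -> dict[str, str]:
    """Map each expansion back to a contraction; the first entry wins on ties."""
    reverse: dict[str, str] = {}
    for contraction, expansion in expansions.items():
        reverse.setdefault(expansion, contraction)
    return reverse


def build_pattern(keys: Iterable[str]) -> re.Pattern:
    """Match whole words/phrases only; longer alternatives are tried first."""
    alternatives = sorted(keys, key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, alternatives)) + r")(?!\w)")


REVERSE = invert(EXPANSIONS)
REVERSE_PATTERN = build_pattern(REVERSE)


def contract(text: str) -> str:
    """Case-sensitive sketch: replace each known expansion with its contraction."""
    return REVERSE_PATTERN.sub(lambda m: REVERSE[m.group(0)], text)
```

Here "it is" comes back as "it's" rather than "'tis" only because of dictionary order. This shows that expanding and then contracting is not guaranteed to return the original text.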

Batch Processing

For processing multiple texts efficiently:

from contraction_fix import fix_batch, contract_batch

texts = [
    "I can't believe it's working!",
    "They're going to the store",
    "We'll see what happens"
]

# Expand all texts
expanded = fix_batch(texts)
print(expanded)
# ["I cannot believe it is working!", "They are going to the store", "We will see what happens"]

# Contract all texts
contracted = contract_batch([
    "I cannot believe it is working!",
    "They are going to the store", 
    "We will see what happens"
])
print(contracted)
# ["I can't believe it's working!", "They're goin' to the store", "We'll see what happens"]

Smart Contraction Detection

The library distinguishes between contractions and possessive forms:

from contraction_fix import fix

text = "I can't find Sarah's keys, and she won't be at her brother's house until it's dark."
result = fix(text)
print(result)
# "I cannot find Sarah's keys, and she will not be at her brother's house until it is dark."

Notice how:

  • Contractions are expanded: "can't" → "cannot", "won't" → "will not", "it's" → "it is"
  • Possessives are preserved: "Sarah's" and "brother's" remain unchanged
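The description above does not say how the library tells the two cases apart. A common heuristic, sketched below, treats "'s" as a contraction only after a short list of pronouns and question words. It leaves "'s" alone everywhere else. The word list is hypothetical. The sketch also always expands to "is", even where "has" would be correct (as in "it's been").

```python
import re

# Hypothetical host words whose "'s" is treated as a contraction.
S_CONTRACTION_HOSTS = {"it", "that", "what", "there", "here", "he", "she", "who", "where"}

# Group 1 captures the word in front of "'s".
APOSTROPHE_S = re.compile(r"\b([A-Za-z]+)'s\b")


def expand_s(text: str) -> str:
    """Expand "'s" to " is" after known host words; keep possessives unchanged."""

    def replace(match: re.Match) -> str:
        word = match.group(1)
        if word.lower() in S_CONTRACTION_HOSTS:
            return f"{word} is"
        return match.group(0)  # e.g. "Sarah's" stays possessive

    return APOSTROPHE_S.sub(replace, text)
```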

Advanced Usage

Custom Configuration

from contraction_fix import ContractionFixer

# Create a custom fixer with specific settings
fixer = ContractionFixer(
    use_informal=True,   # Include informal contractions like "gonna", "goin'"
    use_slang=False,     # Exclude internet slang like "lol", "brb"
    cache_size=2048      # Increase cache size for better performance
)

text = "I'm gonna see y'all later, brb"
result = fixer.fix(text)
print(result)  # "I am going to see you all later, brb"
# Note: "brb" is preserved because use_slang=False

Preview Changes

See what changes will be made before applying them:

from contraction_fix import ContractionFixer

fixer = ContractionFixer()
text = "I can't believe it's working!"

matches = fixer.preview(text, context_size=5)
for match in matches:
    print(f"Found '{match.text}' at position {match.start}")
    print(f"Context: '{match.context}'")
    print(f"Will replace with: '{match.replacement}'")
    print()
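A report like this can be produced with re.finditer without modifying the text. The sketch uses a two-entry hypothetical dictionary. It also assumes that context means up to context_size characters on each side of the match, and the library's exact definition may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    text: str
    start: int
    end: int
    replacement: str
    context: str


# Hypothetical dictionary used only for this sketch.
EXPANSIONS = {"can't": "cannot", "it's": "it is"}
PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(map(re.escape, EXPANSIONS)) + r")(?!\w)",
    re.IGNORECASE,
)


def preview(text: str, context_size: int = 10) -> list[Match]:
    """List every match with its position and surroundings; text is not changed."""
    results = []
    for m in PATTERN.finditer(text):
        lo = max(0, m.start() - context_size)  # clamp so the slice never wraps
        hi = m.end() + context_size            # slicing past the end is safe
        results.append(
            Match(
                text=m.group(0),
                start=m.start(),
                end=m.end(),
                replacement=EXPANSIONS[m.group(0).lower()],
                context=text[lo:hi],
            )
        )
    return results
```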

Custom Contractions

Add or remove contractions dynamically:

from contraction_fix import ContractionFixer

fixer = ContractionFixer()

# Add custom contraction
fixer.add_contraction("lemme", "let me")
print(fixer.fix("lemme know"))  # "let me know"

# Remove existing contraction
fixer.remove_contraction("won't")
print(fixer.fix("I won't go"))  # "I won't go" (unchanged)
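Adding or removing an entry changes what has to be matched. A regex-based implementation therefore has to recompile its pattern, and a thread-safe one must keep readers from seeing a half-updated state. Below is a minimal sketch of one way to do both, using a hypothetical TinyFixer class that is not part of contraction_fix. Writers rebuild the table and pattern under a threading.Lock and publish them together as one tuple. Readers take that tuple in a single read.

```python
import re
import threading


class TinyFixer:
    """Hypothetical dictionary-backed expander; not contraction_fix's class."""

    def __init__(self, expansions: dict[str, str]) -> None:
        self._lock = threading.Lock()
        table = dict(expansions)
        # Pattern and table are published together; a published table is never mutated.
        self._state = (self._compile(table), table)

    @staticmethod
    def _compile(table: dict[str, str]) -> re.Pattern:
        if not table:
            return re.compile(r"(?!x)x")  # an empty dictionary matches nothing
        keys = sorted(table, key=len, reverse=True)
        return re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, keys)) + r")(?!\w)")

    def add_contraction(self, contraction: str, expansion: str) -> None:
        with self._lock:  # serialize writers
            table = {**self._state[1], contraction: expansion}
            self._state = (self._compile(table), table)

    def remove_contraction(self, contraction: str) -> None:
        with self._lock:
            table = {k: v for k, v in self._state[1].items() if k != contraction}
            self._state = (self._compile(table), table)

    def fix(self, text: str) -> str:
        pattern, table = self._state  # one consistent snapshot per call
        return pattern.sub(lambda m: table[m.group(0)], text)
```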

Dictionary Types

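The library uses three dictionary types. Standard contractions are always available. The informal and slang dictionaries can be turned off with the use_informal and use_slang options: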

Standard Contractions

Common English contractions like:

  • "can't" → "cannot"
  • "won't" → "will not"
  • "it's" → "it is"
  • "they're" → "they are"

Informal Contractions

Less formal patterns like:

  • "gonna" → "going to"
  • "goin'" → "going"
  • "doin'" → "doing"
  • "nothin'" → "nothing"

Internet Slang

Modern abbreviations like:

  • "btw" → "by the way"
  • "lol" → "laugh out loud"
  • "idk" → "I do not know"
  • "tbh" → "to be honest"
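The use_informal and use_slang options suggest that the informal and slang dictionaries are layered on top of the standard one. The sketch below shows such layering with tiny hypothetical dictionaries. The rule that later layers win on conflicts is an assumption.

```python
# Hypothetical stand-ins for the library's dictionaries.
STANDARD = {"can't": "cannot", "won't": "will not", "it's": "it is"}
INFORMAL = {"gonna": "going to", "goin'": "going"}
SLANG = {"btw": "by the way", "idk": "I do not know"}


def build_dictionary(use_informal: bool = True, use_slang: bool = True) -> dict[str, str]:
    """Merge the enabled dictionaries; later layers override earlier ones on conflicts."""
    merged = dict(STANDARD)
    if use_informal:
        merged.update(INFORMAL)
    if use_slang:
        merged.update(SLANG)
    return merged
```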

Performance

The library's performance rests on the following design choices:

  • Precompiled regex patterns with intelligent grouping
  • LRU caching for repeated inputs (configurable cache size)
  • Efficient data structures using frozensets and slots
  • Batch processing optimization for multiple texts
  • Memory efficient with minimal allocations
  • Thread-safe operations with proper locking
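The first item can be illustrated with the standard re module. Compile one alternation of all dictionary keys once, at import time, and replace every match in a single pass. The entries below are hypothetical. Python's regex engine tries alternatives left to right rather than picking the longest match. The sketch therefore sorts keys longest first, so that "I'd" does not match the start of "I'd've".

```python
import re

# Hypothetical entries; "I'd" is a prefix of "I'd've".
EXPANSIONS = {"I'd": "I would", "I'd've": "I would have", "can't": "cannot"}

# Compiled once at import time. Alternatives are tried in order,
# so longer keys go first.
PATTERN = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(k) for k in sorted(EXPANSIONS, key=len, reverse=True))
    + r")(?!\w)"
)


def expand(text: str) -> str:
    """Replace every dictionary key in a single regex pass."""
    return PATTERN.sub(lambda m: EXPANSIONS[m.group(0)], text)
```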

Performance Best Practices

from contraction_fix import fix, fix_batch, ContractionFixer

texts = ["I can't go", "They're here", "We'll see"]

# ✅ Efficient: Use batch processing for multiple texts
results = fix_batch(texts)

# ✅ Efficient: Reuse a single fixer instance
fixer = ContractionFixer()
results = [fixer.fix(text) for text in texts]

# ❌ Less efficient: Individual module-level function calls
results = [fix(text) for text in texts]

Configuration Options

ContractionFixer Parameters

  • use_informal: bool = True

    • Include informal contractions like "gonna" → "going to"
    • Set to False for formal text processing
  • use_slang: bool = True

    • Include internet slang like "brb" → "be right back"
    • Set to False for academic or professional applications
  • cache_size: int = 1024

    • LRU cache size for memoization
    • Increase for better performance with repeated inputs
    • Decrease to reduce memory usage
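cache_size is described as the size of an LRU cache. If it behaves like the standard library's functools.lru_cache, it plays the role of maxsize, and cache_info() shows how often repeated inputs hit the cache. The stand-in slow_expand function below is hypothetical.

```python
from functools import lru_cache


def slow_expand(text: str) -> str:
    """Hypothetical stand-in for the real expansion work."""
    return text.replace("can't", "cannot")


def make_cached(cache_size: int):
    # Analogue of the cache_size parameter: wrap the function in an LRU cache.
    return lru_cache(maxsize=cache_size)(slow_expand)


expand = make_cached(cache_size=2)
for text in ["I can't", "I can't", "we can't", "I can't", "you can't"]:
    expand(text)

info = expand.cache_info()
print(info)  # CacheInfo(hits=2, misses=3, maxsize=2, currsize=2)
```

With maxsize=2, the third distinct input evicts the least recently used entry ("we can't"). A larger size pays off only when the set of texts that repeat is bigger than the cache.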

Example Configurations

from contraction_fix import ContractionFixer

# Formal text processing
formal_fixer = ContractionFixer(use_informal=False, use_slang=False)

# High performance setup
fast_fixer = ContractionFixer(cache_size=4096)

# Memory conservative setup
light_fixer = ContractionFixer(cache_size=256)

API Reference

Package Functions

from typing import List

# Expansion functions
def fix(text: str, use_informal: bool = True, use_slang: bool = True) -> str: ...
def fix_batch(texts: List[str], use_informal: bool = True, use_slang: bool = True) -> List[str]: ...

# Contraction functions
def contract(text: str, use_informal: bool = True, use_slang: bool = True) -> str: ...
def contract_batch(texts: List[str], use_informal: bool = True, use_slang: bool = True) -> List[str]: ...

ContractionFixer Class

from typing import List

class ContractionFixer:
    def __init__(self, use_informal: bool = True, use_slang: bool = True, cache_size: int = 1024) -> None: ...

    # Core methods
    def fix(self, text: str) -> str: ...
    def fix_batch(self, texts: List[str]) -> List[str]: ...
    def contract(self, text: str) -> str: ...
    def contract_batch(self, texts: List[str]) -> List[str]: ...

    # Utility methods
    def preview(self, text: str, context_size: int = 10) -> List["Match"]: ...
    def add_contraction(self, contraction: str, expansion: str) -> None: ...
    def remove_contraction(self, contraction: str) -> None: ...

Match Class

from dataclasses import dataclass

@dataclass
class Match:
    text: str          # The matched contraction
    start: int         # Start position in original text
    end: int           # End position in original text
    replacement: str   # What it will be replaced with
    context: str       # Surrounding context

Examples

Text Preprocessing Pipeline

from contraction_fix import ContractionFixer

# Create the fixer once and reuse it across calls
fixer = ContractionFixer(use_slang=False)  # Skip internet slang

def preprocess_text(text: str) -> str:
    """Example preprocessing pipeline"""
    # Expand contractions for consistent analysis
    expanded = fixer.fix(text)

    # Your other preprocessing steps here
    # (tokenization, lowercasing, etc.)

    return expanded

# Usage
raw_text = "I can't believe it's working! They're awesome."
processed = preprocess_text(raw_text)
print(processed)  # "I cannot believe it is working! They are awesome."
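The placeholder comment in preprocess_text leaves the remaining steps open. Below is one possible continuation that uses only the standard library. The expansion step is passed in as a callable so that fixer.fix can be plugged in. The lowercasing and tokenization rules are assumptions for illustration.

```python
import re
from typing import Callable

# Hypothetical tokenization rule: runs of lowercase letters and digits.
TOKEN = re.compile(r"[a-z0-9]+")


def preprocess(text: str, expand: Callable[[str], str]) -> list[str]:
    """Expand contractions, then lowercase and split into word tokens."""
    expanded = expand(text)  # e.g. pass fixer.fix from contraction_fix
    return TOKEN.findall(expanded.lower())
```

Expanding before tokenizing matters with a tokenizer like this one. Left unexpanded, "can't" would be split into the tokens "can" and "t".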

Chat Message Processing

from contraction_fix import ContractionFixer

# Create the fixer once; informal forms and slang are both enabled
chat_fixer = ContractionFixer(use_informal=True, use_slang=True)

def normalize_chat_message(message: str) -> str:
    """Normalize casual chat messages"""
    # Expand everything for consistent processing
    return chat_fixer.fix(message)

# Usage
chat_msg = "hey btw, i can't make it tonight lol"
normalized = normalize_chat_message(chat_msg)
print(normalized)  # "hey by the way, i cannot make it tonight laugh out loud"

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. Make sure to:

  1. Add tests for new functionality
  2. Update documentation as needed
  3. Follow the existing code style
  4. Ensure all tests pass

License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files

Download the file for your platform.

Source Distribution

contraction_fix-0.2.2.tar.gz (21.8 kB)

Uploaded Source

Built Distribution

contraction_fix-0.2.2-py3-none-any.whl (12.5 kB)

Uploaded Python 3

File details

Details for the file contraction_fix-0.2.2.tar.gz.

File metadata

  • Download URL: contraction_fix-0.2.2.tar.gz
  • Size: 21.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for contraction_fix-0.2.2.tar.gz

  • SHA256: cb6fe7d419c0e78ef957b00a9f6ea3288add80771c7035261552743064928ea7
  • MD5: 52a9497676527dcebefa9a4dcfb38cd7
  • BLAKE2b-256: 51e5c878b789ca5741b78fcbc577eca440598af96edab0c6999abac30b89a3be


File details

Details for the file contraction_fix-0.2.2-py3-none-any.whl.

File hashes

Hashes for contraction_fix-0.2.2-py3-none-any.whl

  • SHA256: d9d667595afa629668b241da156475a38c6ba63ace473bf9f8fa0a930c991d6a
  • MD5: 40dace11d7690eb002fd31f86b0b2659
  • BLAKE2b-256: e0c52bf8a247e8f1d3970b8e7f67e63b638b01c7fd0d0f13bc7005e5271098d5

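The published digests can be used to check a downloaded file before installing it. Below is a minimal sketch with hashlib, assuming the source distribution has been saved locally. The function names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 published above for contraction_fix-0.2.2.tar.gz.
EXPECTED_SHA256 = "cb6fe7d419c0e78ef957b00a9f6ea3288add80771c7035261552743064928ea7"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, verify(Path("contraction_fix-0.2.2.tar.gz")) should return True for an intact download. pip can enforce the same check in hash-checking mode by adding --hash=sha256:<digest> to a requirements line.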
