A fast and efficient library for fixing contractions in text with reverse functionality and batch processing support
Contraction Fix
A fast and efficient Python library for fixing contractions in English text. Expand contractions like "can't" → "cannot" or contract expanded forms like "cannot" → "can't" with high performance and accuracy.
Features
- Bidirectional processing: Expand contractions or contract expanded forms
- High performance: Optimized with precompiled regex patterns and LRU caching
- Batch processing: Efficiently process multiple texts at once
- Smart detection: Distinguishes between contractions and possessives
- Configurable dictionaries: Support for standard, informal, and internet slang
- Thread-safe: Safe for concurrent usage
- Extensible: Add or remove custom contractions
- Preview mode: See what changes will be made before applying them
Installation
pip install contraction-fix
Quick Start
Expanding Contractions
from contraction_fix import fix
text = "I can't believe it's not butter!"
result = fix(text)
print(result) # "I cannot believe it is not butter!"
Contracting Expanded Forms
from contraction_fix import contract
text = "I cannot believe it is not butter!"
result = contract(text)
print(result) # "I can't believe it's not butter!"
Core Functionality
Single Text Processing
from contraction_fix import fix, contract
# Expand contractions
expanded = fix("I'd like to see y'all tomorrow")
print(expanded) # "I would like to see you all tomorrow"
# Contract expanded forms
contracted = contract("I would like to see you all tomorrow")
print(contracted) # "I'd like to see y'all tomorrow"
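One way to picture the two directions is as a lookup table and its inverse. The sketch below is a standard-library illustration, not the library's implementation: `EXPANSIONS`, `expand`, and `contract_text` are hypothetical names, and the table is a tiny sample.

```python
import re

# Hypothetical, tiny sample of an expansion table (not the library's data).
EXPANSIONS = {
    "can't": "cannot",
    "it's": "it is",
    "I'd": "I would",
    "y'all": "you all",
}

# Inverting the table gives the contracting direction.
CONTRACTIONS = {expanded: short for short, expanded in EXPANSIONS.items()}


def _replace_all(text: str, table: dict[str, str]) -> str:
    """Replace every whole-word key of `table` found in `text`."""
    # Longest keys first, so a longer key wins over a shorter overlapping one.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    return pattern.sub(lambda m: table[m.group(0)], text)


def expand(text: str) -> str:
    return _replace_all(text, EXPANSIONS)


def contract_text(text: str) -> str:
    return _replace_all(text, CONTRACTIONS)
```

The inverse table is only unambiguous when each expansion maps to a single contraction. This sketch is also case-sensitive and does not restore capitalization.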
Batch Processing
For processing multiple texts efficiently:
from contraction_fix import fix_batch, contract_batch
texts = [
    "I can't believe it's working!",
    "They're going to the store",
    "We'll see what happens"
]
# Expand all texts
expanded = fix_batch(texts)
print(expanded)
# ["I cannot believe it is working!", "They are going to the store", "We will see what happens"]
# Contract all texts
contracted = contract_batch([
    "I cannot believe it is working!",
    "They are going to the store",
    "We will see what happens"
])
print(contracted)
# ["I can't believe it's working!", "They're goin' to the store", "We'll see what happens"]
Smart Contraction Detection
The library intelligently distinguishes between contractions and possessive forms:
from contraction_fix import fix
text = "I can't find Sarah's keys, and she won't be at her brother's house until it's dark."
result = fix(text)
print(result)
# "I cannot find Sarah's keys, and she will not be at her brother's house until it is dark."
Notice how:
- Contractions are expanded: "can't" → "cannot", "won't" → "will not", "it's" → "it is"
- Possessives are preserved: "Sarah's" and "brother's" remain unchanged
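A simple way to get this behavior is to expand only apostrophe forms that appear in a known table, so that any other `'s` form passes through untouched. The sketch below assumes that approach; `KNOWN` and `expand_known` are hypothetical names, and the library's actual detection logic may differ.

```python
import re

# Hypothetical table: only forms listed here are treated as contractions.
KNOWN = {
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
}

# A word, optionally followed by an apostrophe and more letters.
TOKEN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def expand_known(text: str) -> str:
    """Expand table entries; leave every other apostrophe form untouched."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Unknown words (including possessives) fall back to themselves.
        return KNOWN.get(word.lower(), word)

    return TOKEN.sub(swap, text)
```

A table lookup cannot resolve forms that are ambiguous by nature, such as "Sarah's going" meaning "Sarah is going". Handling those would need context beyond this sketch.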
Advanced Usage
Custom Configuration
from contraction_fix import ContractionFixer
# Create a custom fixer with specific settings
fixer = ContractionFixer(
    use_informal=True,  # Include informal contractions like "gonna", "goin'"
    use_slang=False,  # Exclude internet slang like "lol", "brb"
    cache_size=2048  # Larger LRU cache; helps when the same inputs repeat
)
text = "I'm gonna see y'all later, brb"
result = fixer.fix(text)
print(result) # "I am going to see you all later, brb"
# Note: "brb" is preserved because use_slang=False
Preview Changes
See what changes will be made before applying them:
from contraction_fix import ContractionFixer
fixer = ContractionFixer()
text = "I can't believe it's working!"
matches = fixer.preview(text, context_size=5)
for match in matches:
    print(f"Found '{match.text}' at position {match.start}")
    print(f"Context: '{match.context}'")
    print(f"Will replace with: '{match.replacement}'")
    print()
Custom Contractions
Add or remove contractions dynamically:
from contraction_fix import ContractionFixer
fixer = ContractionFixer()
# Add custom contraction
fixer.add_contraction("lemme", "let me")
print(fixer.fix("lemme know")) # "let me know"
# Remove existing contraction
fixer.remove_contraction("won't")
print(fixer.fix("I won't go")) # "I won't go" (unchanged)
Dictionary Types
The library uses three configurable dictionary types:
Standard Contractions
Common English contractions like:
- "can't" → "cannot"
- "won't" → "will not"
- "it's" → "it is"
- "they're" → "they are"
Informal Contractions
Less formal patterns like:
- "gonna" → "going to"
- "goin'" → "going"
- "doin'" → "doing"
- "nothin'" → "nothing"
Internet Slang
Modern abbreviations like:
- "btw" → "by the way"
- "lol" → "laugh out loud"
- "idk" → "I do not know"
- "tbh" → "to be honest"
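The three dictionaries can be thought of as layers merged into one lookup table according to the `use_informal` and `use_slang` flags. The sketch below is an assumption about how such layering could work. `build_table` is a hypothetical helper, the entries are samples from the lists above, and treating the standard dictionary as always enabled is also an assumption.

```python
# Hypothetical sample entries taken from the lists above.
STANDARD = {"can't": "cannot", "won't": "will not"}
INFORMAL = {"gonna": "going to", "goin'": "going"}
SLANG = {"btw": "by the way", "idk": "I do not know"}


def build_table(use_informal: bool = True, use_slang: bool = True) -> dict[str, str]:
    """Merge the enabled dictionaries into one lookup table."""
    table = dict(STANDARD)  # assumption: standard entries are always included
    if use_informal:
        table.update(INFORMAL)
    if use_slang:
        table.update(SLANG)
    return table
```

With `use_slang=False`, entries such as "btw" are absent from the merged table, so they would pass through unchanged.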
Performance
The library relies on several optimizations for speed and memory efficiency:
- Precompiled regex patterns with intelligent grouping
- LRU caching for repeated inputs (configurable cache size)
- Efficient data structures using frozensets and slots
- Batch processing optimization for multiple texts
- Memory efficient with minimal allocations
- Thread-safe operations with proper locking
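The first two points can be illustrated with the standard library alone. The sketch below compiles one alternation pattern up front and memoizes results with `functools.lru_cache`. `TABLE`, `PATTERN`, and `expand_cached` are hypothetical names, and this is not the library's actual code.

```python
import re
from functools import lru_cache

# Hypothetical table; a real dictionary would be much larger.
TABLE = {"can't": "cannot", "they're": "they are", "we'll": "we will"}

# Compile a single alternation once instead of one pattern per entry.
PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(map(re.escape, TABLE)) + r")(?!\w)"
)


@lru_cache(maxsize=1024)
def expand_cached(text: str) -> str:
    """Expand `text`; repeated inputs are answered from the cache."""
    return PATTERN.sub(lambda m: TABLE[m.group(0)], text)
```

Calling `expand_cached.cache_info()` reports hits and misses, which is a quick way to check whether a workload repeats inputs often enough for a larger cache to help.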
Performance Best Practices
from contraction_fix import fix, fix_batch, ContractionFixer
# ✅ Efficient: Use batch processing for multiple texts
texts = ["I can't go", "They're here", "We'll see"]
results = fix_batch(texts)
# ✅ Efficient: Reuse fixer instance
fixer = ContractionFixer()
results = [fixer.fix(text) for text in texts]
# ❌ Less efficient: Individual function calls
results = [fix(text) for text in texts]
Configuration Options
ContractionFixer Parameters
- use_informal: bool = True
  - Include informal contractions like "gonna" → "going to"
  - Set to False for formal text processing
- use_slang: bool = True
  - Include internet slang like "brb" → "be right back"
  - Set to False for academic or professional applications
- cache_size: int = 1024
  - LRU cache size for memoization
  - Increase for better performance with repeated inputs
  - Decrease to reduce memory usage
Example Configurations
from contraction_fix import ContractionFixer
# Formal text processing
formal_fixer = ContractionFixer(use_informal=False, use_slang=False)
# High performance setup
fast_fixer = ContractionFixer(cache_size=4096)
# Memory conservative setup
light_fixer = ContractionFixer(cache_size=256)
API Reference
Package Functions
# Expansion functions
fix(text: str, use_informal: bool = True, use_slang: bool = True) -> str
fix_batch(texts: List[str], use_informal: bool = True, use_slang: bool = True) -> List[str]
# Contraction functions
contract(text: str, use_informal: bool = True, use_slang: bool = True) -> str
contract_batch(texts: List[str], use_informal: bool = True, use_slang: bool = True) -> List[str]
ContractionFixer Class
class ContractionFixer:
    def __init__(self, use_informal: bool = True, use_slang: bool = True, cache_size: int = 1024)
    # Core methods
    def fix(self, text: str) -> str
    def fix_batch(self, texts: List[str]) -> List[str]
    def contract(self, text: str) -> str
    def contract_batch(self, texts: List[str]) -> List[str]
    # Utility methods
    def preview(self, text: str, context_size: int = 10) -> List[Match]
    def add_contraction(self, contraction: str, expansion: str) -> None
    def remove_contraction(self, contraction: str) -> None
Match Class
@dataclass
class Match:
    text: str  # The matched contraction
    start: int  # Start position in original text
    end: int  # End position in original text
    replacement: str  # What it will be replaced with
    context: str  # Surrounding context
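To make the fields concrete, here is a standard-library sketch of how preview records with these fields could be produced. `PreviewMatch` and `preview` are hypothetical stand-ins, and the context window, `context_size` characters on each side of the match, is an assumption about what "surrounding context" means.

```python
import re
from dataclasses import dataclass


@dataclass
class PreviewMatch:  # hypothetical stand-in mirroring the fields above
    text: str
    start: int
    end: int
    replacement: str
    context: str


# Hypothetical sample table.
TABLE = {"can't": "cannot", "it's": "it is"}
PATTERN = re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, TABLE)) + r")(?!\w)")


def preview(text: str, context_size: int = 10) -> list[PreviewMatch]:
    """List the replacements that would be made, without changing `text`."""
    found = []
    for m in PATTERN.finditer(text):
        # Clamp the window so it never runs past either end of the text.
        lo = max(0, m.start() - context_size)
        hi = min(len(text), m.end() + context_size)
        found.append(
            PreviewMatch(m.group(0), m.start(), m.end(), TABLE[m.group(0)], text[lo:hi])
        )
    return found
```

Because `start` and `end` index into the original string, a caller can apply only some of the replacements, or highlight them, without re-running the search.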
Examples
Text Preprocessing Pipeline
from contraction_fix import ContractionFixer
def preprocess_text(text: str) -> str:
    """Example preprocessing pipeline"""
    fixer = ContractionFixer(use_slang=False)  # Skip internet slang; informal forms are still expanded
    # Expand contractions for consistent analysis
    expanded = fixer.fix(text)
    # Your other preprocessing steps here
    # (tokenization, lowercasing, etc.)
    return expanded
# Usage
raw_text = "I can't believe it's working! They're awesome."
processed = preprocess_text(raw_text)
print(processed) # "I cannot believe it is working! They are awesome."
Chat Message Processing
from contraction_fix import ContractionFixer
def normalize_chat_message(message: str) -> str:
    """Normalize casual chat messages"""
    fixer = ContractionFixer(use_informal=True, use_slang=True)
    # Expand everything for consistent processing
    return fixer.fix(message)
# Usage
chat_msg = "hey btw, i can't make it tonight lol"
normalized = normalize_chat_message(chat_msg)
print(normalized) # "hey by the way, I cannot make it tonight laugh out loud"
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. Make sure to:
- Add tests for new functionality
- Update documentation as needed
- Follow the existing code style
- Ensure all tests pass
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file contraction_fix-0.2.2.tar.gz.
File metadata
- Download URL: contraction_fix-0.2.2.tar.gz
- Upload date:
- Size: 21.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb6fe7d419c0e78ef957b00a9f6ea3288add80771c7035261552743064928ea7 |
| MD5 | 52a9497676527dcebefa9a4dcfb38cd7 |
| BLAKE2b-256 | 51e5c878b789ca5741b78fcbc577eca440598af96edab0c6999abac30b89a3be |
File details
Details for the file contraction_fix-0.2.2-py3-none-any.whl.
File metadata
- Download URL: contraction_fix-0.2.2-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9d667595afa629668b241da156475a38c6ba63ace473bf9f8fa0a930c991d6a |
| MD5 | 40dace11d7690eb002fd31f86b0b2659 |
| BLAKE2b-256 | e0c52bf8a247e8f1d3970b8e7f67e63b638b01c7fd0d0f13bc7005e5271098d5 |