TidyName
Intelligent company name cleaning for Python.
TidyName is a Python package that intelligently removes legal entity terms and organization type indicators from company names while preserving cases where these terms are part of the actual business name.
Features
- Smart Detection: Identifies and removes corporate suffixes (LLC, Inc., Ltd., etc.)
- Intelligent Preservation: Preserves terms when they're part of brand names (e.g., "The Limited")
- Confidence Scoring: Provides confidence levels for each cleaning decision
- International Support: Handles international corporate suffixes (GmbH, S.A., etc.)
- Batch Processing: Clean multiple company names efficiently
- Configurable: Customize behavior through configuration options
- Pure Python: No external dependencies required
- Type Safety: Full type annotations for better IDE support
Requirements
- Python 3.13+
Installation
pip install tidyname
# For development
git clone https://github.com/your-repo/tidyname.git
cd tidyname
uv sync
Quick Start
from tidyname import Cleaner
# Initialize the cleaner
cleaner = Cleaner()
# Clean a single company name
result = cleaner.clean("Apple Inc.")
print(result.original) # "Apple Inc."
print(result.cleaned) # "Apple"
print(result.confidence) # 0.95
print(result.confidence_level) # "high"
print(result.changes_made) # True
print(result.reason) # "Removed: Inc."
Advanced Usage
Batch Processing
from tidyname import Cleaner
cleaner = Cleaner()
companies = [
    "Apple Inc.",
    "Microsoft Corporation",
    "Google LLC",
    "The Limited",  # Will be preserved
    "Amazon",  # No changes needed
]
results = cleaner.clean_batch(companies)
for result in results:
    print(f"{result.original} → {result.cleaned}")
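One common follow-up to batch cleaning is grouping the original spellings that collapse to the same cleaned name. The sketch below is an illustration only: `Result` is a stand-in carrying the `original` and `cleaned` fields documented for `CleaningResult`, and `group_by_cleaned` is a hypothetical helper, not part of TidyName.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Result(NamedTuple):
    """Stand-in for CleaningResult with only the two fields used here."""
    original: str
    cleaned: str


def group_by_cleaned(results: Iterable[Result]) -> dict[str, list[str]]:
    """Map each case-folded cleaned name to the originals that produced it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for result in results:
        groups[result.cleaned.casefold()].append(result.original)
    return dict(groups)


sample = [
    Result("Apple Inc.", "Apple"),
    Result("APPLE INCORPORATED", "APPLE"),
    Result("Amazon", "Amazon"),
]
groups = group_by_cleaned(sample)
```

With the real package, the list returned by `clean_batch` could presumably be passed in directly, since only attribute access on `original` and `cleaned` is used.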
Configuration Options
from tidyname import Cleaner, CleanerConfig
# Custom configuration
config = CleanerConfig(
    remove_corporate_suffixes=True,  # Enable/disable suffix removal
    preserve_known_brands=True,  # Preserve known brand names
    min_confidence_threshold=0.7,  # Minimum confidence for changes
)
cleaner = Cleaner(config=config)
# Or configure after initialization
cleaner.configure(
    preserve_known_brands=False,
    min_confidence_threshold=0.8,
)
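To make the shape of these options concrete, here is a minimal sketch of a configuration object with the three documented fields and their documented defaults. `SketchConfig` and `updated` are hypothetical names, and the validation shown is an assumption; this document does not say how `CleanerConfig` or `configure` validate their input.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class SketchConfig:
    """Illustrative stand-in mirroring the documented CleanerConfig fields."""
    remove_corporate_suffixes: bool = True
    preserve_known_brands: bool = True
    min_confidence_threshold: float = 0.5

    def __post_init__(self) -> None:
        # Assumed rule: the threshold shares the 0.0 to 1.0 range of the score.
        if not 0.0 <= self.min_confidence_threshold <= 1.0:
            raise ValueError("min_confidence_threshold must be between 0.0 and 1.0")


def updated(config: SketchConfig, **changes: object) -> SketchConfig:
    """Return a copy with some options changed, rejecting unknown names."""
    known = {f.name for f in fields(config)}
    unknown = sorted(set(changes) - known)
    if unknown:
        raise TypeError(f"unknown option(s): {', '.join(unknown)}")
    return replace(config, **changes)


base = SketchConfig()
strict = updated(base, preserve_known_brands=False, min_confidence_threshold=0.8)
```

Returning a new frozen object rather than mutating in place is a design choice of this sketch; the documented `configure` method returns `None`, which suggests the real class updates its settings in place.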
Supported Terms
Corporate Suffixes
- Corporation: Company, Incorporated, Corporation, Corp., Corp, Inc., Inc
- Limited Liability: LLC, L.L.C., PLC, P.L.C.
- Limited: Limited, Ltd., Ltd, Co., Co
- Partnership: & Co., & Co, LLP, L.L.P.
- Professional: Professional Corporation, P.C., PC
International Suffixes
- German: GmbH, AG
- French: S.A., S.A
- Dutch: N.V., B.V.
- Italian: S.r.l., S.p.A.
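A term list like the one above can be turned into a simple trailing-suffix matcher. The sketch below is a naive illustration, assuming a plain regular-expression approach; it is not TidyName's implementation, and `SUFFIXES` and `strip_trailing_suffixes` are hypothetical names covering only a subset of the listed terms.

```python
import re

# A subset of the suffixes listed above (illustrative, not the package's data).
SUFFIXES = [
    "Incorporated", "Corporation", "Corp.", "Corp", "Inc.", "Inc",
    "L.L.C.", "LLC", "P.L.C.", "PLC", "Limited", "Ltd.", "Ltd",
    "L.L.P.", "LLP", "GmbH", "AG", "S.A.", "N.V.", "B.V.",
    "S.r.l.", "S.p.A.",
]

# Sort longest first so longer forms are tried before their prefixes.
_alternation = "|".join(
    re.escape(s) for s in sorted(SUFFIXES, key=len, reverse=True)
)
# One suffix at the very end, preceded by whitespace and/or a comma.
_TRAILING_SUFFIX = re.compile(rf"[\s,]+(?:{_alternation})$", re.IGNORECASE)


def strip_trailing_suffixes(name: str) -> str:
    """Remove listed suffixes from the end of name until none remain."""
    current = name.strip()
    while True:
        stripped = _TRAILING_SUFFIX.sub("", current)
        if stripped == current:
            return current
        current = stripped
```

Because this matcher has no notion of brand names, it reduces "The Limited" to "The", which is the kind of case the package's preservation logic is described as handling.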
Examples
Basic Cleaning
from tidyname import Cleaner
cleaner = Cleaner()
# Standard corporate suffixes
print(cleaner.clean("Apple Inc.").cleaned) # "Apple"
print(cleaner.clean("Microsoft Corporation").cleaned) # "Microsoft"
print(cleaner.clean("Google LLC").cleaned) # "Google"
# International suffixes
print(cleaner.clean("Siemens AG").cleaned) # "Siemens"
print(cleaner.clean("L'Oréal S.A.").cleaned) # "L'Oréal"
# Multiple suffixes
print(cleaner.clean("Tech Solutions Inc. LLC").cleaned) # "Tech Solutions"
Brand Preservation
from tidyname import Cleaner
cleaner = Cleaner()
# These will be preserved as they're known brands
result = cleaner.clean("The Limited")
print(result.cleaned) # "The Limited"
print(result.changes_made) # False
result = cleaner.clean("Limited Brands")
print(result.cleaned) # "Limited Brands"
print(result.changes_made) # False
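One way such a preservation check could work is sketched below, under stated assumptions: a small allowlist of known brands, plus a rule that rejects any cleaning that would leave nothing but an article. `KNOWN_BRANDS`, `ARTICLES`, and `should_preserve` are hypothetical; the package's actual rules and data are not described in this document.

```python
# Hypothetical allowlist and stop words, for illustration only.
KNOWN_BRANDS = {"the limited", "limited brands"}
ARTICLES = {"the", "a", "an"}


def should_preserve(original: str, candidate: str) -> bool:
    """Return True if the original name should be kept instead of candidate."""
    # Rule 1: exact (case-insensitive) match against the allowlist.
    if original.strip().lower() in KNOWN_BRANDS:
        return True
    # Rule 2: the cleaned candidate is empty or consists only of articles.
    remaining = candidate.strip().lower().split()
    return all(word in ARTICLES for word in remaining)
```

Here `candidate` is whatever a suffix-removal step proposes, so the check can veto a change without knowing how that proposal was produced.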
Confidence and Reasoning
from tidyname import Cleaner
cleaner = Cleaner()
result = cleaner.clean("Apple Inc.")
print(f"Confidence: {result.confidence}") # 0.95
print(f"Level: {result.confidence_level}") # "high"
print(f"Reasoning: {result.reason}") # "Removed: Inc."
# Low confidence example
result = cleaner.clean("Limited Edition")
print(f"Confidence: {result.confidence}") # Lower score
print(f"Reasoning: {result.reason}") # Preservation reasoning
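The relationship between the numeric score and the named level can be pictured as a simple bucketing step. In the sketch below the cut-offs (0.8 and 0.5) are assumptions chosen for illustration; the only pairing this document confirms is that 0.95 is reported as "high". The function name is hypothetical as well.

```python
def confidence_level(score: float, high: float = 0.8, medium: float = 0.5) -> str:
    """Bucket a 0.0 to 1.0 score into "high", "medium", or "low".

    The default cut-offs are illustrative assumptions, not TidyName's values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"
```

Keeping the cut-offs as parameters makes it easy to compare different bucketings against real results before settling on one.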
API Reference
Cleaner Class
__init__(config: CleanerConfig | None = None)
Initialize the cleaner with optional configuration.
clean(company_name: str) -> CleaningResult
Clean a single company name.
Parameters:
- company_name: The company name to clean
Returns:
- CleaningResult object with cleaning results and metadata
clean_batch(company_names: list[str]) -> list[CleaningResult]
Clean multiple company names.
Parameters:
- company_names: List of company names to clean
Returns:
- List of CleaningResult objects
configure(**kwargs) -> None
Update configuration settings.
CleaningResult
Result object containing:
- original: Original company name
- cleaned: Cleaned company name
- confidence: Confidence score (0.0 to 1.0)
- confidence_level: "high", "medium", or "low"
- changes_made: Boolean indicating if changes were made
- reason: Human-readable explanation of the decision
CleanerConfig
Configuration object with:
- remove_corporate_suffixes: Enable suffix removal (default: True)
- preserve_known_brands: Preserve known brand names (default: True)
- min_confidence_threshold: Minimum confidence for changes (default: 0.5)
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite: uv run pytest
- Submit a pull request
License
MIT License - see LICENSE file for details.
File details
Details for the file tidyname-0.1.0.tar.gz.
File metadata
- Download URL: tidyname-0.1.0.tar.gz
- Upload date:
- Size: 16.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4276f20ff828d49808d3d3afbe25831e1a455c2cb2773b8e0dfc0f3c99f4eee9 |
| MD5 | 70e5a31f6986fe3ed71ec046dd317402 |
| BLAKE2b-256 | 5170250fd335b56fb6bda132c37c04fbaf44f739799bc86598467a52e8c2adc8 |
File details
Details for the file tidyname-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tidyname-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5e9e2c7a1a6334ff6d358bd1dd5cdcc6df99fb9a457b3a2fae6e1b717cc9acec |
| MD5 | 9f3701acb57a8aa3cc153c25092161b5 |
| BLAKE2b-256 | 19302855bb9fd53c89006bfcd4f77651aebb1377bb1bfc87e1f8871df60282e3 |