A data matching and canonicalization library with multiple database connector support
CanonMap
A powerful data matching and canonicalization library with MySQL connector support.
Features
- Data Matching: Advanced algorithms for fuzzy string matching and record linkage
- MySQL Integration: Seamless connection and management of MySQL databases
- Canonicalization: Standardize and normalize data across different formats
- Rich Logging: Beautiful console output with structured logging
- FastAPI Support: Optional FastAPI integration for web services
Installation
pip install canonmap
For development dependencies:
pip install "canonmap[dev]"
For FastAPI support:
pip install "canonmap[fastapi]"
Quick Start
Command Line Interface
CanonMap provides a CLI tool for quick project setup:
# Create a new API project (default name: app)
cm create-api
# Create a new API project with custom name
cm create-api --name my-api
# Create a new API project whose name contains spaces (it will be normalized)
cm create-api --name "My API"
The CLI will automatically:
- Normalize directory names to follow Python conventions
- Auto-increment names if the directory already exists (app, app-2, app-3, etc.)
- Copy and customize the example API template
- Replace all references to "app" with your chosen name
- Install required dependencies (fastapi, uvicorn, python-dotenv)
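The exact normalization rules the CLI applies are not documented here. As a hypothetical illustration of the two naming behaviours listed above, the sketch below assumes names are lowercased with runs of non-alphanumeric characters collapsed to underscores, and that collisions are resolved with the app-2, app-3 suffix pattern. `normalize_name` and `next_available_dir` are illustrative helpers, not part of canonmap.

```python
import re
from pathlib import Path


def normalize_name(name: str) -> str:
    """Hypothetical rule: lowercase, collapse non-alphanumeric runs to '_'.

    Falls back to the CLI's default name "app" if nothing usable remains.
    """
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    return cleaned or "app"


def next_available_dir(parent: Path, base: str) -> Path:
    """Return parent/base, or parent/base-2, base-3, ... if earlier ones exist."""
    candidate = parent / base
    suffix = 2
    while candidate.exists():
        candidate = parent / f"{base}-{suffix}"
        suffix += 1
    return candidate
```

Under these assumptions, `normalize_name("My API")` yields `my_api`, and a second `cm create-api` run in the same directory would target `app-2`.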
Basic Usage
from canonmap import make_console_handler
from canonmap.connectors.mysql_connector import MySQLConnector
# Set up logging
make_console_handler(set_root=True)
# Create a MySQL connector
connector = MySQLConnector(
    host="localhost",
    port=3306,
    user="your_user",
    password="your_password",
    database="your_database"
)
# Use the connector for data operations
# ... your data matching and canonicalization code
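Hardcoded credentials are fine for a quick test, but a common alternative is to read them from environment variables (for example, populated from a .env file with python-dotenv, which the CLI installs). A minimal sketch, assuming hypothetical `MYSQL_*` variable names that canonmap itself does not define, which builds the same keyword arguments the `MySQLConnector` call above uses:

```python
import os
from typing import Mapping


def load_mysql_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build MySQLConnector keyword arguments from environment variables.

    The MYSQL_* names are hypothetical conventions, not defined by canonmap.
    """
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "port": int(env.get("MYSQL_PORT", "3306")),
        "user": env["MYSQL_USER"],          # required: KeyError if unset
        "password": env["MYSQL_PASSWORD"],  # required: KeyError if unset
        "database": env["MYSQL_DATABASE"],  # required: KeyError if unset
    }
```

With the variables set (optionally after calling python-dotenv's `load_dotenv()`), the connector could be created as `MySQLConnector(**load_mysql_settings())`.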
Data Matching Example
from canonmap.connectors.mysql_connector.matching import Matcher
# source_records and target_records stand for your own input datasets
# Initialize matcher
matcher = Matcher()
# Perform fuzzy matching on the name and address fields
matches = matcher.find_matches(
    source_data=source_records,
    target_data=target_records,
    fields_to_match=["name", "address"],
    threshold=0.8
)
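This README does not describe how `Matcher` scores candidate pairs. To illustrate what `fields_to_match` and `threshold` typically control in fuzzy record matching, here is a hypothetical, standard-library-only sketch that swaps in `difflib.SequenceMatcher` as the similarity measure and averages per-field scores. It is not canonmap's algorithm, and the record format (dicts of strings) is an assumption.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after lowercasing and collapsing whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    # Note: two empty strings score 1.0 under SequenceMatcher.ratio().
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def find_matches(
    source: Sequence[Record],
    target: Sequence[Record],
    fields: Sequence[str],
    threshold: float = 0.8,
) -> List[Tuple[int, int, float]]:
    """Return (source_index, target_index, score) for every pair whose
    mean field similarity is at least `threshold`."""
    if not fields:
        raise ValueError("at least one field is required")
    matches = []
    for i, s in enumerate(source):
        for j, t in enumerate(target):
            scores = [field_similarity(s.get(f, ""), t.get(f, "")) for f in fields]
            score = sum(scores) / len(scores)
            if score >= threshold:
                matches.append((i, j, score))
    return matches
```

This compares every source record with every target record, which is O(n*m); record-linkage tools usually add a blocking step (for example, only comparing records that share a postcode) to keep large datasets tractable.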
Documentation
For detailed documentation, visit the project homepage.
Development
Setup
- Clone the repository:
git clone https://github.com/yourusername/canonmap.git
cd canonmap
- Install development dependencies:
pip install -e ".[dev]"
- Run tests:
pytest
Code Quality
This project uses several tools to maintain code quality:
- Black: Code formatting
- isort: Import sorting
- flake8: Linting
- mypy: Type checking
- pytest: Testing
Run all quality checks:
black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/
pytest
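To run the checks listed above in one step and stop at the first failure, a small helper can wrap them. This is a sketch, not a script shipped with the repository; `CHECKS` and `run_checks` are illustrative names.

```python
import subprocess
import sys
from typing import List, Sequence

# The quality-check commands from this README, in the order given.
CHECKS: List[List[str]] = [
    ["black", "src/", "tests/"],
    ["isort", "src/", "tests/"],
    ["flake8", "src/", "tests/"],
    ["mypy", "src/"],
    ["pytest"],
]


def run_checks(checks: Sequence[Sequence[str]]) -> int:
    """Run each command in turn; return the first non-zero exit code, else 0."""
    for cmd in checks:
        print("$", " ".join(cmd), flush=True)
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `sys.exit(run_checks(CHECKS))` from a script propagates the failing tool's exit code. As listed, black and isort rewrite files in place; in CI you would typically use `black --check` and `isort --check-only` instead. A tool that is not installed raises `FileNotFoundError`.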
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for a list of changes and version history.
Support
- Issues: GitHub Issues
- Documentation: Project README
Download files
File details
Details for the file canonmap-0.4.3.tar.gz.
File metadata
- Download URL: canonmap-0.4.3.tar.gz
- Upload date:
- Size: 36.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dad0b3a2082f32c80f831c9d300b7cc141703c05ff1cd10e31fc2cb3c076aef2 |
| MD5 | 99d3734bf718163fa19ac3956ffd1113 |
| BLAKE2b-256 | b6f4e6bdc1f168e092f4ba8a7e93f9ce324ba0ba6e60a1537f75212b2173f4e8 |
File details
Details for the file canonmap-0.4.3-py3-none-any.whl.
File metadata
- Download URL: canonmap-0.4.3-py3-none-any.whl
- Upload date:
- Size: 57.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9db990784bfdfbb8e6f4a0952f8d501eace18aaaca3f5a5bb99e1279a62a1295 |
| MD5 | 492ae1a35c1bd66d99a8fb9c9fb13f46 |
| BLAKE2b-256 | 839dd062a10063fd58a062662be2c743bef9f3348742c402bfa3e954f77bc20d |
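The SHA256 digests listed for the two files let you check that a manually downloaded archive is intact. A minimal sketch using only `hashlib`; `sha256_of` and `verify_sha256` are illustrative helper names:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected: str) -> bool:
    """True if the file's digest matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `verify_sha256("canonmap-0.4.3-py3-none-any.whl", "9db990784bfdfbb8e6f4a0952f8d501eace18aaaca3f5a5bb99e1279a62a1295")` should return `True` for an untampered wheel. When installing with pip, the same digests can instead be pinned in a requirements file (`canonmap==0.4.3 --hash=sha256:...`) and enforced with pip's hash-checking mode.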