CanonMap - A Python library for entity canonicalization and mapping
CanonMap
A Python library for data mapping and canonicalization.
Installation
pip install canonmap
Quick Start
from canonmap import CanonMap
# Initialize the library
canon = CanonMap()
# Generate artifacts from a CSV file
artifacts = canon.generate_artifacts(
    csv_path="path/to/your/data.csv",
    entity_fields=["name", "email"],
    use_other_fields_as_metadata=True
)
# Save artifacts to files
zip_path = canon.save_artifacts(
    artifacts=artifacts,
    output_path="output",
    name="my_data"
)
print(f"Artifacts saved to: {zip_path}")
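To try the Quick Start without real data, a small input file is enough. The sketch below uses only the standard library to write a CSV with the `name` and `email` columns named in `entity_fields`, plus one extra column that would count as metadata. The helper name `write_sample_csv`, the `city` column, and the sample rows are illustrative assumptions, not part of CanonMap.

```python
import csv
from pathlib import Path


def write_sample_csv(path):
    """Write a tiny CSV with 'name' and 'email' columns plus one extra column.

    Hypothetical helper for experimenting with the Quick Start; the rows
    and the 'city' column are made-up sample data.
    """
    rows = [
        {"name": "Ada Lovelace", "email": "ada@example.com", "city": "London"},
        {"name": "Alan Turing", "email": "alan@example.com", "city": "Manchester"},
    ]
    path = Path(path)
    # Create the parent directory so nested paths also work.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "email", "city"])
        writer.writeheader()
        writer.writerows(rows)
    return path
```

The returned path can then be passed as `csv_path` in the Quick Start call.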
Detailed Example
Here's a complete example showing how to use the library in a real-world scenario:
from canonmap import CanonMap
import pandas as pd
from pathlib import Path
def process_customer_data(input_csv: str, output_dir: str):
    # Initialize CanonMap
    canon = CanonMap()
    # Define the entity fields we want to extract
    entity_fields = [
        "customer_name",
        "email",
        "phone_number",
        "company"
    ]
    # Generate artifacts from the CSV
    artifacts = canon.generate_artifacts(
        csv_path=input_csv,
        entity_fields=entity_fields,
        use_other_fields_as_metadata=True,  # Include other columns as metadata
        num_rows=None  # Process all rows
    )
    # Create output directory if it doesn't exist
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    # Save the artifacts
    zip_path = canon.save_artifacts(
        artifacts=artifacts,
        output_path=str(output_path),
        name="customer_data",
        save_metadata=True,
        save_schema=True
    )
    # You can also work with the artifacts directly
    metadata = artifacts["metadata"]
    schema = artifacts["schema"]
    # Example: Print some statistics
    print(f"Processed {metadata.get('row_count', 0)} rows")
    print(f"Found {len(schema.get('entities', []))} entities")
    return zip_path
# Usage
if __name__ == "__main__":
    zip_file = process_customer_data(
        input_csv="customers.csv",
        output_dir="processed_data"
    )
    print(f"Processing complete. Results saved to: {zip_file}")
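After a run, it can be useful to check what ended up in the saved archive. The sketch below assumes that the path returned by `save_artifacts` points to a regular ZIP file, which the variable name `zip_path` suggests but this page does not spell out. The helper `list_archive_members` is hypothetical and uses only the standard library; the file names inside the archive are not documented here, so the function simply reports whatever it finds.

```python
import zipfile


def list_archive_members(zip_path):
    """Return the sorted file names stored in a ZIP archive.

    Hypothetical helper: assumes zip_path refers to a standard ZIP file.
    Directory entries are skipped so only real files are listed.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )
```

Calling `list_archive_members(zip_file)` after `process_customer_data` returns would show which artifact files were written.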
Features
- Process CSV files and generate metadata and schema
- Extract and canonicalize entity fields
- Map data to standardized formats
- Save artifacts as JSON files or ZIP archives
- Configurable processing options
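To make the idea of canonicalization concrete, here is a minimal sketch of one common approach: fold case, strip accents, and collapse whitespace so that spelling variants of the same entity share one key. This is an illustration only and is not CanonMap's implementation; the functions `canonicalize` and `group_variants` are hypothetical names, and the library's actual rules may differ.

```python
import unicodedata
from collections import defaultdict


def canonicalize(value):
    """Return a simplified key for a string (illustrative, not CanonMap's rules)."""
    # Decompose accented characters into base character + combining mark.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks, keeping only the base characters.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Fold case and collapse any run of whitespace into a single space.
    return " ".join(without_marks.casefold().split())


def group_variants(values):
    """Group raw strings under their canonical key, preserving input order."""
    groups = defaultdict(list)
    for value in values:
        groups[canonicalize(value)].append(value)
    return dict(groups)
```

With this sketch, `"Acme Corp"` and `"ACME  corp"` map to the same key, `"acme corp"`, while punctuation is left untouched.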
Requirements
- Python 3.8+
- See setup.py for the full list of dependencies
License
MIT License
Download files
Source Distribution
canonmap-0.1.5.tar.gz (16.3 kB)
Built Distribution
canonmap-0.1.5-py3-none-any.whl (18.1 kB)
File details
Details for the file canonmap-0.1.5.tar.gz.
File metadata
- Download URL: canonmap-0.1.5.tar.gz
- Upload date:
- Size: 16.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 46c102ddc40a22e6b4058e76cb494ab1b97a06bf15bde1b8f6e2f440638bb8f2 |
| MD5 | 86ea74db7764af3602c0eb3c3e285760 |
| BLAKE2b-256 | 6c52b343abd52b8aa378f5c323ebb5bcfffa11320bf172df9c2408af2583c7ea |
File details
Details for the file canonmap-0.1.5-py3-none-any.whl.
File metadata
- Download URL: canonmap-0.1.5-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7f554fd4589bedd83ce2cf04319d0e15bdae826e7bc709cb66c53dcfd48095e1 |
| MD5 | d4d8c2b0d47eb9870f7108ca52b2011d |
| BLAKE2b-256 | 84e317f706b717ce741dc33422d5de1084612e660be26f763ab480922ccb0ad8 |
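The published digests can be used to check a downloaded file before installing it. The sketch below computes a file's SHA256 with the standard library and compares it to an expected hex string. The function names `sha256_of_file` and `matches_published_hash` are hypothetical helpers, and the local file path is whatever location the file was downloaded to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    # Normalize the expected value so stray whitespace or uppercase hex still matches.
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the expected value is the SHA256 digest listed for canonmap-0.1.5.tar.gz, `46c102ddc40a22e6b4058e76cb494ab1b97a06bf15bde1b8f6e2f440638bb8f2`.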