CanonMap - A Python library for entity canonicalization and mapping
CanonMap
A Python library for data mapping and canonicalization.
Installation
pip install canonmap
Quick Start
from canonmap import CanonMap

# Initialize the library
canon = CanonMap()

# Generate artifacts from a CSV file
artifacts = canon.generate_artifacts(
    csv_path="path/to/your/data.csv",
    entity_fields=["name", "email"],
    use_other_fields_as_metadata=True
)

# Save artifacts to files
zip_path = canon.save_artifacts(
    artifacts=artifacts,
    output_path="output",
    name="my_data"
)
print(f"Artifacts saved to: {zip_path}")
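To try the Quick Start without a dataset at hand, a small input file can be generated first. The sketch below uses only the standard library and writes a CSV with the `name` and `email` columns referenced above, plus one extra `city` column that would be picked up as metadata. The function name `write_sample_csv`, the `city` column, and the rows themselves are illustrative assumptions, not part of CanonMap.

```python
import csv
from pathlib import Path


def write_sample_csv(path: str) -> Path:
    """Write a tiny CSV containing the `name` and `email` columns from Quick Start."""
    rows = [
        # Illustrative rows only; replace them with your own data.
        {"name": "Ada Lovelace", "email": "ada@example.com", "city": "London"},
        {"name": "Alan Turing", "email": "alan@example.com", "city": "Manchester"},
    ]
    target = Path(path)
    # newline="" lets the csv module control line endings on every platform.
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "email", "city"])
        writer.writeheader()
        writer.writerows(rows)
    return target
```

The returned path can then be passed as `csv_path` in the Quick Start call.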
Detailed Example
The following example wraps the same workflow in a function that processes a customer CSV, saves the artifacts, and prints basic statistics:
from canonmap import CanonMap
import pandas as pd
from pathlib import Path


def process_customer_data(input_csv: str, output_dir: str):
    # Initialize CanonMap
    canon = CanonMap()

    # Define the entity fields we want to extract
    entity_fields = [
        "customer_name",
        "email",
        "phone_number",
        "company"
    ]

    # Generate artifacts from the CSV
    artifacts = canon.generate_artifacts(
        csv_path=input_csv,
        entity_fields=entity_fields,
        use_other_fields_as_metadata=True,  # Include other columns as metadata
        num_rows=None  # Process all rows
    )

    # Create output directory if it doesn't exist
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    # Save the artifacts
    zip_path = canon.save_artifacts(
        artifacts=artifacts,
        output_path=str(output_path),
        name="customer_data",
        save_metadata=True,
        save_schema=True
    )

    # You can also work with the artifacts directly
    metadata = artifacts["metadata"]
    schema = artifacts["schema"]

    # Example: Print some statistics
    print(f"Processed {metadata.get('row_count', 0)} rows")
    print(f"Found {len(schema.get('entities', []))} entities")

    return zip_path


# Usage
if __name__ == "__main__":
    zip_file = process_customer_data(
        input_csv="customers.csv",
        output_dir="processed_data"
    )
    print(f"Processing complete. Results saved to: {zip_file}")
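Once the archive has been written, its contents can be checked with the standard library alone. The sketch below assumes the returned path points to an ordinary ZIP file that contains JSON members; the member names inside the archive are not documented here, so the helper (a hypothetical `read_json_members`) simply loads every `.json` entry it finds.

```python
import json
import zipfile
from typing import Any, Dict


def read_json_members(zip_path: str) -> Dict[str, Any]:
    """Return the parsed content of every .json member in a ZIP archive."""
    contents: Dict[str, Any] = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Skip anything that is not a JSON file.
            if member.lower().endswith(".json"):
                with archive.open(member) as handle:
                    contents[member] = json.load(handle)
    return contents
```

For example, `read_json_members(zip_file).keys()` lists which JSON files the archive actually holds.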
Features
- Process CSV files and generate metadata and schema
- Extract and canonicalize entity fields
- Map data to standardized formats
- Save artifacts as JSON files or ZIP archives
- Configurable processing options
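For readers new to the term, canonicalization means reducing different spellings of the same entity to one comparable form. The sketch below is a generic illustration built on the standard library; it is not CanonMap's algorithm, and the function name `canonical_key` is hypothetical.

```python
import re
import unicodedata


def canonical_key(value: str) -> str:
    """Generic normalization (NOT CanonMap's algorithm).

    Strips accents, folds case, replaces punctuation with spaces,
    and collapses runs of whitespace.
    """
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    return " ".join(cleaned.split())
```

With this helper, `"ACME, Inc."` and `"Acme Inc"` produce the same key, so the two spellings can be grouped as one entity.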
Requirements
- Python 3.8+
- See setup.py for the full list of dependencies
License
MIT License
Download files
Source Distribution
canonmap-0.1.6.tar.gz (16.4 kB)
Built Distribution
canonmap-0.1.6-py3-none-any.whl (18.2 kB)
File details
Details for the file canonmap-0.1.6.tar.gz.
File metadata
- Download URL: canonmap-0.1.6.tar.gz
- Upload date:
- Size: 16.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f256cc1b20bf801a693f76b0e9549efa137830616f3a01c4bb1c0aac0067b8a9 |
| MD5 | d47663c1a47ab2b73dad1c232ad26870 |
| BLAKE2b-256 | e564e386b1f892bb213e024bc1cbaca8b607f6616c2ff5b757da95cb3765c224 |
File details
Details for the file canonmap-0.1.6-py3-none-any.whl.
File metadata
- Download URL: canonmap-0.1.6-py3-none-any.whl
- Upload date:
- Size: 18.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d3b05e5934707e5c5a44fe870be746c04aefd5d9e442217660f91dcb56a38b7 |
| MD5 | 61e3d9c2a6bf6cd21c0c780b1a2f729c |
| BLAKE2b-256 | 531797b4db92efad178bd508bbbcec8275d9310933c915142893baa88e76f17e |
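The SHA256 digests listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's `hashlib`; the helper names `sha256_of_file` and `matches_published_digest` are illustrative, and the local file path is assumed to be wherever the download was saved.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hexadecimal value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `matches_published_digest("canonmap-0.1.6.tar.gz", "f256cc1b20bf801a693f76b0e9549efa137830616f3a01c4bb1c0aac0067b8a9")`.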