CanonMap - A Python library for data mapping and canonicalization
Installation
pip install canonmap
Quick Start
from canonmap import CanonMap
# Initialize the library
canon = CanonMap()
# Generate artifacts from a CSV file
artifacts = canon.generate_artifacts(
    csv_path="path/to/your/data.csv",
    entity_fields=["name", "email"],
    use_other_fields_as_metadata=True
)
# Save artifacts to files
zip_path = canon.save_artifacts(
    artifacts=artifacts,
    output_path="output",
    name="my_data"
)
print(f"Artifacts saved to: {zip_path}")
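To try the Quick Start without real data, a small input file can be generated first. The sketch below uses only the standard library; the rows, the extra `city` column, and the helper name `write_sample_csv` are hypothetical and chosen only so that the `name` and `email` columns match the `entity_fields` above.

```python
import csv
from pathlib import Path

# Hypothetical sample rows. "name" and "email" match the Quick Start's
# entity_fields; "city" stands in for an extra column that would be
# treated as metadata when use_other_fields_as_metadata=True.
SAMPLE_ROWS = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "city": "London"},
    {"name": "Alan Turing", "email": "alan@example.com", "city": "Manchester"},
]


def write_sample_csv(path):
    """Write SAMPLE_ROWS to `path` as a CSV file with a header row."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "email", "city"])
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return path
```

Calling `write_sample_csv("data.csv")` produces a file whose path can then be passed as `csv_path` in the Quick Start.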
Detailed Example
Here's a complete example showing how to use the library in a real-world scenario:
from canonmap import CanonMap
import pandas as pd
from pathlib import Path
def process_customer_data(input_csv: str, output_dir: str):
    # Initialize CanonMap
    canon = CanonMap()
    # Define the entity fields we want to extract
    entity_fields = [
        "customer_name",
        "email",
        "phone_number",
        "company"
    ]
    # Generate artifacts from the CSV
    artifacts = canon.generate_artifacts(
        csv_path=input_csv,
        entity_fields=entity_fields,
        use_other_fields_as_metadata=True,  # Include other columns as metadata
        num_rows=None  # Process all rows
    )
    # Create output directory if it doesn't exist
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    # Save the artifacts
    zip_path = canon.save_artifacts(
        artifacts=artifacts,
        output_path=str(output_path),
        name="customer_data",
        save_metadata=True,
        save_schema=True
    )
    # You can also work with the artifacts directly
    metadata = artifacts["metadata"]
    schema = artifacts["schema"]
    # Example: Print some statistics
    print(f"Processed {metadata.get('row_count', 0)} rows")
    print(f"Found {len(schema.get('entities', []))} entities")
    return zip_path

# Usage
if __name__ == "__main__":
    zip_file = process_customer_data(
        input_csv="customers.csv",
        output_dir="processed_data"
    )
    print(f"Processing complete. Results saved to: {zip_file}")
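After a run like the one above, it can be useful to check what was actually written. The following is a minimal sketch that assumes the value returned by `save_artifacts` is the path to a ZIP archive, as the variable name `zip_path` suggests; the README does not document the archive's contents, so the helper (a hypothetical `list_archive`) simply lists whatever members it finds.

```python
import zipfile


def list_archive(zip_path):
    """Return (member name, uncompressed size in bytes) for each archive member."""
    with zipfile.ZipFile(zip_path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]
```

For example, `for name, size in list_archive(zip_file): print(name, size)` prints one line per file in the archive.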
Features
- Process CSV files and generate metadata and schema
- Extract and canonicalize entity fields
- Map data to standardized formats
- Save artifacts as JSON files or ZIP archives
- Configurable processing options
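As a general illustration of what canonicalizing an entity field can mean, the sketch below maps spelling variants of the same value to one canonical key. It is not CanonMap's implementation, which this README does not describe; the normalization steps (Unicode NFKC, case folding, whitespace collapsing) and the function names `canonicalize` and `group_variants` are assumptions made for the example.

```python
import unicodedata


def canonicalize(value: str) -> str:
    """Reduce a string to a canonical form for comparison.

    Steps (illustrative only): Unicode NFKC normalization, case folding,
    and collapsing runs of whitespace into single spaces.
    """
    normalized = unicodedata.normalize("NFKC", value)
    return " ".join(normalized.casefold().split())


def group_variants(values):
    """Group raw values by canonical form, preserving first-seen order."""
    groups = {}
    for value in values:
        groups.setdefault(canonicalize(value), []).append(value)
    return groups
```

With this sketch, `"Ada Lovelace"`, `"ada  lovelace"`, and `"ADA LOVELACE "` all fall under the single key `"ada lovelace"`.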
Requirements
- Python 3.8+
- See setup.py for the full list of dependencies
License
MIT License
Download files
Source Distribution
canonmap-0.1.1.tar.gz (4.3 kB)
Built Distribution
canonmap-0.1.1-py3-none-any.whl (4.6 kB)
File details
Details for the file canonmap-0.1.1.tar.gz.
File metadata
- Download URL: canonmap-0.1.1.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c790d8cb4376ce22df75b6942bcb407e1d3bc5d080d068bba2cfa097466f4fdc |
| MD5 | 0aa9d4b81d780f1db2bb297bc16e06e4 |
| BLAKE2b-256 | f40412726215459a0b4d02d779e0328451a418505c8099a0e59393af23ee767e |
File details
Details for the file canonmap-0.1.1-py3-none-any.whl.
File metadata
- Download URL: canonmap-0.1.1-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7a33e5245a7b151a744f7aa0dbc62b2662ef8153d9f5001ec420fe453d7c224a |
| MD5 | 8c56fde635329f8d949f33a4110efaf8 |
| BLAKE2b-256 | 177309ccda9c5a64795e6a6e9fef92dd7f4fe022b76b1e2f270f8dc18d7b9252 |
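A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses only the standard library `hashlib` module; the helper names `sha256_of` and `matches` and the chunk size are illustrative choices, not part of CanonMap.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("canonmap-0.1.1.tar.gz", "c790d8cb4376ce22df75b6942bcb407e1d3bc5d080d068bba2cfa097466f4fdc")` should return `True` for an intact copy of the source distribution, assuming the file was saved under that name in the current directory.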