CanonMap - A Python library for data mapping and canonicalization
CanonMap
A Python library for data mapping and canonicalization.
Installation
```bash
pip install canonmap
```
Quick Start
```python
from canonmap import CanonMap

# Initialize the library
canon = CanonMap()

# Generate artifacts from a CSV file
artifacts = canon.generate_artifacts(
    csv_path="path/to/your/data.csv",
    entity_fields=["name", "email"],
    use_other_fields_as_metadata=True,
)

# Save artifacts to files
zip_path = canon.save_artifacts(
    artifacts=artifacts,
    output_path="output",
    name="my_data",
)

print(f"Artifacts saved to: {zip_path}")
```
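The Quick Start does not spell out what `entity_fields` and `use_other_fields_as_metadata` do with each row. Judging only from their names, one plausible reading is that the listed columns are treated as entity values and, when the flag is true, every other column is kept as metadata. The standard-library sketch below illustrates that reading. It does not call CanonMap, and `split_row` is a hypothetical helper, not part of the library.

```python
import csv
import io
from typing import Dict, List, Tuple


# Hypothetical helper: illustrates the entity/metadata split; NOT CanonMap's API.
def split_row(
    row: Dict[str, str],
    entity_fields: List[str],
    use_other_fields_as_metadata: bool = True,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (entities, metadata) for one CSV row."""
    # Keep only the requested entity columns that are present in this row.
    entities = {field: row[field] for field in entity_fields if field in row}
    metadata: Dict[str, str] = {}
    if use_other_fields_as_metadata:
        # Every column that is not an entity field becomes metadata.
        metadata = {k: v for k, v in row.items() if k not in entity_fields}
    return entities, metadata


# Inline sample data so the sketch runs without a file on disk.
sample_csv = io.StringIO(
    "name,email,signup_date\n"
    "Ada Lovelace,ada@example.com,2024-01-05\n"
)

for row in csv.DictReader(sample_csv):
    entities, metadata = split_row(row, entity_fields=["name", "email"])
    print(entities)
    print(metadata)
```

Whether CanonMap drops, renames, or type-converts columns is not stated here, so treat this only as a mental model for the two parameters.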
Detailed Example
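The following example processes a customer CSV end to end: it extracts four entity fields, keeps the remaining columns as metadata, saves the artifacts to an output directory, and prints summary statistics.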
```python
from pathlib import Path

from canonmap import CanonMap


def process_customer_data(input_csv: str, output_dir: str):
    # Initialize CanonMap
    canon = CanonMap()

    # Define the entity fields we want to extract
    entity_fields = [
        "customer_name",
        "email",
        "phone_number",
        "company",
    ]

    # Generate artifacts from the CSV
    artifacts = canon.generate_artifacts(
        csv_path=input_csv,
        entity_fields=entity_fields,
        use_other_fields_as_metadata=True,  # Include other columns as metadata
        num_rows=None,  # Process all rows
    )

    # Create the output directory if it doesn't exist
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    # Save the artifacts
    zip_path = canon.save_artifacts(
        artifacts=artifacts,
        output_path=str(output_path),
        name="customer_data",
        save_metadata=True,
        save_schema=True,
    )

    # You can also work with the artifacts directly
    metadata = artifacts["metadata"]
    schema = artifacts["schema"]

    # Example: print some statistics
    print(f"Processed {metadata.get('row_count', 0)} rows")
    print(f"Found {len(schema.get('entities', []))} entities")

    return zip_path


# Usage
if __name__ == "__main__":
    zip_file = process_customer_data(
        input_csv="customers.csv",
        output_dir="processed_data",
    )
    print(f"Processing complete. Results saved to: {zip_file}")
```
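Both examples store the return value of `save_artifacts` as `zip_path`, which suggests it is the path to a ZIP archive. Assuming that, the standard-library `zipfile` module can list and read what was saved without unpacking it. The member names inside the archive are not documented here, so this sketch prints whatever it finds instead of assuming specific file names. The demo builds a throwaway archive with a made-up member name so it runs without CanonMap installed; `list_archive` and `read_json_member` are illustrative helpers.

```python
import json
import tempfile
import zipfile
from pathlib import Path
from typing import List


def list_archive(zip_path: str) -> List[str]:
    """Return the member names of a ZIP archive, e.g. the one save_artifacts wrote."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


def read_json_member(zip_path: str, member: str) -> object:
    """Parse one JSON member of the archive without extracting it to disk."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as fh:
            return json.load(fh)  # json.load accepts a binary file object


if __name__ == "__main__":
    # Stand-in archive with a made-up member name; replace with the real zip_path.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.zip"
        with zipfile.ZipFile(demo, "w") as zf:
            zf.writestr("example_metadata.json", json.dumps({"row_count": 2}))
        for name in list_archive(str(demo)):
            print(name)
            if name.endswith(".json"):
                print(read_json_member(str(demo), name))
```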
Features
- Process CSV files and generate metadata and schema
- Extract and canonicalize entity fields
- Map data to standardized formats
- Save artifacts as JSON files or ZIP archives
- Configurable processing options (for example, `num_rows` and `use_other_fields_as_metadata`)
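The features above mention canonicalizing entity fields and mapping data to standardized formats. CanonMap's own matching logic is not described here, so the following is only an illustration of the general idea using the standard library: each raw value is normalized (case-folded, punctuation removed, whitespace collapsed) and then mapped to the closest entry in a list of canonical names with `difflib.get_close_matches`. The `normalize` and `canonicalize` helpers, the `cutoff` value, and the sample names are assumptions made for this example.

```python
import difflib
import re
import string
from typing import Dict, List, Optional

# Translation table that deletes ASCII punctuation characters.
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(value: str) -> str:
    """Case-fold, drop punctuation, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", value.casefold().translate(_PUNCT)).strip()


def canonicalize(raw: str, canonical: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Map a raw value to the closest canonical name, or None if nothing is close enough."""
    # Compare normalized forms, but return the original canonical spelling.
    lookup: Dict[str, str] = {normalize(name): name for name in canonical}
    matches = difflib.get_close_matches(normalize(raw), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


canonical_companies = ["Acme Corporation", "Globex Inc", "Initech"]
for raw in ["  ACME corporation ", "Globex, Inc.", "Initrode", "Umbrella"]:
    print(repr(raw), "->", canonicalize(raw, canonical_companies))
```

Raising `cutoff` makes the mapping stricter (fewer false matches, more unmapped values); a real pipeline would likely tune it per field.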
Requirements
- Python 3.8+
- See setup.py for the full list of dependencies
License
MIT License
Download files
Source Distribution
canonmap-0.1.3.tar.gz (16.4 kB)
Built Distribution
canonmap-0.1.3-py3-none-any.whl (18.3 kB)
File details
Details for the file canonmap-0.1.3.tar.gz.
File metadata
- Download URL: canonmap-0.1.3.tar.gz
- Upload date:
- Size: 16.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 51607ba4d9e89171d0f4184e9862c434eed33ed69a981f48f49931fc60c67a19 |
| MD5 | 511e783b80e238e05415ffc1dd13a7ac |
| BLAKE2b-256 | 4ee13c14e0acb46fb79aa9b93197a02f3a460d4745404edad5d0bfd2513c2b39 |
File details
Details for the file canonmap-0.1.3-py3-none-any.whl.
File metadata
- Download URL: canonmap-0.1.3-py3-none-any.whl
- Upload date:
- Size: 18.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d4729f52b9ca9eab39799a00e0fcbd68f75ada425f803abb635dddb13c76f92d |
| MD5 | e238e9a8ceabef5e35b5690cb680c172 |
| BLAKE2b-256 | 223bc90a76080c98477642056b6a58f3edcf18b617039db967899890371a0706 |
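The SHA256 digests listed above let you confirm that a downloaded file matches what was published. A minimal sketch using Python's `hashlib`, streaming the file in chunks so large files are never read into memory at once; the helper names `sha256_of` and `matches_published` and the script usage line are illustrative.

```python
import hashlib
import sys
from pathlib import Path

# Published SHA256 digest for canonmap-0.1.3-py3-none-any.whl, copied from the table above.
EXPECTED_WHEEL_SHA256 = "d4729f52b9ca9eab39799a00e0fcbd68f75ada425f803abb635dddb13c76f92d"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected: str = EXPECTED_WHEEL_SHA256) -> bool:
    """True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected.lower()


# Usage: python verify_wheel.py canonmap-0.1.3-py3-none-any.whl
if __name__ == "__main__" and len(sys.argv) == 2:
    wheel = Path(sys.argv[1])
    print("OK" if matches_published(wheel) else f"MISMATCH: {sha256_of(wheel)}")
```

pip can perform the same check itself in hash-checking mode: a requirements line such as `canonmap==0.1.3 --hash=sha256:<digest>` makes `pip install --require-hashes -r requirements.txt` reject any file whose digest differs. That mode requires every dependency to be pinned and hashed as well.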