CanonMap - A Python library for entity canonicalization and mapping

These details have not been verified by PyPI

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Software Development :: Libraries :: Python Modules

Project description

CanonMap

A Python library for entity matching and data canonicalization. CanonMap combines semantic, fuzzy, initial, keyword, and phonetic matching to identify, match, and standardize entities across your datasets.

Key Features

Multi-strategy Entity Matching: Combines multiple matching strategies for robust entity identification:
- Semantic matching (45%): Uses transformer embeddings for understanding meaning
- Fuzzy matching (35%): Handles typos and variations
- Initial matching (10%): Matches abbreviations and initials
- Keyword matching (5%): Matches individual words
- Phonetic matching (5%): Sound-based matching using Double Metaphone
Smart Scoring System: Adjusts the weighted score with bonuses and penalties:
- High semantic + fuzzy score combinations (+10 points)
- Perfect initial matches (+10 points)
- Perfect phonetic matches (+5 points)
- Penalties for mismatched high fuzzy/low semantic scores (-15 points)
Intelligent Entity Extraction:
- Automatic entity detection using spaCy NER
- Smart handling of name fields and patterns
- Configurable uniqueness ratios and length thresholds
- Support for both manual field selection and automatic extraction
Data Processing:
- CSV file processing with schema inference
- Metadata generation and management
- Entity normalization and standardization
- Support for custom field mapping
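
The weights and point adjustments listed under Multi-strategy Entity Matching and Smart Scoring System can be sketched roughly as follows. This is an illustration only, not CanonMap's actual implementation: the function name `combine_scores`, the 0-100 score scale, the cutoffs for a "high" (85) and "low" (65) score, and the final clamping are all assumptions. Only the weights and the +10/+10/+5/-15 adjustments come from the list above.

```python
# Hypothetical sketch of a weighted score with bonuses and penalties.
DEFAULT_WEIGHTS = {
    "semantic": 0.45,
    "fuzzy": 0.35,
    "initial": 0.10,
    "keyword": 0.05,
    "phonetic": 0.05,
}

HIGH = 85.0  # assumed cutoff for a "high" score
LOW = 65.0   # assumed cutoff for a "low" score


def combine_scores(scores, weights=DEFAULT_WEIGHTS):
    """Combine per-strategy scores (0-100) into one adjusted score."""
    # Weighted sum; a strategy missing from `scores` counts as 0.
    total = sum(weights[name] * scores.get(name, 0.0) for name in weights)

    semantic = scores.get("semantic", 0.0)
    fuzzy = scores.get("fuzzy", 0.0)

    if semantic >= HIGH and fuzzy >= HIGH:
        total += 10  # high semantic + fuzzy combination
    if scores.get("initial", 0.0) == 100.0:
        total += 10  # perfect initial match
    if scores.get("phonetic", 0.0) == 100.0:
        total += 5   # perfect phonetic match
    if fuzzy >= HIGH and semantic < LOW:
        total -= 15  # high fuzzy but low semantic score

    # Clamping to 0-100 is an assumption made for this sketch.
    return max(0.0, min(100.0, total))
```

A candidate that looks similar as a string but differs in meaning (high fuzzy, low semantic) is pushed down by the penalty, which is the case the last list item describes.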

Installation

pip install canonmap

Dependencies

Python 3.8 or higher
spaCy and its English language model (automatically downloaded on first use)

Quick Start

from canonmap import CanonMap

# Initialize the library
canon = CanonMap()

# Generate artifacts from a CSV file
artifacts = canon.generate_artifacts(
    csv_path="path/to/your/data.csv",
    output_path="output",  # Optional: directory to save artifacts
    name="my_data",        # Base name for output files
    entity_fields=["name", "email"],  # Optional: specify entity fields
    use_other_fields_as_metadata=True,  # Include other columns as metadata
    num_rows=None,  # Optional: limit number of rows to process
    embed=True  # Whether to compute and save embeddings
)

# The artifacts dictionary contains:
# - metadata: List of entity objects with their metadata
# - schema: Nested dictionary of data types and formats
# - paths: Dictionary of paths to saved artifacts
# - embeddings: Optional numpy array of entity embeddings

# Match entities against your data
matches = canon.match_entity(
    entity_term="John Smith",
    metadata_path=artifacts["paths"]["metadata"],
    schema_path=artifacts["paths"]["schema"],
    embedding_path=artifacts["paths"]["embeddings"],  # Required for semantic search
    top_k=5,  # Maximum number of results to return
    threshold=80.0,  # Minimum score threshold (default: 0)
    field_filter=["name", "contact_name"],  # Optional: restrict matching to specific fields
    use_semantic_search=True,  # Enable semantic search (default: False)
    weights=None  # Optional: customize matching strategy weights
)

# Process results
for match in matches:
    print(f"Entity: {match['entity']}")
    print(f"Score: {match['score']}")
    print(f"Passes: {match['passes']}")  # Number of matching strategies that passed
    print(f"Metadata: {match['metadata']}")
    print("---")

Advanced Usage

Custom Matching Weights

# Customize the matching strategy weights
custom_weights = {
    'semantic': 0.50,  # Increase semantic matching importance
    'fuzzy': 0.30,     # Decrease fuzzy matching
    'initial': 0.10,   # Keep initial matching
    'keyword': 0.05,   # Keep keyword matching
    'phonetic': 0.05   # Keep phonetic matching
}

matches = canon.match_entity(
    entity_term="John Smith",
    metadata_path="metadata.pkl",
    schema_path="schema.pkl",
    embedding_path="embeddings.npz",  # Required for semantic search
    use_semantic_search=True,  # Enable semantic search (default: False) so the semantic weight is used
    weights=custom_weights
)
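
The page does not say whether CanonMap validates or normalizes a custom `weights` dictionary, so it may be worth checking it before the call. A minimal sketch, assuming the five strategy names shown above are the expected keys and that the weights are meant to sum to 1.0 as the defaults do; `validate_weights` is a hypothetical helper, not part of the library:

```python
import math

# Strategy names taken from the weights example above.
EXPECTED_KEYS = {"semantic", "fuzzy", "initial", "keyword", "phonetic"}


def validate_weights(weights):
    """Raise ValueError unless weights cover all five strategies and sum to 1."""
    keys = set(weights)
    missing = EXPECTED_KEYS - keys
    extra = keys - EXPECTED_KEYS
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    if any(value < 0 for value in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1.0")
    return weights
```

Calling `validate_weights(custom_weights)` before `match_entity` catches a mistyped key or a set of weights that no longer adds up after editing one value.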

Field-Specific Matching

# Restrict matching to specific fields
matches = canon.match_entity(
    entity_term="John Smith",
    metadata_path="metadata.pkl",
    schema_path="schema.pkl",
    embedding_path="embeddings.npz",
    field_filter=["customer_name", "contact_name"],
    use_semantic_search=True
)

Features in Detail

Entity Extraction

- Automatic detection of entity fields
- Support for custom entity field selection
- Intelligent handling of name patterns
- Configurable uniqueness thresholds
- Length-based filtering
- spaCy NER integration for complex text
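
To make the uniqueness and length filters concrete, here is a rough sketch of how columns might be screened as entity fields. It is not CanonMap's code: the function `candidate_entity_fields`, its parameter names, and the default thresholds are assumptions chosen for illustration, and the spaCy NER step is left out.

```python
import csv
import io


def candidate_entity_fields(rows, min_uniqueness=0.5, min_length=3, max_length=60):
    """Return column names whose values look like entity names.

    A column qualifies when enough of its non-empty values are distinct
    (uniqueness ratio) and their average length falls inside the bounds.
    """
    if not rows:
        return []
    selected = []
    for field in rows[0].keys():
        values = [row[field].strip() for row in rows if row[field] and row[field].strip()]
        if not values:
            continue  # skip columns with no usable values
        uniqueness = len(set(values)) / len(values)
        average_length = sum(len(value) for value in values) / len(values)
        if uniqueness >= min_uniqueness and min_length <= average_length <= max_length:
            selected.append(field)
    return selected


# Small in-memory CSV standing in for a real file.
sample = io.StringIO(
    "name,country,notes\n"
    "John Smith,US,\n"
    "Jane Doe,US,\n"
    "Juan Perez,US,\n"
)
rows = list(csv.DictReader(sample))
fields = candidate_entity_fields(rows)
```

In this sample only `name` passes: `country` repeats the same value on every row, and `notes` is empty.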

Matching Process

- Semantic pruning (if enabled)
- Multi-strategy scoring
- Weighted combination of scores
- Bonus/penalty application
- Result ranking and filtering
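
The scoring, weighting, ranking, and filtering steps can be sketched with the standard library alone. This is an assumption-laden illustration rather than the library's pipeline: `difflib.SequenceMatcher` stands in for whatever fuzzy matcher CanonMap uses, the semantic and phonetic strategies are omitted because they need an embedding model and a Double Metaphone implementation, and the three weights below are made up so that the remaining strategies sum to 1.

```python
from difflib import SequenceMatcher


def fuzzy_score(query, candidate):
    """Character-level similarity on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def initial_score(query, candidate):
    """100 when the query spells the candidate's initials, otherwise 0."""
    initials = "".join(word[0] for word in candidate.split()).lower()
    cleaned = query.replace(".", "").replace(" ", "").lower()
    return 100.0 if cleaned == initials else 0.0


def keyword_score(query, candidate):
    """Share of query words that also appear in the candidate, 0-100."""
    query_words = set(query.lower().split())
    candidate_words = set(candidate.lower().split())
    if not query_words:
        return 0.0
    return 100.0 * len(query_words & candidate_words) / len(query_words)


def rank(query, candidates, top_k=5, threshold=0.0):
    """Score every candidate, drop those below threshold, return the best top_k."""
    weights = {"fuzzy": 0.7, "initial": 0.2, "keyword": 0.1}  # assumed weights
    results = []
    for candidate in candidates:
        scores = {
            "fuzzy": fuzzy_score(query, candidate),
            "initial": initial_score(query, candidate),
            "keyword": keyword_score(query, candidate),
        }
        total = sum(weights[name] * scores[name] for name in weights)
        if total >= threshold:
            results.append({"entity": candidate, "score": round(total, 1)})
    results.sort(key=lambda result: result["score"], reverse=True)
    return results[:top_k]


matches = rank("Jon Smith", ["John Smith", "Jane Doe", "Smithsonian"], top_k=2)
```

The `top_k` and `threshold` arguments mirror the parameters of `match_entity` shown in the Quick Start; the result dictionaries here carry only `entity` and `score`.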

Data Processing

- Schema inference
- Data type detection
- Date format recognition
- Metadata generation
- Entity normalization
- Custom field mapping
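
As an illustration of what data type detection and date format recognition involve, the sketch below infers a type for one column of string values. The function names, the list of candidate date formats, and the shape of the returned dictionary are assumptions; the schema CanonMap actually produces may look different.

```python
from datetime import datetime

# Candidate formats to try, in order (an assumed, deliberately short list).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d")


def detect_date_format(values):
    """Return the first format that parses every value, or None."""
    for fmt in DATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value did not fit this format
        return fmt
    return None


def infer_type(values):
    """Infer a simple type description for a column of strings."""
    values = [value for value in values if value != ""]
    if not values:
        return {"type": "empty"}
    # Try the narrowest numeric type first.
    for name, cast in (("integer", int), ("float", float)):
        try:
            for value in values:
                cast(value)
        except ValueError:
            continue
        return {"type": name}
    fmt = detect_date_format(values)
    if fmt:
        return {"type": "date", "format": fmt}
    return {"type": "string"}
```

For example, `infer_type(["2025-06-12", "2025-01-08"])` yields a date entry with the `%Y-%m-%d` format, while a column of names falls through to `string`.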

Requirements

Python 3.8+
See setup.py for full list of dependencies

License

MIT License

Release history

1.0.2

Jan 8, 2026

1.0.1

Jan 8, 2026

0.4.90

Aug 25, 2025

0.4.89

Aug 25, 2025

0.4.88

Aug 25, 2025

0.4.87

Aug 21, 2025

0.4.86

Aug 21, 2025

0.4.85

Aug 21, 2025

0.4.84

Aug 19, 2025

0.4.83

Aug 19, 2025

0.4.82

Aug 19, 2025

0.4.81

Aug 19, 2025

0.4.80

Aug 19, 2025

0.4.79

Aug 19, 2025

0.4.78

Aug 13, 2025

0.4.77

Aug 13, 2025

0.4.76

Aug 13, 2025

0.4.75

Aug 13, 2025

0.4.74

Aug 13, 2025

0.4.73

Aug 13, 2025

0.4.72

Aug 13, 2025

0.4.71

Aug 13, 2025

0.4.70

Aug 13, 2025

0.4.69

Aug 13, 2025

0.4.68

Aug 13, 2025

0.4.67

Aug 13, 2025

0.4.66

Aug 13, 2025

0.4.65

Aug 13, 2025

0.4.64

Aug 13, 2025

0.4.63

Aug 13, 2025

0.4.62

Aug 13, 2025

0.4.61

Aug 13, 2025

0.4.60

Aug 13, 2025

0.4.59

Aug 13, 2025

0.4.58

Aug 13, 2025

0.4.57

Aug 13, 2025

0.4.56

Aug 13, 2025

0.4.55

Aug 13, 2025

0.4.54

Aug 13, 2025

0.4.53

Aug 9, 2025

0.4.51

Aug 8, 2025

0.4.44

Aug 8, 2025

0.4.31

Aug 2, 2025

0.4.30

Aug 2, 2025

0.4.29

Aug 2, 2025

0.4.28

Aug 2, 2025

0.4.27

Aug 2, 2025

0.4.25

Aug 2, 2025

0.4.24

Aug 1, 2025

0.4.23

Aug 1, 2025

0.4.22

Aug 1, 2025

0.4.21

Aug 1, 2025

0.4.20

Aug 1, 2025

0.4.19

Aug 1, 2025

0.4.18

Aug 1, 2025

0.4.17

Aug 1, 2025

0.4.16

Aug 1, 2025

0.4.15

Aug 1, 2025

0.4.14

Aug 1, 2025

0.4.13

Aug 1, 2025

0.4.12

Aug 1, 2025

0.4.11

Aug 1, 2025

0.4.10

Aug 1, 2025

0.4.9

Aug 1, 2025

0.4.8

Aug 1, 2025

0.4.5

Aug 1, 2025

0.4.3

Aug 1, 2025

0.4.0

Aug 1, 2025

0.3.49

Jul 30, 2025

0.3.48

Jul 29, 2025

0.3.47

Jul 29, 2025

0.3.46

Jul 29, 2025

0.3.45

Jul 29, 2025

0.3.44

Jul 24, 2025

0.3.43

Jul 24, 2025

0.3.42

Jul 24, 2025

0.3.41

Jul 24, 2025

0.3.40

Jul 24, 2025

0.3.39

Jul 24, 2025

0.3.38

Jul 24, 2025

0.3.37

Jul 24, 2025

0.3.36

Jul 24, 2025

0.3.35

Jul 24, 2025

0.3.34

Jul 24, 2025

0.3.33

Jul 24, 2025

0.3.32

Jul 24, 2025

0.3.31

Jul 24, 2025

0.3.30

Jul 24, 2025

0.3.29

Jul 24, 2025

0.3.28

Jul 24, 2025

0.3.27

Jul 24, 2025

0.3.26

Jul 24, 2025

0.3.25

Jul 23, 2025

0.3.24

Jul 23, 2025

0.3.23

Jul 23, 2025

0.3.22

Jul 23, 2025

0.3.21

Jul 23, 2025

0.3.20

Jul 23, 2025

0.3.19

Jul 23, 2025

0.3.18

Jul 23, 2025

0.3.17

Jul 23, 2025

0.3.16

Jul 23, 2025

0.3.15

Jul 23, 2025

0.3.14

Jul 23, 2025

0.3.13

Jul 23, 2025

0.3.12

Jul 22, 2025

0.3.11

Jul 22, 2025

0.3.10

Jul 22, 2025

0.3.9

Jul 22, 2025

0.3.8

Jul 22, 2025

0.3.7

Jul 22, 2025

0.3.6

Jul 22, 2025

0.3.5

Jul 22, 2025

0.3.4

Jul 22, 2025

0.2.53

Jul 21, 2025

0.2.52

Jul 21, 2025

0.2.51

Jul 21, 2025

0.2.50

Jul 21, 2025

0.2.49

Jul 21, 2025

0.2.48

Jul 21, 2025

0.2.43

Jun 29, 2025

0.2.42

Jun 27, 2025

0.2.41

Jun 27, 2025

0.2.40

Jun 27, 2025

0.2.39

Jun 27, 2025

0.2.38

Jun 27, 2025

0.2.37

Jun 27, 2025

0.2.36

Jun 27, 2025

0.2.35

Jun 27, 2025

0.2.34

Jun 27, 2025

0.2.33

Jun 27, 2025

0.2.32

Jun 27, 2025

0.2.31

Jun 27, 2025

0.2.30

Jun 27, 2025

0.2.29

Jun 27, 2025

0.2.28

Jun 27, 2025

0.2.27

Jun 27, 2025

0.2.26

Jun 27, 2025

0.2.25

Jun 27, 2025

0.2.24

Jun 27, 2025

0.2.23

Jun 27, 2025

0.2.22

Jun 27, 2025

0.2.21

Jun 27, 2025

0.2.20

Jun 27, 2025

0.2.19

Jun 27, 2025

0.2.18

Jun 27, 2025

0.2.17

Jun 27, 2025

0.2.16

Jun 27, 2025

0.2.15

Jun 27, 2025

0.2.14

Jun 27, 2025

0.2.13

Jun 27, 2025

0.2.12

Jun 27, 2025

0.2.11

Jun 27, 2025

0.2.10

Jun 27, 2025

0.2.9

Jun 26, 2025

0.2.8

Jun 26, 2025

0.2.7

Jun 26, 2025

0.2.6

Jun 26, 2025

0.2.5

Jun 26, 2025

0.2.4

Jun 26, 2025

0.2.3

Jun 26, 2025

0.2.2

Jun 26, 2025

0.1.200

Jun 25, 2025

0.1.199

Jun 25, 2025

0.1.198

Jun 25, 2025

0.1.197

Jun 25, 2025

0.1.196

Jun 25, 2025

0.1.195

Jun 25, 2025

0.1.194

Jun 25, 2025

0.1.193

Jun 25, 2025

0.1.192

Jun 25, 2025

0.1.191

Jun 25, 2025

0.1.190

Jun 25, 2025

0.1.189

Jun 25, 2025

0.1.188

Jun 25, 2025

0.1.187

Jun 25, 2025

0.1.186

Jun 25, 2025

0.1.185

Jun 25, 2025

0.1.184

Jun 25, 2025

0.1.169

Jun 23, 2025

0.1.167

Jun 23, 2025

0.1.166

Jun 19, 2025

0.1.165

Jun 19, 2025

0.1.164

Jun 19, 2025

0.1.163

Jun 19, 2025

0.1.162

Jun 19, 2025

0.1.161

Jun 19, 2025

0.1.160

Jun 19, 2025

0.1.159

Jun 19, 2025

0.1.158

Jun 19, 2025

0.1.157

Jun 19, 2025

0.1.156

Jun 19, 2025

0.1.155

Jun 19, 2025

0.1.154

Jun 19, 2025

0.1.153

Jun 19, 2025

0.1.152

Jun 19, 2025

0.1.151

Jun 19, 2025

0.1.150

Jun 19, 2025

0.1.149

Jun 19, 2025

0.1.148

Jun 19, 2025

0.1.147

Jun 19, 2025

0.1.146

Jun 19, 2025

0.1.145

Jun 19, 2025

0.1.144

Jun 19, 2025

0.1.143

Jun 19, 2025

0.1.142

Jun 19, 2025

0.1.141

Jun 19, 2025

0.1.140

Jun 19, 2025

0.1.139

Jun 19, 2025

0.1.138

Jun 19, 2025

0.1.137

Jun 19, 2025

0.1.136

Jun 19, 2025

0.1.135

Jun 19, 2025

0.1.134

Jun 19, 2025

0.1.133

Jun 19, 2025

0.1.132

Jun 19, 2025

0.1.131

Jun 19, 2025

0.1.130

Jun 19, 2025

0.1.129

Jun 19, 2025

0.1.128

Jun 19, 2025

0.1.127

Jun 18, 2025

0.1.126

Jun 18, 2025

0.1.125

Jun 18, 2025

0.1.124

Jun 18, 2025

0.1.123

Jun 18, 2025

0.1.122

Jun 18, 2025

0.1.121

Jun 18, 2025

0.1.120

Jun 18, 2025

0.1.119

Jun 18, 2025

0.1.118

Jun 18, 2025

0.1.117

Jun 18, 2025

0.1.116

Jun 18, 2025

0.1.115

Jun 18, 2025

0.1.114

Jun 18, 2025

0.1.113

Jun 18, 2025

0.1.112

Jun 18, 2025

0.1.111

Jun 18, 2025

0.1.110

Jun 18, 2025

0.1.109

Jun 18, 2025

0.1.108

Jun 18, 2025

0.1.107

Jun 18, 2025

0.1.106

Jun 18, 2025

0.1.105

Jun 18, 2025

0.1.104

Jun 18, 2025

0.1.103

Jun 18, 2025

0.1.102

Jun 18, 2025

0.1.101

Jun 18, 2025

0.1.100

Jun 18, 2025

0.1.99

Jun 18, 2025

0.1.98

Jun 18, 2025

0.1.97

Jun 18, 2025

0.1.96

Jun 18, 2025

0.1.95

Jun 18, 2025

0.1.94

Jun 18, 2025

0.1.93

Jun 18, 2025

0.1.92

Jun 18, 2025

0.1.91

Jun 18, 2025

0.1.90

Jun 18, 2025

0.1.89

Jun 18, 2025

0.1.88

Jun 18, 2025

0.1.87

Jun 18, 2025

0.1.86

Jun 18, 2025

0.1.85

Jun 18, 2025

0.1.84

Jun 18, 2025

0.1.83

Jun 18, 2025

0.1.82

Jun 18, 2025

0.1.81

Jun 18, 2025

0.1.80

Jun 18, 2025

0.1.79

Jun 18, 2025

0.1.78

Jun 18, 2025

0.1.77

Jun 18, 2025

0.1.76

Jun 18, 2025

0.1.75

Jun 18, 2025

0.1.74

Jun 18, 2025

0.1.73

Jun 18, 2025

0.1.72

Jun 18, 2025

0.1.71

Jun 18, 2025

0.1.70

Jun 18, 2025

0.1.69

Jun 18, 2025

0.1.68

Jun 18, 2025

0.1.67

Jun 18, 2025

0.1.66

Jun 18, 2025

0.1.65

Jun 18, 2025

0.1.64

Jun 18, 2025

0.1.63

Jun 18, 2025

0.1.62

Jun 18, 2025

0.1.59

Jun 12, 2025

0.1.58

Jun 12, 2025

0.1.57

Jun 12, 2025

0.1.56

Jun 12, 2025

0.1.55

Jun 12, 2025

0.1.54

Jun 12, 2025

This version

0.1.53

Jun 12, 2025

0.1.52

Jun 12, 2025

0.1.51

Jun 12, 2025

0.1.50

Jun 12, 2025

0.1.49

Jun 11, 2025

0.1.48

Jun 11, 2025

0.1.47

Jun 11, 2025

0.1.46

Jun 11, 2025

0.1.45

Jun 11, 2025

0.1.44

Jun 11, 2025

0.1.43

Jun 11, 2025

0.1.26

Jun 11, 2025

0.1.16

Jun 11, 2025

0.1.15

Jun 11, 2025

0.1.14

Jun 11, 2025

0.1.13

Jun 11, 2025

0.1.12

Jun 11, 2025

0.1.11

Jun 11, 2025

0.1.10

Jun 11, 2025

0.1.8

Jun 11, 2025

0.1.7

Jun 11, 2025

0.1.6

Jun 11, 2025

0.1.5

Jun 11, 2025

0.1.4

Jun 11, 2025

0.1.3

Jun 11, 2025

0.1.2

Jun 11, 2025

0.1.1

Jun 11, 2025

0.1.0

Jun 11, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

canonmap-0.1.53.tar.gz (24.3 kB)

Uploaded Jun 12, 2025 Source

Built Distribution

canonmap-0.1.53-py3-none-any.whl (28.2 kB)

Uploaded Jun 12, 2025 Python 3

File details

Details for the file canonmap-0.1.53.tar.gz.

File metadata

Download URL: canonmap-0.1.53.tar.gz
Upload date: Jun 12, 2025
Size: 24.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for canonmap-0.1.53.tar.gz
Algorithm	Hash digest
SHA256	`bc7d48a918dfec39877184fbf71ac9f02a77e46191b3ef54f8eb892a53099b45`
MD5	`04b144a3da6c25794d035c0b3b68b737`
BLAKE2b-256	`3939a3c28cc8d99fe2d25654b422276c359425bb72b6171cdd9ece8e733d56b7`

File details

Details for the file canonmap-0.1.53-py3-none-any.whl.

File metadata

Download URL: canonmap-0.1.53-py3-none-any.whl
Upload date: Jun 12, 2025
Size: 28.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for canonmap-0.1.53-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7ecb34832c4b681bf4adb15a9a06934a8b2ae32fd5d7d6e4e154c706b06ee8ec`
MD5	`5680f40431313330c1c8192afc8bfea8`
BLAKE2b-256	`ed6e46adb3d459875847b685893be2fb5193723796b3f8d75802c4c4b9230a62`
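
A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch; the helper names `sha256_of` and `verify` are hypothetical, and the file path is whatever location the archive or wheel was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True when the file's SHA256 digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the wheel, `verify("canonmap-0.1.53-py3-none-any.whl", "7ecb34832c4b681bf4adb15a9a06934a8b2ae32fd5d7d6e4e154c706b06ee8ec")` should return `True` for an intact download.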