infermap
Inference-driven schema mapping engine — automatically maps source fields to target fields using a composable scorer pipeline.
Install
pip install infermap
Install extras for additional database support:
pip install "infermap[postgres]" # psycopg2-binary
pip install "infermap[mysql]" # mysql-connector-python
pip install "infermap[duckdb]" # duckdb
pip install "infermap[all]" # all extras (quotes keep shells such as zsh from expanding the brackets)
Quick Start
import infermap
# Map a CRM export CSV to a canonical customer schema
result = infermap.map("crm_export.csv", "canonical_customers.csv")
for m in result.mappings:
    print(f"{m.source} -> {m.target} ({m.confidence:.0%})")
# fname -> first_name (97%)
# lname -> last_name (95%)
# email_addr -> email (91%)
# Apply mappings to rename DataFrame columns
import polars as pl
df = pl.read_csv("crm_export.csv")
renamed = result.apply(df)
# Save mappings to a reusable config file
result.to_config("my_mapping.yaml")
# Reload later — no re-inference needed
saved = infermap.from_config("my_mapping.yaml")
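Because each mapping carries a confidence, it can be worth reviewing weak matches before renaming anything. The sketch below relies only on the three attributes shown above (source, target, confidence). The Mapping tuple is a hypothetical stand-in for infermap's own result objects, the cust_ref row is made up, and the 0.9 threshold is arbitrary.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """Stand-in for one entry of result.mappings (hypothetical, for illustration)."""
    source: str
    target: str
    confidence: float


def split_by_confidence(mappings, threshold=0.9):
    """Split mappings into a rename dict (confident) and a review list (weak)."""
    rename = {m.source: m.target for m in mappings if m.confidence >= threshold}
    review = [m for m in mappings if m.confidence < threshold]
    return rename, review


mappings = [
    Mapping("fname", "first_name", 0.97),
    Mapping("lname", "last_name", 0.95),
    Mapping("email_addr", "email", 0.91),
    Mapping("cust_ref", "customer_id", 0.62),  # made-up low-confidence match
]

rename, review = split_by_confidence(mappings)
print(rename)
for m in review:
    print(f"review: {m.source} -> {m.target} ({m.confidence:.0%})")
```

The resulting dict has the shape that Polars accepts in df.rename(rename) and pandas accepts in df.rename(columns=rename), so confident matches can be applied by hand while the rest are checked.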
CLI Examples
# Map two files and print a report
infermap map crm_export.csv canonical_customers.csv
# Map and save the config
infermap map crm_export.csv canonical_customers.csv --save mapping.yaml
# Apply a saved mapping config to a file (prints the renamed column list)
infermap apply crm_export.csv mapping.yaml
# Inspect the schema of a file or database table
infermap inspect crm_export.csv
infermap inspect sqlite:///mydb.db --table customers
# Validate a mapping config file
infermap validate mapping.yaml
How It Works
infermap runs each source/target field pair through a pipeline of five scorers. Each scorer returns a score between 0.0 and 1.0, or abstains by returning None. The engine combines the non-abstaining scores with a weighted average (at least two scorers must contribute), then uses the Hungarian algorithm to find the optimal one-to-one assignment.
| Scorer | Weight | What it detects |
|---|---|---|
| ExactScorer | 1.0 | Case-insensitive exact name match |
| AliasScorer | 0.9 | Known field aliases (e.g. fname == first_name, tel == phone) |
| PatternTypeScorer | 0.7 | Semantic type from sample values — email, date_iso, phone, uuid, url, zip, currency |
| ProfileScorer | 0.6 | Statistical profile similarity — null rate, unique rate, value count |
| FuzzyNameScorer | 0.5 | Token-level fuzzy string similarity on field names |
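The combination step can be sketched in a few lines. This is a minimal illustration using the default weights from the table, not infermap's actual implementation: the combine function is hypothetical, the per-scorer scores are made up, and returning 0.0 when fewer than two scorers contribute is an assumption of the sketch.

```python
# Default weights, copied from the table above.
WEIGHTS = {
    "ExactScorer": 1.0,
    "AliasScorer": 0.9,
    "PatternTypeScorer": 0.7,
    "ProfileScorer": 0.6,
    "FuzzyNameScorer": 0.5,
}


def combine(scores, weights=WEIGHTS, min_contributors=2):
    """Weighted average of the non-abstaining scores for one field pair.

    `scores` maps scorer name -> float in [0, 1], or None for an abstention.
    Returning 0.0 when too few scorers contribute is an assumption of this sketch.
    """
    contributing = {name: s for name, s in scores.items() if s is not None}
    if len(contributing) < min_contributors:
        return 0.0
    total_weight = sum(weights[name] for name in contributing)
    weighted_sum = sum(weights[name] * s for name, s in contributing.items())
    return weighted_sum / total_weight


# Made-up scores for a single pair such as email_addr vs. email.
pair_scores = {
    "ExactScorer": 0.0,         # names differ, so no exact match
    "AliasScorer": None,        # abstains: excluded from the average
    "PatternTypeScorer": 1.0,
    "ProfileScorer": 0.8,
    "FuzzyNameScorer": 0.6,
}
print(round(combine(pair_scores), 3))
```

Note the difference between a 0.0 and an abstention in this sketch: ExactScorer's 0.0 still adds its weight to the denominator and pulls the average down, while AliasScorer's None is left out entirely.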
Features
- Maps CSV, Parquet, XLSX, Polars DataFrames, Pandas DataFrames, SQLite, and schema YAML files
- Composable scorer pipeline — disable, reweight, or add custom scorers via config or code
- Optimal one-to-one assignment via the Hungarian algorithm
- The required parameter warns when critical target fields go unmapped
- MapResult.apply() renames DataFrame columns in one call
- to_config() / from_config() roundtrip for repeatable pipelines
- CLI for quick inspection, mapping, and validation
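To see why an optimal one-to-one assignment matters, consider a small score matrix where picking the best pair first gives a worse overall result. The sketch below is not the Hungarian algorithm: it brute-forces every permutation, which finds the same optimum on tiny inputs but scales factorially. The best_assignment helper and the scores are made up for illustration.

```python
from itertools import permutations


def best_assignment(score):
    """Return (pairs, total) for the one-to-one assignment with the highest total.

    `score[i][j]` is the combined score of source i against target j.
    Brute force over all permutations; assumes a square matrix.
    """
    n = len(score)
    best_total, best_perm = -1.0, None
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(enumerate(best_perm)), best_total


sources = ["fname", "lname", "email_addr"]
targets = ["first_name", "last_name", "email"]
# Made-up combined scores; rows are sources, columns are targets.
score = [
    [0.85, 0.30, 0.05],
    [0.88, 0.80, 0.05],
    [0.10, 0.10, 0.91],
]

pairs, total = best_assignment(score)
for i, j in pairs:
    print(f"{sources[i]} -> {targets[j]} ({score[i][j]:.0%})")
```

A greedy pass would take lname -> first_name (0.88) early and leave fname with last_name at 0.30, whereas the optimal assignment keeps both name fields correctly paired. For realistic sizes, a polynomial-time solver such as scipy.optimize.linear_sum_assignment(score, maximize=True) computes the same optimum; whether infermap uses SciPy internally is not stated here.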
Custom Scorers
Register a scorer function with the @infermap.scorer decorator:
import infermap
from infermap.types import FieldInfo, ScorerResult
@infermap.scorer("my_prefix_scorer", weight=0.8)
def my_prefix_scorer(source: FieldInfo, target: FieldInfo) -> ScorerResult | None:
    src = source.name.lower()
    tgt = target.name.lower()
    # Abstain unless both names start with the same three characters
    if src[:3] != tgt[:3]:
        return None
    return ScorerResult(score=0.85, reasoning=f"Shared prefix '{src[:3]}'")
from infermap.engine import MapEngine
from infermap.scorers import default_scorers
engine = MapEngine(scorers=[*default_scorers(), my_prefix_scorer])
result = engine.map("source.csv", "target.csv")
You can also use a plain class with name, weight, and score():
class DomainScorer:
    name = "DomainScorer"
    weight = 0.75

    def score(self, source: FieldInfo, target: FieldInfo) -> ScorerResult | None:
        ...
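As a filled-in illustration of the class form, here is a hypothetical SuffixScorer that rewards field names sharing a domain suffix. So that the snippet runs without infermap installed, FieldInfo and ScorerResult are replaced by minimal stand-in dataclasses that model only the attributes used above (name, score, reasoning); the real types may carry more. The suffix list and the 0.7 score are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldInfo:
    """Stand-in for infermap.types.FieldInfo; only `name` is modelled here."""
    name: str


@dataclass
class ScorerResult:
    """Stand-in for infermap.types.ScorerResult."""
    score: float
    reasoning: str


class SuffixScorer:
    """Hypothetical scorer: rewards names that share a domain suffix such as _id."""
    name = "SuffixScorer"
    weight = 0.75
    suffixes = ("_id", "_date", "_amount")

    def score(self, source: FieldInfo, target: FieldInfo) -> Optional[ScorerResult]:
        src, tgt = source.name.lower(), target.name.lower()
        for suffix in self.suffixes:
            if src.endswith(suffix) and tgt.endswith(suffix):
                return ScorerResult(score=0.7, reasoning=f"Shared suffix '{suffix}'")
        return None  # abstain: no shared suffix, so no evidence either way


scorer = SuffixScorer()
print(scorer.score(FieldInfo("cust_id"), FieldInfo("customer_id")))
print(scorer.score(FieldInfo("fname"), FieldInfo("first_name")))
```

Returning None rather than a low score keeps the scorer out of the weighted average for pairs it knows nothing about, so it only influences the pairs where the suffix rule actually applies.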
Config Reference
Load an infermap.yaml at engine creation to override scorer weights, disable scorers, or add domain aliases:
engine = MapEngine(config_path="infermap.yaml")
See infermap.yaml.example for a full annotated example.
License
MIT
Download files
File details
Details for the file infermap-0.1.0.tar.gz.
File metadata
- Download URL: infermap-0.1.0.tar.gz
- Upload date:
- Size: 48.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cc29ca1486ecbdb765b47b5622ce12f8901bdc176fdf59ffdb9c14f762f1131f |
| MD5 | fdd1d539730a7be7a633d0acf4ad5165 |
| BLAKE2b-256 | fdd6b83ce6c02db03f1349b790ef850cc0c391df32ca6afa3f5c20e25b323f41 |
File details
Details for the file infermap-0.1.0-py3-none-any.whl.
File metadata
- Download URL: infermap-0.1.0-py3-none-any.whl
- Upload date:
- Size: 28.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 07ad0478388f06fbfce4ed25f04825b1b4c946aace408bc9b3d04124e1a8018e |
| MD5 | cfc16a03392575df9e81e575020b3f19 |
| BLAKE2b-256 | 009e205c7d0e491f5abe22bbb849c529d2ee00442675162f4be8f49d4029d04e |