Inference-driven schema mapping engine

Requires Python 3.11+. License: MIT.

infermap

infermap automatically maps source fields to target fields using a composable scorer pipeline.

Install

pip install infermap

Install extras for additional database support:

pip install infermap[postgres]   # psycopg2-binary
pip install infermap[mysql]      # mysql-connector-python
pip install infermap[duckdb]     # duckdb
pip install infermap[all]        # all extras

Quick Start

import infermap

# Map a CRM export CSV to a canonical customer schema
result = infermap.map("crm_export.csv", "canonical_customers.csv")

for m in result.mappings:
    print(f"{m.source} -> {m.target}  ({m.confidence:.0%})")
# fname -> first_name  (97%)
# lname -> last_name   (95%)
# email_addr -> email  (91%)

# Apply mappings to rename DataFrame columns
import polars as pl
df = pl.read_csv("crm_export.csv")
renamed = result.apply(df)

# Save mappings to a reusable config file
result.to_config("my_mapping.yaml")

# Reload later — no re-inference needed
saved = infermap.from_config("my_mapping.yaml")
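The mappings printed above carry a source name, a target name, and a confidence. As a rough sketch of how they could be turned into a rename plan by hand, the block below filters on a confidence threshold and builds a plain dict. The `Mapping` tuple, the `rename_plan` helper, the `min_confidence` parameter, and the extra `notes` row are hypothetical stand-ins for illustration; this is not how `result.apply()` is implemented.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """Hypothetical stand-in for one entry of result.mappings."""
    source: str
    target: str
    confidence: float


def rename_plan(mappings, min_confidence=0.9):
    """Build a {source: target} dict from mappings at or above the threshold."""
    return {m.source: m.target for m in mappings if m.confidence >= min_confidence}


# Illustrative data: the first three rows mirror the Quick Start output,
# the last row is invented to show a low-confidence mapping being dropped.
mappings = [
    Mapping("fname", "first_name", 0.97),
    Mapping("lname", "last_name", 0.95),
    Mapping("email_addr", "email", 0.91),
    Mapping("notes", "comments", 0.42),
]

plan = rename_plan(mappings)
print(plan)
```

A dict of this shape can be passed to a DataFrame rename call (for example polars' `DataFrame.rename`), which is useful when you want to review or filter mappings before applying them.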

CLI Examples

# Map two files and print a report
infermap map crm_export.csv canonical_customers.csv

# Map and save the config
infermap map crm_export.csv canonical_customers.csv --save mapping.yaml

# Apply a saved mapping config to a file (prints the renamed column list)
infermap apply crm_export.csv mapping.yaml

# Inspect the schema of a file or database table
infermap inspect crm_export.csv
infermap inspect sqlite:///mydb.db --table customers

# Validate a mapping config file
infermap validate mapping.yaml

How It Works

infermap scores every source/target field pair with a pipeline of five scorers. Each scorer returns a score between 0.0 and 1.0, or abstains by returning None. The engine combines the scores into a weighted average, which requires at least two contributing scorers, and then applies the Hungarian algorithm to find the optimal one-to-one assignment.

| Scorer | Weight | What it detects |
| --- | --- | --- |
| ExactScorer | 1.0 | Case-insensitive exact name match |
| AliasScorer | 0.9 | Known field aliases (e.g. fname == first_name, tel == phone) |
| PatternTypeScorer | 0.7 | Semantic type from sample values: email, date_iso, phone, uuid, url, zip, currency |
| ProfileScorer | 0.6 | Statistical profile similarity: null rate, unique rate, value count |
| FuzzyNameScorer | 0.5 | Token-level fuzzy string similarity on field names |
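To make the combining step concrete, here is a minimal sketch of a weighted average that skips abstaining scorers. The weights are copied from the table above; the `combine` function, its `min_contributors` parameter, and the choice to return 0.0 when too few scorers contribute are assumptions for illustration, not the engine's actual code.

```python
# Weights taken from the scorer table; everything else here is illustrative.
WEIGHTS = {
    "ExactScorer": 1.0,
    "AliasScorer": 0.9,
    "PatternTypeScorer": 0.7,
    "ProfileScorer": 0.6,
    "FuzzyNameScorer": 0.5,
}


def combine(scores, min_contributors=2):
    """Weighted average over the scorers that did not abstain (None)."""
    active = {name: s for name, s in scores.items() if s is not None}
    if len(active) < min_contributors:
        # Assumption: too little evidence is treated as "no match".
        return 0.0
    total_weight = sum(WEIGHTS[name] for name in active)
    return sum(WEIGHTS[name] * s for name, s in active.items()) / total_weight


# Example pair: only the alias and fuzzy-name scorers have an opinion.
pair_scores = {
    "ExactScorer": None,
    "AliasScorer": 1.0,
    "PatternTypeScorer": None,
    "ProfileScorer": None,
    "FuzzyNameScorer": 0.6,
}
combined = combine(pair_scores)
print(round(combined, 3))
```

Because abstaining scorers drop out of both the numerator and the denominator, a scorer with nothing to say does not drag the combined score toward zero.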

Features

  • Maps CSV, Parquet, XLSX, Polars DataFrames, Pandas DataFrames, SQLite, and schema YAML files
  • Composable scorer pipeline — disable, reweight, or add custom scorers via config or code
  • Optimal one-to-one assignment via the Hungarian algorithm
  • required parameter warns when critical target fields go unmapped
  • MapResult.apply() renames DataFrame columns in one call
  • to_config() / from_config() roundtrip for repeatable pipelines
  • CLI for quick inspection, mapping, and validation
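The one-to-one assignment mentioned in the list above matters because picking the best target for each source field independently can leave a better overall pairing on the table. The sketch below illustrates the idea on a tiny score matrix. It uses a brute-force search over permutations as a stand-in for the Hungarian algorithm (both find the assignment with the highest total score, but brute force is only practical for a handful of fields). The function name and the matrix values are hypothetical.

```python
from itertools import permutations


def best_assignment(matrix):
    """Return (row, col) pairs maximizing the total score, one column per row.

    Brute-force stand-in for the Hungarian algorithm; assumes the matrix has
    at least as many columns (targets) as rows (sources).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    best_cols = max(
        permutations(range(n_cols), n_rows),
        key=lambda cols: sum(matrix[r][c] for r, c in enumerate(cols)),
    )
    return list(enumerate(best_cols))


# Rows are source fields, columns are target fields (illustrative scores).
scores = [
    [0.90, 0.80],
    [0.85, 0.10],
]

pairs = best_assignment(scores)
total = sum(scores[r][c] for r, c in pairs)
print(pairs, round(total, 2))
```

A greedy pass would give row 0 its top choice (column 0) and leave row 1 with a 0.10 match; the optimal assignment gives up a little on row 0 so that both rows end up with a strong match.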

Custom Scorers

Register a scorer function with the @infermap.scorer decorator:

import infermap
from infermap.types import FieldInfo, ScorerResult

@infermap.scorer("my_prefix_scorer", weight=0.8)
def my_prefix_scorer(source: FieldInfo, target: FieldInfo) -> ScorerResult | None:
    src = source.name.lower()
    tgt = target.name.lower()
    # Abstain unless both names share the same three-character prefix
    if src[:3] != tgt[:3]:
        return None
    return ScorerResult(score=0.85, reasoning=f"Shared prefix '{src[:3]}'")

from infermap.engine import MapEngine
from infermap.scorers import default_scorers

engine = MapEngine(scorers=[*default_scorers(), my_prefix_scorer])
result = engine.map("source.csv", "target.csv")

You can also use a plain class with name, weight, and score():

class DomainScorer:
    name = "DomainScorer"
    weight = 0.75

    def score(self, source: FieldInfo, target: FieldInfo) -> ScorerResult | None:
        ...

Config Reference

Load an infermap.yaml at engine creation to override scorer weights, disable scorers, or add domain aliases:

engine = MapEngine(config_path="infermap.yaml")

See infermap.yaml.example for a full annotated example.

License

MIT
