infermap

Inference-driven schema mapping engine.
Map messy source columns to a known target schema — accurately, explainably, with zero config.

infermap is a schema-mapping engine. Give it any two field collections (CSVs, DataFrames, database tables, in-memory records) and it figures out which source field corresponds to which target field, with confidence scores and human-readable reasoning. Available as a Python package on PyPI and a TypeScript package on npm, with mapping decisions verified bit-for-bit by a shared golden-test parity suite.

Install
Quick start
How it works
Features
Which package should I use?
Custom scorers
CLI examples
Config reference
Documentation
License

Install

Python

pip install infermap

Optional database extras:

pip install "infermap[postgres]"   # psycopg2-binary
pip install "infermap[mysql]"      # mysql-connector-python
pip install "infermap[duckdb]"     # duckdb
pip install "infermap[all]"        # all extras (quotes keep shells such as zsh from globbing the brackets)

TypeScript / Next.js

npm install infermap

Zero runtime dependencies in the core entrypoint. Compatible with Next.js Server Components, Route Handlers, Server Actions, and the Edge Runtime out of the box. See the package README for the full reference.

Quick start

Python

import infermap

# Map a CRM export CSV to a canonical customer schema
result = infermap.map("crm_export.csv", "canonical_customers.csv")

for m in result.mappings:
    print(f"{m.source} -> {m.target}  ({m.confidence:.0%})")
# fname -> first_name  (97%)
# lname -> last_name   (95%)
# email_addr -> email  (91%)

# Apply mappings to rename DataFrame columns
import polars as pl
df = pl.read_csv("crm_export.csv")
renamed = result.apply(df)

# Save mappings to a reusable config file
result.to_config("my_mapping.yaml")

# Reload later — no re-inference needed
saved = infermap.from_config("my_mapping.yaml")
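
Conceptually, applying a mapping is a column rename. The sketch below shows that idea on plain dictionaries. `apply_mapping` and the `mapping` dict are hypothetical names used for illustration only and are not part of the infermap API; keeping unmapped keys unchanged is likewise an assumption.

```python
def apply_mapping(records: list[dict], mapping: dict[str, str]) -> list[dict]:
    """Rename keys according to `mapping`.

    Assumption: keys without a mapping are kept under their original name.
    """
    return [
        {mapping.get(key, key): value for key, value in record.items()}
        for record in records
    ]


# Source -> target pairs taken from the quick-start output above.
mapping = {"fname": "first_name", "lname": "last_name", "email_addr": "email"}

rows = [{"fname": "John", "lname": "Doe", "email_addr": "j@d.co", "notes": "vip"}]
renamed = apply_mapping(rows, mapping)
print(renamed)
```

In the real package, `result.apply(df)` plays this role for DataFrames.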

TypeScript

import { map } from "infermap";

const crm = [
  { fname: "John", lname: "Doe", email_addr: "j@d.co" },
  { fname: "Jane", lname: "Smith", email_addr: "j@s.co" },
];

const canonical = [
  { first_name: "", last_name: "", email: "" },
];

const result = map({ records: crm }, { records: canonical });

for (const m of result.mappings) {
  console.log(`${m.source} → ${m.target}  (${m.confidence.toFixed(2)})`);
}
// fname       → first_name  (0.44)
// lname       → last_name   (0.48)
// email_addr  → email       (0.69)

For Next.js, drop it directly into a Route Handler — works on Edge Runtime with zero config:

// app/api/infer/route.ts
import { map } from "infermap";
export const runtime = "edge";

export async function POST(req: Request) {
  const { sourceCsv, targetCsv } = await req.json();
  const result = map({ csvText: sourceCsv }, { csvText: targetCsv });
  return Response.json(result);
}

How it works

Each field pair runs through a pipeline of six scorers. Each scorer returns a score in [0.0, 1.0] or abstains (None in Python, null in TypeScript). The engine combines the non-abstaining scores via a weighted average, requiring at least two contributors, then uses the Hungarian algorithm to pick the optimal one-to-one assignment of source fields to target fields.
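
The weighted-average step can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the engine's actual code: the `combine` function is hypothetical, the weights are copied from the scorer table in this section, and returning `None` when fewer than two scorers contribute is a guess at how the minimum-contributor rule behaves.

```python
from typing import Optional

# Weights copied from the scorer table in this section.
WEIGHTS = {
    "ExactScorer": 1.0,
    "AliasScorer": 0.95,
    "PatternTypeScorer": 0.7,
    "ProfileScorer": 0.5,
    "FuzzyNameScorer": 0.4,
    "LLMScorer": 0.8,
}


def combine(
    scores: dict[str, Optional[float]], min_contributors: int = 2
) -> Optional[float]:
    """Weighted average over scorers that did not abstain (score is not None)."""
    contributing = {name: s for name, s in scores.items() if s is not None}
    if len(contributing) < min_contributors:
        return None  # assumption: too few contributors means no combined score
    total_weight = sum(WEIGHTS[name] for name in contributing)
    weighted_sum = sum(WEIGHTS[name] * s for name, s in contributing.items())
    return weighted_sum / total_weight


# Hypothetical scorer outputs for one field pair; the other scorers abstain.
pair_scores = {
    "ExactScorer": None,
    "AliasScorer": 1.0,
    "FuzzyNameScorer": 0.5,
}
combined = combine(pair_scores)
print(round(combined, 3))  # (0.95 * 1.0 + 0.4 * 0.5) / (0.95 + 0.4) -> 0.852
```

Abstaining scorers drop out of both the numerator and the denominator, so an abstention does not pull the average toward zero.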

Scorer	Weight	What it detects
ExactScorer	1.0	Case-insensitive exact name match
AliasScorer	0.95	Known field aliases (`fname` ↔ `first_name`, `tel` ↔ `phone`)
PatternTypeScorer	0.7	Semantic type from sample values — email, date_iso, phone, uuid, url, zip, currency
ProfileScorer	0.5	Statistical profile similarity — dtype, null rate, unique rate, length, cardinality
FuzzyNameScorer	0.4	Jaro-Winkler similarity on normalized field names
LLMScorer	0.8	Pluggable LLM-backed scorer (stubbed by default)
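
To see why a one-to-one assignment step matters, the sketch below finds the optimal assignment on a tiny, made-up score matrix. It deliberately uses a brute-force search over permutations rather than the Hungarian algorithm. On small inputs both reach the same optimum, but brute force does not scale. `best_assignment` and the score values are hypothetical. The feature table lists scipy for the Python package, and `scipy.optimize.linear_sum_assignment` is the usual routine for this problem, though whether infermap calls it is not stated here.

```python
from itertools import permutations


def best_assignment(scores: list[list[float]]) -> list[tuple[int, int]]:
    """Return (source_index, target_index) pairs that maximize the total score.

    Brute force over permutations; assumes there are at least as many
    targets (columns) as sources (rows).
    """
    n_sources = len(scores)
    n_targets = len(scores[0])
    best_total, best_perm = float("-inf"), ()
    for perm in permutations(range(n_targets), n_sources):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(enumerate(best_perm))


# Made-up combined scores: rows are source fields, columns are target fields.
sources = ["fname", "lname", "email_addr"]
targets = ["first_name", "last_name", "email"]
scores = [
    [0.90, 0.60, 0.10],
    [0.85, 0.80, 0.10],
    [0.05, 0.10, 0.70],
]

pairs = [(sources[i], targets[j]) for i, j in best_assignment(scores)]
print(pairs)
```

In this matrix `lname` scores higher against `first_name` (0.85) than against `last_name` (0.80), so picking each row's maximum independently would map two sources to the same target. The one-to-one constraint resolves that by giving `lname` to `last_name`.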

Read the full architecture →

Features

	Python	TypeScript
6 built-in scorers	✅	✅
Hungarian assignment	✅ (scipy)	✅ (vendored)
Custom scorers	`@infermap.scorer`	`defineScorer()`
In-memory data	Polars, Pandas, `list[dict]`	`Array<Record>`
File providers	CSV, Parquet, XLSX	CSV, JSON
Schema definition files	YAML + JSON	JSON
Database providers	SQLite, Postgres, DuckDB	SQLite, Postgres, DuckDB
Engine config	YAML	JSON
Saved mapping format	YAML	JSON
CLI	✅ (Typer)	✅ (`node:util`)
Apply to DataFrame	✅	❌ (CSV rewrite via CLI)
Edge-runtime compatible	❌	✅
Zero runtime deps	n/a	✅

Full feature parity matrix →

Which package should I use?

If you are…	Use
Building a Python data pipeline or notebook	Python
Building a Next.js app, Node service, or browser tool	TypeScript
Running mapping in a serverless edge function	TypeScript (zero Node built-ins)
Doing ad-hoc CSV exploration on the command line	Python CLI has more features; TS CLI is leaner
Both — Python backend + Next.js admin UI	Both — outputs are interoperable via the JSON config format

Custom scorers

Python

import infermap
from infermap.engine import MapEngine
from infermap.scorers import default_scorers
from infermap.types import FieldInfo, ScorerResult

# Abstain (return None) unless both names share the same three-character prefix
@infermap.scorer("prefix_scorer", weight=0.8)
def prefix_scorer(source: FieldInfo, target: FieldInfo) -> ScorerResult | None:
    if source.name[:3].lower() != target.name[:3].lower():
        return None
    return ScorerResult(score=0.85, reasoning=f"Shared prefix '{source.name[:3]}'")

# Register the custom scorer alongside the built-in ones
engine = MapEngine(scorers=[*default_scorers(), prefix_scorer])

TypeScript

import { MapEngine, defaultScorers, defineScorer, makeScorerResult } from "infermap";

const prefixScorer = defineScorer(
  "prefix_scorer",
  (source, target) => {
    if (source.name.slice(0, 3).toLowerCase() !== target.name.slice(0, 3).toLowerCase()) {
      return null;
    }
    return makeScorerResult(0.85, `Shared prefix '${source.name.slice(0, 3)}'`);
  },
  0.8 // weight
);

const engine = new MapEngine({
  scorers: [...defaultScorers(), prefixScorer],
});

CLI examples

The CLI commands are the same in both packages; only the flag for saving a config differs:

# Map two files and print a report
infermap map crm_export.csv canonical_customers.csv

# Map and save the config (TypeScript flag shown; the Python CLI uses --save instead of -o)
infermap map crm_export.csv canonical_customers.csv -o mapping.json

# Apply a saved mapping to rename columns
infermap apply crm_export.csv --config mapping.json --output renamed.csv

# Inspect the schema of a file or DB table
infermap inspect crm_export.csv
infermap inspect "sqlite:///mydb.db" --table customers

# Validate a saved config against a source
infermap validate crm_export.csv --config mapping.json --required email,id --strict

Config reference

Both packages accept an engine config (scorer weight overrides + alias extensions). Python uses YAML, TypeScript uses JSON; the shape is identical.

# Python: infermap.yaml
scorers:
  LLMScorer:
    enabled: false
  FuzzyNameScorer:
    weight: 0.3
aliases:
  order_id:
    - order_num
    - ord_no

// TypeScript: infermap.config.json
{
  "scorers": {
    "LLMScorer":       { "enabled": false },
    "FuzzyNameScorer": { "weight": 0.3 }
  },
  "aliases": {
    "order_id": ["order_num", "ord_no"]
  }
}

See infermap.yaml.example for a full annotated reference.
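
As a rough illustration of what this config shape expresses, the sketch below applies the JSON example to a table of default weights and builds an alias lookup. It is not how infermap loads its config: `apply_config` and `alias_index` are hypothetical names, and treating `"enabled": false` as removing the scorer entirely is an assumption.

```python
import json

# Default weights copied from the scorer table in "How it works".
DEFAULT_WEIGHTS = {
    "ExactScorer": 1.0,
    "AliasScorer": 0.95,
    "PatternTypeScorer": 0.7,
    "ProfileScorer": 0.5,
    "FuzzyNameScorer": 0.4,
    "LLMScorer": 0.8,
}

# Same content as the infermap.config.json example above.
CONFIG_TEXT = """
{
  "scorers": {
    "LLMScorer":       { "enabled": false },
    "FuzzyNameScorer": { "weight": 0.3 }
  },
  "aliases": {
    "order_id": ["order_num", "ord_no"]
  }
}
"""


def apply_config(defaults: dict[str, float], config: dict) -> dict[str, float]:
    """Return a copy of the default weights with config overrides applied."""
    weights = dict(defaults)
    for name, options in config.get("scorers", {}).items():
        if options.get("enabled", True) is False:
            weights.pop(name, None)  # assumption: disabled scorers are dropped
        elif "weight" in options:
            weights[name] = options["weight"]
    return weights


config = json.loads(CONFIG_TEXT)
weights = apply_config(DEFAULT_WEIGHTS, config)

# Reverse lookup from each alias to its canonical field name.
alias_index = {
    alias: canonical
    for canonical, aliases in config.get("aliases", {}).items()
    for alias in aliases
}

print(weights)
print(alias_index)
```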

Documentation

📖 Wiki — full reference for both languages
- Getting Started
- Python API
- TypeScript API
- Python vs TypeScript — migration guide
- Scorers
- Architecture
- FAQ
🌐 Documentation site
🧪 Examples
- Python examples — 7 numbered scripts + sample data
- TypeScript examples — basic mapping, Next.js Edge Runtime, custom scorer, SQLite, save/reuse
📓 Open in Colab — Python notebook
💬 GitHub Discussions
🐛 Issue tracker

License

MIT

Project details

Release history

Version	Released
0.4.0	May 6, 2026
0.3.2	Apr 10, 2026
0.3.1	Apr 10, 2026
0.3.0	Apr 10, 2026
0.2.0 (this version)	Apr 9, 2026
0.1.0	Mar 30, 2026

Download files

Source Distribution

infermap-0.2.0.tar.gz (247.7 kB), uploaded Apr 9, 2026, Source

Built Distribution

infermap-0.2.0-py3-none-any.whl (30.9 kB), uploaded Apr 9, 2026, Python 3

File details

Details for the file infermap-0.2.0.tar.gz.

File metadata

Download URL: infermap-0.2.0.tar.gz
Upload date: Apr 9, 2026
Size: 247.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for infermap-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`e16a20242840f7e74d2403b778670b3a1c03230334d3a6df34ca63acc989807c`
MD5	`c805761d461feb95022f9d27bdbc1e30`
BLAKE2b-256	`8ee534e8b9a0f0396079f46b9dc8c36bfb2a23a73458f7b3b7fc511e40a26b8b`

Provenance

The following attestation bundles were made for infermap-0.2.0.tar.gz:

Publisher: publish.yml on benzsevern/infermap

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: infermap-0.2.0.tar.gz
- Subject digest: e16a20242840f7e74d2403b778670b3a1c03230334d3a6df34ca63acc989807c
- Sigstore transparency entry: 1265391982
- Sigstore integration time: Apr 9, 2026
Source repository:
- Permalink: benzsevern/infermap@8f347ebc35840ec00546e83ec1c8c8a7c11a867a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/benzsevern
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8f347ebc35840ec00546e83ec1c8c8a7c11a867a
- Trigger Event: release

File details

Details for the file infermap-0.2.0-py3-none-any.whl.

File metadata

Download URL: infermap-0.2.0-py3-none-any.whl
Upload date: Apr 9, 2026
Size: 30.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for infermap-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ebf1dc71e4b02948525dcd7522681310e07a5a8f45a5fe129835b23af3dff17f`
MD5	`338d9f588f3802e3c702e41e04bb73f1`
BLAKE2b-256	`cfde19c00ccd0bd20b88317e43b2fab8b85bf715c988bf3fa7f08af10f0f940e`

Provenance

The following attestation bundles were made for infermap-0.2.0-py3-none-any.whl:

Publisher: publish.yml on benzsevern/infermap

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: infermap-0.2.0-py3-none-any.whl
- Subject digest: ebf1dc71e4b02948525dcd7522681310e07a5a8f45a5fe129835b23af3dff17f
- Sigstore transparency entry: 1265392062
- Sigstore integration time: Apr 9, 2026
Source repository:
- Permalink: benzsevern/infermap@8f347ebc35840ec00546e83ec1c8c8a7c11a867a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/benzsevern
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8f347ebc35840ec00546e83ec1c8c8a7c11a867a
- Trigger Event: release
