Inference-driven schema mapping engine
infermap
Map messy source columns to a known target schema — accurately, explainably, with zero config.
📖 Wiki · 🌐 Docs · 🧪 Examples · 💬 Discussions · 🐛 Issues
infermap is a schema-mapping engine. Give it any two field collections (CSVs, DataFrames, database tables, in-memory records) and it figures out which source field corresponds to which target field, with confidence scores and human-readable reasoning. Available as a Python package on PyPI and a TypeScript package on npm, with mapping decisions verified bit-for-bit by a shared golden-test parity suite.
Table of contents
- Install
- Quick start
- How it works
- Features
- Which package should I use?
- Custom scorers
- CLI examples
- Config reference
- Documentation
- License
Install
Python
pip install infermap
Optional database extras (quoted so shells such as zsh do not treat the brackets as a glob pattern):
pip install "infermap[postgres]" # psycopg2-binary
pip install "infermap[mysql]" # mysql-connector-python
pip install "infermap[duckdb]" # duckdb
pip install "infermap[all]" # all extras
TypeScript / Next.js
npm install infermap
Zero runtime dependencies in the core entrypoint. Compatible with Next.js Server Components, Route Handlers, Server Actions, and the Edge Runtime out of the box. See the package README for the full reference.
Quick start
Python
import infermap
import polars as pl

# Map a CRM export CSV to a canonical customer schema
result = infermap.map("crm_export.csv", "canonical_customers.csv")
for m in result.mappings:
    print(f"{m.source} -> {m.target} ({m.confidence:.0%})")
# fname -> first_name (97%)
# lname -> last_name (95%)
# email_addr -> email (91%)

# Apply mappings to rename DataFrame columns
df = pl.read_csv("crm_export.csv")
renamed = result.apply(df)

# Save mappings to a reusable config file
result.to_config("my_mapping.yaml")

# Reload later; no re-inference needed
saved = infermap.from_config("my_mapping.yaml")
TypeScript
import { map } from "infermap";

const crm = [
  { fname: "John", lname: "Doe", email_addr: "j@d.co" },
  { fname: "Jane", lname: "Smith", email_addr: "j@s.co" },
];
const canonical = [
  { first_name: "", last_name: "", email: "" },
];

const result = map({ records: crm }, { records: canonical });
for (const m of result.mappings) {
  console.log(`${m.source} → ${m.target} (${m.confidence.toFixed(2)})`);
}
// fname → first_name (0.44)
// lname → last_name (0.48)
// email_addr → email (0.69)
For Next.js, drop it directly into a Route Handler — works on Edge Runtime with zero config:
// app/api/infer/route.ts
import { map } from "infermap";

export const runtime = "edge";

export async function POST(req: Request) {
  const { sourceCsv, targetCsv } = await req.json();
  const result = map({ csvText: sourceCsv }, { csvText: targetCsv });
  return Response.json(result);
}
How it works
Each field pair runs through a pipeline of 6 scorers. Each scorer returns a score in [0.0, 1.0] or abstains (None/null). The engine combines scores via weighted average (requiring at least 2 contributors), then uses the Hungarian algorithm for optimal one-to-one assignment.
| Scorer | Weight | What it detects |
|---|---|---|
| ExactScorer | 1.0 | Case-insensitive exact name match |
| AliasScorer | 0.95 | Known field aliases (fname ↔ first_name, tel ↔ phone) |
| PatternTypeScorer | 0.7 | Semantic type from sample values — email, date_iso, phone, uuid, url, zip, currency |
| ProfileScorer | 0.5 | Statistical profile similarity — dtype, null rate, unique rate, length, cardinality |
| FuzzyNameScorer | 0.4 | Jaro-Winkler similarity on normalized field names |
| LLMScorer | 0.8 | Pluggable LLM-backed scorer (stubbed by default) |
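To make the combine-then-assign steps concrete, here is a minimal stdlib-only sketch; it is not infermap's code. It assumes each field pair's scorer outputs are already available as a dict of scorer name to score (None or a missing key meaning "abstained"), uses the default weights from the table, and treats a pair with fewer than two contributing scorers as scoring 0.0, which is an assumption since the README does not say what the engine does in that case. The Python package performs the assignment with scipy; this sketch swaps in an exhaustive search over permutations, which finds an optimal one-to-one assignment on tiny inputs but grows factorially, so it is for illustration only.

```python
from itertools import permutations
from typing import Optional

# Default weights from the scorer table.
WEIGHTS = {
    "ExactScorer": 1.0,
    "AliasScorer": 0.95,
    "PatternTypeScorer": 0.7,
    "ProfileScorer": 0.5,
    "FuzzyNameScorer": 0.4,
    "LLMScorer": 0.8,
}


def combine(scores: dict[str, Optional[float]], min_contributors: int = 2) -> float:
    """Weighted average over the scorers that did not abstain (None or missing)."""
    votes = [(WEIGHTS[name], s) for name, s in scores.items() if s is not None]
    if len(votes) < min_contributors:
        return 0.0  # assumption: too little evidence counts as "no match"
    total_weight = sum(w for w, _ in votes)
    return sum(w * s for w, s in votes) / total_weight


def best_assignment(matrix: list[list[float]]) -> list[tuple[int, int]]:
    """Exhaustive one-to-one assignment maximising the summed score.

    Stand-in for the Hungarian algorithm. Returns (row, column) pairs sorted by row.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    if n_rows > n_cols:
        # Solve the transposed problem, then swap each pair back.
        transposed = [list(col) for col in zip(*matrix)]
        return sorted((row, col) for col, row in best_assignment(transposed))
    best_pairs: list[tuple[int, int]] = []
    best_total = float("-inf")
    # Each permutation assigns a distinct target column to every source row.
    for cols in permutations(range(n_cols), n_rows):
        total = sum(matrix[row][col] for row, col in enumerate(cols))
        if total > best_total:
            best_pairs, best_total = list(enumerate(cols)), total
    return best_pairs


sources = ["fname", "email_addr"]
targets = ["first_name", "email", "phone"]
# Made-up scorer outputs; pairs not listed had every scorer abstain.
pair_scores = {
    ("fname", "first_name"): {"AliasScorer": 1.0, "FuzzyNameScorer": 0.6},
    ("email_addr", "email"): {"PatternTypeScorer": 1.0, "FuzzyNameScorer": 0.8},
    ("email_addr", "first_name"): {"FuzzyNameScorer": 0.3, "ExactScorer": None},
}
matrix = [[combine(pair_scores.get((s, t), {})) for t in targets] for s in sources]
for row, col in best_assignment(matrix):
    print(f"{sources[row]} -> {targets[col]} ({matrix[row][col]:.2f})")
# fname -> first_name (0.88)
# email_addr -> email (0.93)
```

On real score matrices, `scipy.optimize.linear_sum_assignment(matrix, maximize=True)` solves the same rectangular assignment problem in polynomial time.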
Features
| | Python | TypeScript |
|---|---|---|
| 6 built-in scorers | ✅ | ✅ |
| Hungarian assignment | ✅ (scipy) | ✅ (vendored) |
| Custom scorers | @infermap.scorer | defineScorer() |
| In-memory data | Polars, Pandas, list[dict] | Array<Record> |
| File providers | CSV, Parquet, XLSX | CSV, JSON |
| Schema definition files | YAML + JSON | JSON |
| Database providers | SQLite, Postgres, DuckDB | SQLite, Postgres, DuckDB |
| Engine config | YAML | JSON |
| Saved mapping format | YAML | JSON |
| CLI | ✅ (Typer) | ✅ (node:util) |
| Apply to DataFrame | ✅ | ❌ (CSV rewrite via CLI) |
| Edge-runtime compatible | ❌ | ✅ |
| Zero runtime deps | n/a | ✅ |
Which package should I use?
| If you are… | Use |
|---|---|
| Building a Python data pipeline or notebook | Python |
| Building a Next.js app, Node service, or browser tool | TypeScript |
| Running mapping in a serverless edge function | TypeScript (zero Node built-ins) |
| Doing ad-hoc CSV exploration on the command line | Python CLI has more features; TS CLI is leaner |
| Both — Python backend + Next.js admin UI | Both — outputs are interoperable via the JSON config format |
Custom scorers
Python
import infermap
from infermap.engine import MapEngine
from infermap.scorers import default_scorers
from infermap.types import FieldInfo, ScorerResult

@infermap.scorer("prefix_scorer", weight=0.8)
def prefix_scorer(source: FieldInfo, target: FieldInfo) -> ScorerResult | None:
    # Abstain (return None) unless the first three characters match
    if source.name[:3].lower() != target.name[:3].lower():
        return None
    return ScorerResult(score=0.85, reasoning=f"Shared prefix '{source.name[:3]}'")

# Run the custom scorer alongside the built-in ones
engine = MapEngine(scorers=[*default_scorers(), prefix_scorer])
TypeScript
import { MapEngine, defaultScorers, defineScorer, makeScorerResult } from "infermap";

const prefixScorer = defineScorer(
  "prefix_scorer",
  (source, target) => {
    if (source.name.slice(0, 3).toLowerCase() !== target.name.slice(0, 3).toLowerCase()) {
      return null;
    }
    return makeScorerResult(0.85, `Shared prefix '${source.name.slice(0, 3)}'`);
  },
  0.8 // weight
);

const engine = new MapEngine({
  scorers: [...defaultScorers(), prefixScorer],
});
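Both custom-scorer examples follow the same contract as the built-in scorers: return a score in [0.0, 1.0] with a reason, or None/null to abstain. The sketch below shows another scorer with that contract in plain Python, so it runs without infermap installed. `Field`, `tokens`, and `token_overlap_scorer` are hypothetical names, and the function returns a `(score, reasoning)` tuple instead of infermap's `ScorerResult`; to use it with the engine you would adapt the return value and register it with `@infermap.scorer` as in the Python example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    """Hypothetical stand-in for infermap's FieldInfo; only the name is used."""
    name: str


def tokens(name: str) -> set[str]:
    """Split camelCase, snake_case, and kebab-case names into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return {t for t in re.split(r"[\s_\-]+", spaced.lower()) if t}


def token_overlap_scorer(source: Field, target: Field) -> Optional[tuple[float, str]]:
    src, tgt = tokens(source.name), tokens(target.name)
    shared = src & tgt
    if not shared:
        return None  # abstain: no shared tokens, so no evidence either way
    score = len(shared) / len(src | tgt)  # Jaccard similarity, in (0, 1]
    return score, f"Shared tokens {sorted(shared)}"


result = token_overlap_scorer(Field("customerEmail"), Field("email_address"))
```

Abstaining instead of returning 0.0 matters because the engine averages only over scorers that contributed: a hard 0.0 for `fname` vs `first_name` would drag down a pair that AliasScorer already recognises, whereas abstaining leaves that decision to the other scorers.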
CLI examples
Both packages ship a CLI with the same commands; where a flag differs, the comment notes it:
# Map two files and print a report
infermap map crm_export.csv canonical_customers.csv
# Map and save the config (Python: --save, TS: -o)
infermap map crm_export.csv canonical_customers.csv -o mapping.json
# Apply a saved mapping to rename columns
infermap apply crm_export.csv --config mapping.json --output renamed.csv
# Inspect the schema of a file or DB table
infermap inspect crm_export.csv
infermap inspect "sqlite:///mydb.db" --table customers
# Validate a saved config against a source
infermap validate crm_export.csv --config mapping.json --required email,id --strict
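`infermap apply` renames columns according to a saved mapping. As a rough illustration of that step only, the sketch below rewrites a CSV header from a plain `{source: target}` dict using the standard `csv` module. The dict and the `rename_csv_columns` name are hypothetical stand-ins, not the format of `mapping.json` or infermap's implementation, and keeping unmapped columns under their original names is an assumption; the real command may handle them differently.

```python
import csv
import io


def rename_csv_columns(csv_text: str, mapping: dict[str, str]) -> str:
    """Rewrite the header row via a {source: target} dict; other rows pass through."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return ""  # empty input: nothing to rename
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Unmapped columns keep their original name (assumption, see above).
    writer.writerow([mapping.get(col, col) for col in header])
    writer.writerows(reader)
    return out.getvalue()


mapping = {"fname": "first_name", "lname": "last_name", "email_addr": "email"}
print(rename_csv_columns("fname,lname,email_addr,id\nJohn,Doe,j@d.co,1\n", mapping), end="")
# first_name,last_name,email,id
# John,Doe,j@d.co,1
```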
Config reference
Both packages accept an engine config (scorer weight overrides + alias extensions). Python uses YAML, TypeScript uses JSON; the shape is identical.
# Python: infermap.yaml
scorers:
  LLMScorer:
    enabled: false
  FuzzyNameScorer:
    weight: 0.3
aliases:
  order_id:
    - order_num
    - ord_no

// TypeScript: infermap.config.json
{
  "scorers": {
    "LLMScorer": { "enabled": false },
    "FuzzyNameScorer": { "weight": 0.3 }
  },
  "aliases": {
    "order_id": ["order_num", "ord_no"]
  }
}
See infermap.yaml.example for a full annotated reference.
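One plausible reading of how such a config modifies the defaults, sketched with the standard `json` module: disabled scorers are dropped, `weight` replaces the default from the scorer table, and each alias list is inverted into an alias-to-canonical lookup. These merge rules, and the names `effective_weights` and `alias_lookup`, are assumptions for illustration rather than infermap's documented behavior. Because the shape is identical, loading the YAML version with PyYAML's `yaml.safe_load` would yield the same dict.

```python
import json

# Default weights from the "How it works" scorer table.
DEFAULT_WEIGHTS = {
    "ExactScorer": 1.0,
    "AliasScorer": 0.95,
    "PatternTypeScorer": 0.7,
    "ProfileScorer": 0.5,
    "FuzzyNameScorer": 0.4,
    "LLMScorer": 0.8,
}


def effective_weights(config: dict) -> dict[str, float]:
    """Apply per-scorer overrides: drop disabled scorers, replace overridden weights."""
    weights = dict(DEFAULT_WEIGHTS)
    for name, override in config.get("scorers", {}).items():
        if override.get("enabled", True) is False:
            weights.pop(name, None)
        elif "weight" in override:
            weights[name] = float(override["weight"])
    return weights


def alias_lookup(config: dict) -> dict[str, str]:
    """Invert {canonical: [aliases]} into {alias: canonical} for constant-time lookups."""
    return {
        alias: canonical
        for canonical, aliases in config.get("aliases", {}).items()
        for alias in aliases
    }


config = json.loads("""
{
  "scorers": {
    "LLMScorer": { "enabled": false },
    "FuzzyNameScorer": { "weight": 0.3 }
  },
  "aliases": { "order_id": ["order_num", "ord_no"] }
}
""")
weights = effective_weights(config)
aliases = alias_lookup(config)
```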
Documentation
- 📖 Wiki: full reference for both languages
  - Getting Started
  - Python API
  - TypeScript API
  - Python vs TypeScript: migration guide
  - Scorers
  - Architecture
  - FAQ
- 🌐 Documentation site
- 🧪 Examples
  - Python examples: 7 numbered scripts + sample data
  - TypeScript examples: basic mapping, Next.js Edge Runtime, custom scorer, SQLite, save/reuse
- 📓 Open in Colab: Python notebook
- 💬 GitHub Discussions
- 🐛 Issue tracker
License
Download files
File details
Details for the file infermap-0.2.0.tar.gz.
File metadata
- Download URL: infermap-0.2.0.tar.gz
- Upload date:
- Size: 247.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e16a20242840f7e74d2403b778670b3a1c03230334d3a6df34ca63acc989807c |
| MD5 | c805761d461feb95022f9d27bdbc1e30 |
| BLAKE2b-256 | 8ee534e8b9a0f0396079f46b9dc8c36bfb2a23a73458f7b3b7fc511e40a26b8b |
Provenance
The following attestation bundles were made for infermap-0.2.0.tar.gz:
- Publisher: publish.yml on benzsevern/infermap
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: infermap-0.2.0.tar.gz
- Subject digest: e16a20242840f7e74d2403b778670b3a1c03230334d3a6df34ca63acc989807c
- Sigstore transparency entry: 1265391982
- Sigstore integration time:
- Permalink: benzsevern/infermap@8f347ebc35840ec00546e83ec1c8c8a7c11a867a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/benzsevern
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8f347ebc35840ec00546e83ec1c8c8a7c11a867a
- Trigger Event: release
File details
Details for the file infermap-0.2.0-py3-none-any.whl.
File metadata
- Download URL: infermap-0.2.0-py3-none-any.whl
- Upload date:
- Size: 30.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ebf1dc71e4b02948525dcd7522681310e07a5a8f45a5fe129835b23af3dff17f |
| MD5 | 338d9f588f3802e3c702e41e04bb73f1 |
| BLAKE2b-256 | cfde19c00ccd0bd20b88317e43b2fab8b85bf715c988bf3fa7f08af10f0f940e |
Provenance
The following attestation bundles were made for infermap-0.2.0-py3-none-any.whl:
- Publisher: publish.yml on benzsevern/infermap
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: infermap-0.2.0-py3-none-any.whl
- Subject digest: ebf1dc71e4b02948525dcd7522681310e07a5a8f45a5fe129835b23af3dff17f
- Sigstore transparency entry: 1265392062
- Sigstore integration time:
- Permalink: benzsevern/infermap@8f347ebc35840ec00546e83ec1c8c8a7c11a867a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/benzsevern
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8f347ebc35840ec00546e83ec1c8c8a7c11a867a
- Trigger Event: release