Local IAB Content Taxonomy 2.x -> 3.0 mapper with vectors, SCD, OpenRTB/VAST exporters.

IAB Content Taxonomy Mapper (Python)

Map IAB Content Taxonomy 2.x labels/codes to IAB 3.0 locally with a deterministic → fuzzy → (optional) semantic pipeline.
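
As a rough illustration of what a deterministic → fuzzy → (optional) semantic cascade looks like, here is a minimal sketch. It is not the library's implementation: the toy catalog, the IDs, and the function names are hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for the real fuzzy matchers (rapidfuzz, tfidf, bm25).

```python
from difflib import SequenceMatcher

# Hypothetical toy catalog: 3.0 label -> 3.0 ID (not real IAB identifiers).
CATALOG_3X = {"Food & Drink": "T-100", "Cooking": "T-101", "Sports": "T-200"}


def exact_match(label):
    """Stage 1: deterministic, case-insensitive lookup."""
    wanted = label.strip().lower()
    for name, node_id in CATALOG_3X.items():
        if name.lower() == wanted:
            return node_id
    return None


def fuzzy_match(label, cut=0.92):
    """Stage 2: best string-similarity match, kept only if score >= cut."""
    wanted = label.strip().lower()
    best_id, best_score = None, 0.0
    for name, node_id in CATALOG_3X.items():
        score = SequenceMatcher(None, wanted, name.lower()).ratio()
        if score > best_score:
            best_id, best_score = node_id, score
    return best_id if best_score >= cut else None


def map_label(label, semantic_match=None):
    """Try each stage in order; return (id, stage) or (None, "unmapped")."""
    stages = [("deterministic", exact_match), ("fuzzy", fuzzy_match)]
    if semantic_match is not None:
        # Stage 3 is optional, mirroring the opt-in embeddings step.
        stages.append(("semantic", semantic_match))
    for stage_name, stage in stages:
        node_id = stage(label)
        if node_id is not None:
            return node_id, stage_name
    return None, "unmapped"
```

The point of the ordering is that cheap, exact stages run first and the more permissive stages only see labels that earlier stages could not resolve.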

This is the Python implementation. For JavaScript/TypeScript, see @mixpeek/iab-mapper.

🔧 Install

From PyPI (recommended)

pip install iab-mapper

From source

cd python
python -m venv .venv && source .venv/bin/activate
pip install -e .
# Optional (enable local embeddings / KNN search)
pip install -e ".[emb]"

🚀 Quick Start

# simplest path: no embeddings (deterministic + fuzzy matching), CSV in → JSON out
iab-mapper sample_2x_codes.csv -o mapped.json

# enable local embeddings (improves recall on free‑text labels)
iab-mapper sample_2x_codes.csv -o mapped.json --use-embeddings

🐍 Python API

from pathlib import Path
from iab_mapper.pipeline import Mapper, MapConfig
import iab_mapper as pkg

# Use packaged stub catalogs or point data_dir to your own
data_dir = Path(pkg.__file__).parent / "data"

cfg = MapConfig(
    fuzzy_method="bm25",   # rapidfuzz|tfidf|bm25
    fuzzy_cut=0.92,
    use_embeddings=False,   # set True and choose emb_model to enable
    max_topics=3,
    drop_scd=False,
    cattax="2",            # OpenRTB content.cattax enum
    overrides_path=None     # path to JSON overrides if desired
)

mapper = Mapper(cfg, str(data_dir))

# Single record with optional vectors
rec = {
    "code": "2-12",
    "label": "Food & Drink",
    "channel": "editorial",
    "type": "article",
    "format": "video",
    "language": "en",
    "source": "professional",
    "environment": "ctv",
}

out = mapper.map_record(rec)
print(out["out_ids"])         # topic + vector IDs
print(out["openrtb"])         # {"content": {"cat": [...], "cattax": "2"}}
print(out["vast_contentcat"]) # "id1","id2",...

# Or just map topics
topics = mapper.map_topics("Cooking how-to")

# Batch over a list of dicts
rows = [rec, {"label": "Sports"}]
mapped = [mapper.map_record(r) for r in rows]
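
To make the two export shapes in the comments above concrete, here is a small standalone sketch that builds them from a list of IDs. It assumes only what those comments show (an OpenRTB `content` object with `cat` and `cattax`, and a comma-separated list of quoted IDs for VAST); the helper names and the IDs are hypothetical, and the real exporters may differ in detail.

```python
def to_openrtb(ids, cattax="2"):
    """Build an OpenRTB-style content object from taxonomy IDs."""
    return {"content": {"cat": list(ids), "cattax": cattax}}


def to_vast_contentcat(ids):
    """Build a comma-separated string of double-quoted IDs."""
    return ",".join(f'"{i}"' for i in ids)


# Hypothetical IDs, for illustration only.
example_ids = ["id1", "id2"]
print(to_openrtb(example_ids))
print(to_vast_contentcat(example_ids))
```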

⚙️ Useful Flags

| Flag | Default | What it does |
| --- | --- | --- |
| --fuzzy-cut | 0.92 | Stricter = fewer, higher-confidence matches |
| --use-embeddings | off | Enable local embeddings for near-miss labels |
| --emb-model | all-MiniLM-L6-v2 | Sentence-Transformers model or tfidf |
| --emb-cut | 0.80 | Cosine similarity threshold for embeddings |
| --max-topics | 3 | Cap topic IDs per row |
| --drop-scd | off | Exclude Sensitive Content nodes |
| --cattax | 2 | OpenRTB content.cattax enum |
| --unmapped-out | | Write misses to file for audit |
| --overrides | | Force mappings before match |
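
The threshold and cap flags are easier to reason about with a small example. The sketch below shows, under assumptions, how a similarity cut (as in `--fuzzy-cut` or `--emb-cut`) and a cap (as in `--max-topics`) could combine: candidates below the cut are dropped, the rest are ranked by score, and only the top few are kept. The function names and scores are hypothetical and this is not the library's own code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_topics(scored, cut, max_topics):
    """Keep (id, score) pairs with score >= cut, best first, capped at max_topics."""
    kept = [(node_id, score) for node_id, score in scored if score >= cut]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [node_id for node_id, _ in kept[:max_topics]]


# Hypothetical candidates: raising the cut shrinks the result set.
candidates = [("a", 0.95), ("b", 0.50), ("c", 0.99), ("d", 0.93), ("e", 0.92)]
print(select_topics(candidates, cut=0.92, max_topics=3))
print(select_topics(candidates, cut=0.97, max_topics=3))
```

A higher cut trades recall for precision; the cap only matters once more candidates pass the cut than the row is allowed to carry.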

🖥️ Web Demo

cd python
python -m venv .venv && source .venv/bin/activate
pip install -e .
pip install -r requirements-dev.txt
uvicorn scripts.web_server:app --port 8000 --reload

Open http://localhost:8000/

📜 License

BSD 2-Clause. See LICENSE.

For full documentation, see the main README.
