Local IAB Content Taxonomy 2.x -> 3.0 mapper with vectors, SCD, OpenRTB/VAST exporters.
IAB Content Taxonomy Mapper (Python)
Map IAB Content Taxonomy 2.x labels and codes to IAB Content Taxonomy 3.0 locally, using a deterministic → fuzzy → (optional) semantic matching pipeline.
This is the Python implementation. For JavaScript/TypeScript, see `@mixpeek/iab-mapper`.
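The cascade can be pictured with a toy sketch. Everything in it is an assumption for illustration: the three-entry catalog, the made-up IDs, the `map_label` helper, and the use of `difflib.SequenceMatcher` as the fuzzy scorer (iab-mapper itself lists rapidfuzz, TF-IDF, and BM25 as fuzzy methods). It only shows the control flow: an exact match first, fuzzy matching next, and a semantic fallback only when the cheaper stages miss.

```python
"""Hypothetical deterministic -> fuzzy -> semantic cascade (not iab-mapper's code)."""
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional

# Illustrative 3.0 catalog: normalized label -> made-up 3.0 ID.
CATALOG_3X: Dict[str, str] = {
    "food & drink": "T-FOOD",
    "sports": "T-SPORTS",
    "cooking": "T-COOK",
}


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so exact matching is less brittle."""
    return " ".join(label.lower().split())


def map_label(
    label: str,
    fuzzy_cut: float = 0.92,
    semantic: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    key = normalize(label)

    # 1. Deterministic: exact match on the normalized label.
    if key in CATALOG_3X:
        return CATALOG_3X[key]

    # 2. Fuzzy: best string-similarity score, accepted only at or above fuzzy_cut.
    best_label, best_score = None, 0.0
    for cand in CATALOG_3X:
        score = SequenceMatcher(None, key, cand).ratio()
        if score > best_score:
            best_label, best_score = cand, score
    if best_label is not None and best_score >= fuzzy_cut:
        return CATALOG_3X[best_label]

    # 3. Optional semantic fallback (e.g. an embedding search) for near misses.
    if semantic is not None:
        return semantic(key)
    return None


if __name__ == "__main__":
    print(map_label("Food  & Drink"))             # exact after normalization
    print(map_label("Sport", fuzzy_cut=0.9))      # fuzzy hit at a looser cut
```

Raising `fuzzy_cut` pushes more inputs past the fuzzy stage and onto the (slower) semantic fallback, which mirrors the trade-off the `--fuzzy-cut` and `--use-embeddings` flags expose.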
🔧 Install
From PyPI (recommended)
```bash
pip install iab-mapper
```
From source (run from a clone of the GitHub repository; the Python package lives in `python/`)

```bash
cd python
python -m venv .venv && source .venv/bin/activate
pip install -e .
# Optional: enable local embeddings / KNN search
pip install -e ".[emb]"
```
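To confirm the optional extra took effect, you can check whether an embedding backend is importable. This is a sketch that assumes the `[emb]` extra installs `sentence-transformers` (suggested by the default `all-MiniLM-L6-v2` model, but not stated here); change the module name if the extra pulls in something else. `embeddings_available` is a hypothetical helper.

```python
import importlib.util


def embeddings_available(module: str = "sentence_transformers") -> bool:
    """Return True if the (assumed) embedding backend can be imported."""
    # find_spec returns None for a missing top-level module instead of raising.
    return importlib.util.find_spec(module) is not None


if __name__ == "__main__":
    print("embedding backend installed:", embeddings_available())
```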
🚀 Quick Start
```bash
# Simplest path: fuzzy matching only, CSV in → JSON out
iab-mapper sample_2x_codes.csv -o mapped.json

# Enable local embeddings (improves recall on free-text labels)
iab-mapper sample_2x_codes.csv -o mapped.json --use-embeddings
```
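If you need an input file to try the CLI, a small one can be generated with the standard library. The `code,label` header is an assumption that mirrors the `code`/`label` keys used by the Python API; check the repository's `sample_2x_codes.csv` for the exact columns the CLI expects. `write_sample_csv` and the rows are hypothetical.

```python
import csv
from pathlib import Path

# Hypothetical input rows: a 2.x code with its label, plus label-only rows.
ROWS = [
    {"code": "2-12", "label": "Food & Drink"},
    {"code": "", "label": "Cooking how-to"},
    {"code": "", "label": "Sports"},
]


def write_sample_csv(path: Path, rows=ROWS) -> Path:
    """Write rows to a CSV with an assumed `code,label` header."""
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["code", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    write_sample_csv(Path("sample_2x_codes.csv"))
```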
🐍 Python API
```python
from pathlib import Path

from iab_mapper.pipeline import Mapper, MapConfig
import iab_mapper as pkg

# Use packaged stub catalogs or point data_dir to your own
data_dir = Path(pkg.__file__).parent / "data"

cfg = MapConfig(
    fuzzy_method="bm25",   # rapidfuzz|tfidf|bm25
    fuzzy_cut=0.92,
    use_embeddings=False,  # set True and choose emb_model to enable
    max_topics=3,
    drop_scd=False,
    cattax="2",            # OpenRTB content.cattax enum
    overrides_path=None,   # path to JSON overrides if desired
)
mapper = Mapper(cfg, str(data_dir))

# Single record with optional vectors
rec = {
    "code": "2-12",
    "label": "Food & Drink",
    "channel": "editorial",
    "type": "article",
    "format": "video",
    "language": "en",
    "source": "professional",
    "environment": "ctv",
}
out = mapper.map_record(rec)
print(out["out_ids"])          # topic + vector IDs
print(out["openrtb"])          # {"content": {"cat": [...], "cattax": "2"}}
print(out["vast_contentcat"])  # "id1","id2",...

# Or just map topics
topics = mapper.map_topics("Cooking how-to")

# Batch over a list of dicts
rows = [rec, {"label": "Sports"}]
mapped = [mapper.map_record(r) for r in rows]
```
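The two exporter outputs in the example are simple reshapings of a list of IDs. The following sketch is hypothetical (`cap_ids`, `to_openrtb`, and `to_vast_contentcat` are not part of iab-mapper); it reproduces the shapes shown in the comments of the example, with order-preserving de-duplication and a `max_topics` cap as an assumed policy.

```python
from typing import Dict, List


def cap_ids(ids: List[str], max_topics: int = 3) -> List[str]:
    """De-duplicate while keeping first-seen order, then keep at most max_topics."""
    unique = list(dict.fromkeys(ids))  # dicts preserve insertion order
    return unique[:max_topics]


def to_openrtb(ids: List[str], cattax: str = "2") -> Dict:
    """Same shape as the out["openrtb"] comment: {"content": {"cat": [...], "cattax": ...}}."""
    return {"content": {"cat": list(ids), "cattax": cattax}}


def to_vast_contentcat(ids: List[str]) -> str:
    """Comma-separated, double-quoted IDs, e.g. "id1","id2"."""
    return ",".join(f'"{i}"' for i in ids)


if __name__ == "__main__":
    ids = cap_ids(["id1", "id2", "id1", "id3", "id4"])
    print(to_openrtb(ids))
    print(to_vast_contentcat(ids))
```

OpenRTB 2.6 defines `content.cattax` as an integer, so if your bid requests are validated strictly you may want to convert the string value before serializing.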
⚙️ Useful Flags
| Flag | Default | What it does |
|---|---|---|
| `--fuzzy-cut` | `0.92` | Fuzzy-match score threshold; higher (stricter) values give fewer, higher-confidence matches |
| `--use-embeddings` | off | Enable local embeddings to catch near-miss labels |
| `--emb-model` | `all-MiniLM-L6-v2` | Sentence-Transformers model name, or `tfidf` |
| `--emb-cut` | `0.80` | Cosine-similarity threshold for embedding matches |
| `--max-topics` | `3` | Maximum number of topic IDs per row |
| `--drop-scd` | off | Exclude Sensitive Content (SCD) nodes |
| `--cattax` | `2` | Value for the OpenRTB `content.cattax` enum |
| `--unmapped-out` | none | Write unmatched rows to a file for auditing |
| `--overrides` | none | JSON file of forced mappings, applied before matching |
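To see how `--emb-cut` and `--max-topics` interact, here is a toy sketch of a thresholded nearest-neighbour lookup. It is an assumption-laden illustration, not the library's search: brute-force cosine ranking over hand-made 2-D vectors that stand in for real sentence embeddings, and `knn_above_cut` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_above_cut(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    emb_cut: float = 0.80,
    max_topics: int = 3,
) -> List[Tuple[str, float]]:
    """Rank IDs by cosine similarity, keep scores >= emb_cut, return at most max_topics."""
    scored = [(cid, cosine(query, vec)) for cid, vec in catalog.items()]
    kept = [(cid, s) for cid, s in scored if s >= emb_cut]
    kept.sort(key=lambda t: t[1], reverse=True)
    return kept[:max_topics]


if __name__ == "__main__":
    toy = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.9, 0.1]}
    print(knn_above_cut([1.0, 0.0], toy))
```

The cut decides *whether* a candidate is good enough; the cap only trims how many of the good-enough candidates survive, so lowering `--emb-cut` can change results even when `--max-topics` stays fixed.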
🖥️ Web Demo
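Serve the demo locally from a source checkout:

```bash
cd python
python -m venv .venv && source .venv/bin/activate
pip install -e .
pip install -r requirements-dev.txt
uvicorn scripts.web_server:app --port 8000 --reload
```

Without `--host`, uvicorn listens on http://127.0.0.1:8000.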
📜 License
BSD 2-Clause. See LICENSE.
For full documentation, see the main README.
Download files
File details for `iab_mapper-1.0.0.tar.gz`

- Size: 40.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3c7f2c388fd8eee2ba4ad0548220842211c9b355ec3d309e857723a861f5ac2d` |
| MD5 | `8c7463a01ff4205e7f53dadbb86b496f` |
| BLAKE2b-256 | `35ab6a79bc5995f492043927bdd05a8ca39fe2578388b665616e37fd770d96e1` |
File details for `iab_mapper-1.0.0-py3-none-any.whl`

- Size: 41.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8

| Algorithm | Hash digest |
|---|---|
| SHA256 | `7f18fc31057abdeb9a85d48aab75a03dc14dc9aa8a65d3036f3b965e1e9d6dc4` |
| MD5 | `2f2264f33914481585f90519cffacf87` |
| BLAKE2b-256 | `4f7958a32da0cc6d4de3055984e0e14b98c37b6b014197e600189cab08af34ca` |
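The published digests let you check a downloaded file before installing it. A minimal sketch using `hashlib`, assuming the sdist has been saved as `iab_mapper-1.0.0.tar.gz` in the current directory (swap in the wheel name and its SHA256 to check the wheel instead):

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for the source distribution.
EXPECTED_SDIST_SHA256 = "3c7f2c388fd8eee2ba4ad0548220842211c9b355ec3d309e857723a861f5ac2d"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against a hex digest, ignoring letter case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    sdist = Path("iab_mapper-1.0.0.tar.gz")  # assumed download location
    if sdist.exists():
        print("OK" if verify(sdist, EXPECTED_SDIST_SHA256) else "MISMATCH")
    else:
        print(f"{sdist} not found")
```

pip can enforce the same check at install time through its hash-checking mode, using a requirements line such as `iab-mapper==1.0.0 --hash=sha256:<digest>`.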