Rule-based web ad/clutter eraser, learned from a crowd-labeled dataset of page elements.

🪄 magic-eraser

People who don't know what to sell, sell advertisements.

Rule-based web ad/clutter eraser, learned from a continuously growing, crowd-labeled dataset of real page elements. No LLM is needed at inference time: the rules are distilled from labels that an LLM (or a human) produced once.

pip install magic-eraser

from magic_eraser import is_ad, css, detect_ads, AdEraser

is_ad({"cls": "ad-slot leaderboard", "eid": "div-gpt-ad-1", "w": 728, "h": 90})
# True

css("www.washingtonpost.com")
# '[class*="ad-slot"],...{display:none !important;height:0 !important;...}'

eraser = AdEraser("example.com")
eraser.detect([{"id": 0, "cls": "advert", "w": 300, "h": 250, "iframe": True},
               {"id": 1, "cls": "article-body", "w": 680, "h": 1200}])
# [0]
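For intuition, here is a minimal, self-contained sketch of a token-plus-size check in the spirit of `is_ad` and `AdEraser.detect`. It is not the library's implementation: `AD_TOKENS`, `AD_SIZES` (a few common IAB banner sizes) and the `*_sketch` helpers are hypothetical, while the real, data-derived rules ship in `magic_eraser/rules.json`.

```python
import re

# Hypothetical token list; the real rules are derived from the dataset.
AD_TOKENS = {"ad", "ads", "advert", "ad-slot", "leaderboard", "sponsored", "gpt"}
# Assumed heuristic: a few standard IAB ad sizes (width, height) in px.
AD_SIZES = {(728, 90), (300, 250), (160, 600), (320, 50)}


def tokens(el: dict) -> set[str]:
    """Whole class/id names plus their '-'/'_'-separated pieces, lowercased."""
    text = f"{el.get('cls') or ''} {el.get('eid') or ''}".lower()
    words = text.split()
    # 'div-gpt-ad-1' also yields 'div', 'gpt', 'ad', '1'.
    pieces = {p for w in words for p in re.split(r"[-_]", w) if p}
    return set(words) | pieces


def is_ad_sketch(el: dict) -> bool:
    """Flag an element if a token matches, or if it is a standard-size iframe."""
    if tokens(el) & AD_TOKENS:
        return True
    return bool(el.get("iframe")) and (el.get("w"), el.get("h")) in AD_SIZES


def detect_sketch(elements: list[dict]) -> list[int]:
    """Return the ids of flagged elements, mirroring what AdEraser.detect returns."""
    return [el["id"] for el in elements if is_ad_sketch(el)]
```

Matching whole tokens rather than raw substrings keeps names such as `header` or `shadow`, which merely contain the letters `ad`, from being flagged.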

How it works

1. A browser (e.g. Melon) collects candidate page elements and, on the first visit to a site, asks an LLM which of them are ads.
2. Each verdict is appended to `data/ad_dataset.jsonl` as labeled training data and pushed to this repository.
3. `scripts/build_rules.py` re-derives high-precision class/id token rules and per-domain CSS selectors into `magic_eraser/rules.json`.
4. magic-eraser then blocks ads with zero LLM calls, and gets better every time the dataset grows.
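Step 3 can be pictured with the sketch below. It is a guess at the general idea, not the contents of `scripts/build_rules.py`: the whitespace tokenization (class names only, ids ignored), the `min_support` and `min_precision` thresholds, and all function names are assumptions.

```python
from collections import Counter, defaultdict


def class_tokens(rec: dict) -> set[str]:
    """Whitespace-separated, lowercased class names of one labeled element."""
    return set((rec.get("cls") or "").lower().split())


def derive_rules(records, min_support: int = 3, min_precision: float = 0.95) -> list[str]:
    """Keep class names that are frequent enough and almost always labeled as ads."""
    ad_counts, all_counts = Counter(), Counter()
    for rec in records:
        for tok in class_tokens(rec):
            all_counts[tok] += 1
            if rec["is_ad"]:
                ad_counts[tok] += 1
    return sorted(
        tok for tok, n in all_counts.items()
        if n >= min_support and ad_counts[tok] / n >= min_precision
    )


def rules_by_host(records, **kwargs) -> dict[str, list[str]]:
    """Derive a separate rule list for each host (per-domain rules)."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["host"]].append(rec)
    return {host: derive_rules(recs, **kwargs) for host, recs in grouped.items()}


def css_for(tokens) -> str:
    """One hiding rule built from substring class selectors."""
    selectors = ",".join(f'[class*="{tok}"]' for tok in tokens)
    if not selectors:
        return ""
    return selectors + "{display:none !important;height:0 !important}"
```

The output of `css_for` follows the same `[class*="…"]…{display:none !important;…}` shape as the `css()` result shown earlier. A high precision threshold reflects the usual trade-off for an eraser: hiding real content is worse than missing an ad.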

The dataset

`data/ad_dataset.jsonl` contains one JSON object per labeled page element:

| Field | Meaning |
|---|---|
| `host`, `url`, `ts` | where/when it was seen |
| `tag` | element tag (DIV, IFRAME, …) |
| `cls`, `eid` | class string, element id |
| `w`, `h` | rendered size (px) |
| `iframe` | whether the element is an iframe |
| `txt` | short visible-text snippet |
| `is_ad` | label: ad/clutter (`true`) or content (`false`) |
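The JSONL format can also be read with the standard library alone. This sketch assumes a local copy of the file and does not assume every record carries every field; `read_records` and `missing_fields` are hypothetical helpers.

```python
import json
from typing import Iterable, Iterator

# Field names from the table above.
FIELDS = {"host", "url", "ts", "tag", "cls", "eid", "w", "h", "iframe", "txt", "is_ad"}


def read_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def missing_fields(rec: dict) -> set[str]:
    """Fields from the table that this record does not carry."""
    return FIELDS - set(rec)


# Usage with a local clone of the repository:
# with open("data/ad_dataset.jsonl", encoding="utf-8") as f:
#     for rec in read_records(f):
#         ...
```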

Load it with Hugging Face `datasets`:

from datasets import load_dataset

ds = load_dataset(
    "json",
    data_files="https://raw.githubusercontent.com/alvations/magic-eraser/main/data/ad_dataset.jsonl",
    split="train",
)
ds[0]  # {'host': ..., 'cls': ..., 'is_ad': True, ...}
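Iterating a `datasets.Dataset` yields one dict per row, so plain Python works for quick summaries. A minimal sketch of a per-host ad rate (the helper name is illustrative):

```python
from collections import Counter


def ad_rate_by_host(rows) -> dict[str, float]:
    """Fraction of labeled elements per host whose is_ad label is true."""
    totals, ads = Counter(), Counter()
    for row in rows:
        totals[row["host"]] += 1
        ads[row["host"]] += int(bool(row["is_ad"]))
    return {host: ads[host] / n for host, n in totals.items()}


# e.g. ad_rate_by_host(ds) with the `ds` loaded above.
```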

Train a model to replace the rules

pip install "magic-eraser[train]"
python scripts/build_rules.py     # regenerate rule-based detector from data

The labeled dataset is designed to train a small local classifier (features → is_ad) that can replace both the rules and the LLM entirely.
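The description does not say which model or features the `[train]` extra targets. As one possible starting point, here is a dependency-free multinomial naive Bayes sketch; naive Bayes is a swapped-in choice for illustration, and the feature set (class/id tokens, tag, 50 px size buckets, iframe flag) and all names are assumptions.

```python
import math
import re
from collections import Counter


def features(el: dict) -> list[str]:
    """Hypothetical features: class/id tokens, tag, coarse size bucket, iframe flag."""
    text = f"{el.get('cls') or ''} {el.get('eid') or ''}".lower()
    feats = [f"tok={t}" for t in re.split(r"[\s\-_]+", text) if t]
    feats.append(f"tag={(el.get('tag') or '').upper()}")
    w, h = int(el.get("w") or 0) // 50, int(el.get("h") or 0) // 50
    feats.append(f"size={w}x{h}")  # 50 px buckets (assumed granularity)
    if el.get("iframe"):
        feats.append("iframe")
    return feats


class NaiveBayesAdClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing; labels are True/False."""

    def fit(self, records):
        self.class_counts = Counter()
        self.feat_counts = {True: Counter(), False: Counter()}
        for rec in records:
            label = bool(rec["is_ad"])
            self.class_counts[label] += 1
            self.feat_counts[label].update(features(rec))
        self.vocab_size = len(set(self.feat_counts[True]) | set(self.feat_counts[False]))
        return self

    def _log_score(self, el: dict, label: bool) -> float:
        n = sum(self.class_counts.values())
        score = math.log((self.class_counts[label] + 1) / (n + 2))  # smoothed prior
        counts = self.feat_counts[label]
        denom = sum(counts.values()) + self.vocab_size
        for f in features(el):
            score += math.log((counts[f] + 1) / denom)
        return score

    def predict(self, el: dict) -> bool:
        """True if the element is more likely ad/clutter than content."""
        return self._log_score(el, True) > self._log_score(el, False)
```

Splitting train and test data by `host` rather than by row is one way to check that such a model generalizes to sites it has never seen.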

License

MIT.

Release history

0.1.0 (this version), released Jun 13, 2026

Download files

Source Distribution

magic_eraser-0.1.0.tar.gz (8.5 kB)

Uploaded Jun 13, 2026 Source

Built Distribution

magic_eraser-0.1.0-py3-none-any.whl (9.2 kB)

Uploaded Jun 13, 2026 Python 3

File details

Details for the file magic_eraser-0.1.0.tar.gz.

File metadata

Download URL: magic_eraser-0.1.0.tar.gz
Upload date: Jun 13, 2026
Size: 8.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.12

File hashes

Hashes for magic_eraser-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a115663688ec112922ea08387f2e080021ddf507ae9611f13e6eb25ccff776d0` |
| MD5 | `f495a2561bc639350d2ddf2900ee11ae` |
| BLAKE2b-256 | `39a88e5cd719373e181c2dac8b5b0d358925b0316fba5a1aae73c89c41d8c2a7` |

File details

Details for the file magic_eraser-0.1.0-py3-none-any.whl.

File metadata

Download URL: magic_eraser-0.1.0-py3-none-any.whl
Upload date: Jun 13, 2026
Size: 9.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.12

File hashes

Hashes for magic_eraser-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a8a057592d732f7ed917426d0ed4d278d530abb80a9c4a902f8fa4398a05694f` |
| MD5 | `c8dd030ce02f5528e71b3f2868d69f77` |
| BLAKE2b-256 | `103102625448f9f50e6af23dd06cfa1f4d81c54b844b4a0ae5829c0148447917` |
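To check a downloaded file against the SHA256 digests listed above, a small standard-library sketch is enough; the function names are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """True if the file's digest matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()


# Usage, assuming the sdist was downloaded to the current directory:
# verify("magic_eraser-0.1.0.tar.gz",
#        "a115663688ec112922ea08387f2e080021ddf507ae9611f13e6eb25ccff776d0")
```

pip can enforce the same check at install time through its hash-checking mode, using a requirements line such as `magic-eraser==0.1.0 --hash=sha256:<digest>`.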
