
BlackMagic

Cross-lingual sparse retrieval + reasoning + GA imagination over typed concept anchors. Default encoder is a multilingual SPLADE fine-tune (cp500/opensearch-neural-sparse-en-jp-ko) so an English-authored schema retrieves across EN / JA / KO.

Install

pip install blackmagic-retrieval

The multilingual SPLADE model (~670MB) is downloaded from HuggingFace on first use and cached by the transformers library. For development:

git clone <repo>
pip install -e '.[dev]'
PYTHONPATH=src pytest tests/

Quickstart

from blackmagic import BlackMagic, BlackMagicConfig

bm = BlackMagic(BlackMagicConfig(
    schema_path="examples/automotive_schema.json",
    db_path=":memory:",
))

bm.ingest([
    {"text": "Toyota announced a $13.6B investment in battery production.",
     "id": "d1", "timestamp": "2026-03-01"},
    {"text": "Honda launches new EV in partnership with CATL.",
     "id": "d2", "timestamp": "2026-03-15"},
])

# Sparse retrieval with persona valence
result = bm.search("automakers investing in batteries", persona="investor")
for inf in result.infons[:5]:
    print(inf.subject, inf.predicate, inf.object, inf.confidence)

# Dempster-Shafer claim verification
v = bm.verify_claim("Toyota is aggressively investing in batteries.")
print(v.label, v.belief_supports, v.belief_refutes)

# MCTS multi-hop reasoning
m = bm.reason("Does the industry face supply risks?")
print(m.verdict, m.chains_discovered)

# GA imagination — MCTS-shaped output with dual verdicts
im = bm.imagine("What OEM–supplier partnerships might emerge?")
print(im.verdict, im.mcts_verdict)
for inf in im.imagined_infons[:5]:
    print(inf.subject, inf.predicate, inf.object,
          "fitness=", inf.fitness,
          "parents=", inf.parent_infon_ids)
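The GA step behind `imagine` can be pictured with a minimal sketch: candidate infons carry a multiplicative grammar × logic × health fitness, the fittest parents are selected, and a child infon inherits fields from each parent (hence `parent_infon_ids` above). All names, scores, and the crossover rule here are illustrative, not the library's internals.

```python
# Toy GA step: multiplicative fitness, top-2 selection, one-point crossover.
# Everything below is a hedged illustration, not BlackMagic's actual API.

def fitness(scores):
    g, l, h = scores
    return g * l * h  # any near-zero component vetoes the candidate

def crossover(a, b):
    # Child takes subject/predicate from one parent, object from the other.
    return {"subject": a["subject"], "predicate": a["predicate"],
            "object": b["object"], "parents": [a["id"], b["id"]]}

population = [
    {"id": "i1", "subject": "Toyota", "predicate": "partners_with",
     "object": "Panasonic", "scores": (0.9, 0.8, 0.9)},
    {"id": "i2", "subject": "Honda", "predicate": "partners_with",
     "object": "CATL", "scores": (0.7, 0.9, 0.8)},
    {"id": "i3", "subject": "Ford", "predicate": "acquires",
     "object": "a startup", "scores": (0.3, 0.4, 0.5)},
]
parents = sorted(population, key=lambda i: fitness(i["scores"]), reverse=True)[:2]
child = crossover(parents[0], parents[1])
```

The multiplicative form matters: a candidate that is fluent but logically inconsistent (or unhealthy for the graph) scores near zero regardless of its other components.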

Features

  • Sparse retrieval via splade-tiny → typed anchor projection
  • Persona valence — investor / engineer / executive / regulator / analyst
  • Contrary views — invert the evidential lens at query time
  • Temporal graph — NEXT edges link facts across time per shared anchor
  • Constraint aggregation — cross-document infon fusion
  • Dempster-Shafer claim verification
  • Graph MCTS for multi-hop reasoning
  • GA imagination (new) — query-scoped genetic algorithm that proposes plausible counterfactual infons scored by grammar × logic × health, with output isomorphic to MCTSResult
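To make the Dempster-Shafer feature concrete, here is a minimal sketch of Dempster's rule over the frame {supports, refutes}, with mass also assignable to "uncertain" (the whole frame). The mass values and function names are illustrative, not BlackMagic's internal representation.

```python
# Hedged sketch: fusing two documents' evidence masses for a claim via
# Dempster's rule. Keys: "S" supports, "R" refutes, "T" uncertain (Theta).

def combine(m1, m2):
    # Conflict mass: one source supports while the other refutes.
    k = m1["S"] * m2["R"] + m1["R"] * m2["S"]
    if k >= 1.0:
        raise ValueError("total conflict: sources fully contradict")
    norm = 1.0 - k
    s = (m1["S"] * m2["S"] + m1["S"] * m2["T"] + m1["T"] * m2["S"]) / norm
    r = (m1["R"] * m2["R"] + m1["R"] * m2["T"] + m1["T"] * m2["R"]) / norm
    return {"S": s, "R": r, "T": m1["T"] * m2["T"] / norm}

doc1 = {"S": 0.6, "R": 0.1, "T": 0.3}  # one document leans supporting
doc2 = {"S": 0.5, "R": 0.2, "T": 0.3}  # a second, weaker corroboration
fused = combine(doc1, doc2)
```

Two weakly supporting sources fuse into a belief stronger than either alone, while uncertainty shrinks, which is why `verify_claim` reports separate `belief_supports` and `belief_refutes` rather than a single score.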

When to use cognition vs BlackMagic

                                        cognition                       BlackMagic
Languages                               EN + JA/KO/ZH/...               EN default; EN/JA/KO via multilingual flag
Encoder                                 splade-tiny or multilingual     splade-tiny bundled; any HF SPLADE via config
                                        XLM-R
Structural analysis (Kano, Kan, etc.)   Yes                             No
Category theory extensions              Yes                             No
Cloud backend (DynamoDB, Lambda)        Yes                             No
MCP / agent tooling                     Yes                             No
GA imagination                          No                              Yes
Line count                              ~5,700                          ~3,900

Multilingual (EN / JA / KO)

Set multilingual=True to use the fine-tuned cp500/opensearch-neural-sparse-en-jp-ko model. A Japanese or Korean sentence activates the same English anchor positions as its English parallel — you write your schema once, in English, and ingestion / search / verify / imagine all work across all three languages.

cfg = BlackMagicConfig(
    schema_path="examples/automotive_schema.json",
    model_name="cp500/opensearch-neural-sparse-en-jp-ko",
    multilingual=True,         # self-encode anchors + exclusivity filter
    activation_threshold=0.25, # looser than splade-tiny's 0.3
    min_confidence=0.15,
)
bm = BlackMagic(cfg)

bm.ingest([
    {"id": "en1", "text": "Chevron announced a $15B investment in the Permian Basin.",
     "timestamp": "2026-04-01"},
    {"id": "ja1", "text": "シェブロンはパーミアン盆地で150億ドルの投資を発表した。",
     "timestamp": "2026-04-01"},
    {"id": "ko1", "text": "셰브런은 퍼미안 분지에 150억 달러 규모의 투자를 발표했다.",
     "timestamp": "2026-04-01"},
])
# All three docs produce infons; a JA query can retrieve KO evidence and vice versa.

How it works. The multilingual model doesn't fire on literal English token IDs; it expands every anchor string (including JA/KO parallels) into the same multilingual subword soup (Latin + CJK + Cyrillic + Arabic). On init, BlackMagic self-encodes each anchor's surface forms through SPLADE, keeps its top-K expansion positions, and subtracts positions that also activate for another same-type anchor (crosstalk filter).

Benchmark. On 200 held-out concepts × 9 language pairs (1,800 query×passage pairs) from cp500/multilingual-concept-training-kit:

          MRR@10    Recall@10
en→en     1.000     1.000
en→ja     0.995     1.000
ja→en     0.998     1.000
ko→en     0.995     1.000
ko→ko     0.998     1.000
OVERALL   0.996     1.000

EN-vocab ratio on top-50 dims: en 0.57, ja 0.55, ko 0.55 — JA/KO queries project into English-like vocab positions at ~97% of English's rate.
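The EN-vocab ratio can be measured roughly as the share of a query's top-activated subword tokens that are Latin-script. The token list below is illustrative; a real measurement would decode the model's top-50 dimension IDs through its tokenizer.

```python
# Hedged sketch: fraction of top-activated subword tokens in Latin script
# (code points below U+0250, i.e. basic Latin plus Latin extensions).

def en_vocab_ratio(top_tokens):
    latin = [t for t in top_tokens if all(ord(c) < 0x250 for c in t)]
    return len(latin) / len(top_tokens)

# Illustrative top tokens for a Japanese query about battery investment:
ja_query_top = ["invest", "toyota", "電池", "battery", "announce"]
ratio = en_vocab_ratio(ja_query_top)
```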

Testing

# splade-tiny only (fast, CI-friendly)
PYTHONPATH=src pytest tests/

# including multilingual integration (downloads ~670MB model)
PYTHONPATH=src RUN_ML_TESTS=1 pytest tests/

# kit-scale retrieval benchmark
PYTHONPATH=src python examples/benchmark_multilingual.py

License

Apache 2.0.
