Torch-only inference for grounded infon extraction: paragraph in -> typed, grounded, polarity-aware infons + coreference, multilingual (EN/JA/KO), on a fine-tuned mDeBERTa backbone. No transformers runtime dependency.

hyper-gliner

Torch-only inference for grounded infon extraction plus learned-sparse recall, multilingual (EN/JA/KO), on a fine-tuned mDeBERTa backbone and a BERT SPLADE retriever. There is no transformers runtime dependency: both encoders are reimplemented in pure torch and parity-tested against their reference implementations, so the package is small enough to ship in an AWS Lambda container image.

from hyper_gliner import InfonExtractor, SpladeRetriever

ex = InfonExtractor.from_pretrained("cp500/infon-extract")        # extraction + polarity + coref
infons = ex.extract("Samsung SDI supplies battery cells to BMW.")  # -> [Infon(...)]
#  Infon(Samsung SDI —supplies→ battery cells)  polarity=affirmed

# the {"anchors","infons","stats"} envelope the downstream pipeline consumes:
doc = ex.extract_doc("...", title="...")        # drop-in for extract_infons_neural

sp = SpladeRetriever.from_pretrained("cp500/infon-extract")        # loads the sparse/ subfolder
terms = sp.encode("electric SUV battery", mode="query")            # {term_id: weight}
doc = ex.extract_doc("...", sparse=sp)          # attaches per-infon splade_terms (recall tags)
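
The `{term_id: weight}` mapping shown for `sp.encode` above lends itself to a plain inverted index. The following is a minimal sketch of that idea in pure Python, not the package's own indexing code: the helper names `build_inverted_index` and `search`, the document ids, and the term ids and weights are all hypothetical stand-ins for real encoder output.

```python
from collections import defaultdict


def build_inverted_index(doc_terms):
    """Map {doc_id: {term_id: weight}} to {term_id: [(doc_id, weight), ...]}."""
    index = defaultdict(list)
    for doc_id, terms in doc_terms.items():
        for term_id, weight in terms.items():
            index[term_id].append((doc_id, weight))
    return index


def search(index, query_terms, top_k=5):
    """Rank documents by the sparse dot product of query and document weights."""
    scores = defaultdict(float)
    for term_id, q_weight in query_terms.items():
        # .get avoids creating empty postings for unseen query terms
        for doc_id, d_weight in index.get(term_id, ()):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical term ids and weights, standing in for sp.encode(...) output.
docs = {
    "infon-1": {101: 1.2, 205: 0.4},
    "infon-2": {205: 0.9, 330: 0.7},
}
index = build_inverted_index(docs)
hits = search(index, {205: 1.0, 330: 0.5})
```

Only terms present in the query are visited, so lookup cost scales with the query's non-zero terms rather than with vocabulary size.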

What's in the box

| Capability | Model | Role |
| --- | --- | --- |
| Extraction | mDeBERTa-v3 (fine-tuned) + span_rep + classifier + count | explicit grounded infons (subject→predicate→object + polarity) |
| Polarity | 3-way head (affirmed / negated / uncertain) | constitutive negation |
| Coreference | Lee-2017 antecedent head | resolve anaphora (rewrite-first) |
| Sparse recall | `cp500/opensearch-neural-sparse-en-jp-ko` (BERT SPLADE) | implicit term-level recall (complements extraction) |

All weights ship as fp16 in cp500/infon-extract (the sparse model under sparse/).

Dependencies

Required: torch, tokenizers, safetensors, numpy. Optional: huggingface_hub, needed only to download weights from the Hub. Otherwise pass a local directory (for example, weights baked into a Lambda image).
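
One way to keep huggingface_hub optional is to import it lazily, only when the argument is not a local directory. The sketch below illustrates that pattern under stated assumptions: `resolve_weights` is a hypothetical helper, not part of hyper_gliner, and the package's actual loading logic may differ. `snapshot_download` is the real huggingface_hub function for fetching a repository snapshot.

```python
from pathlib import Path


def resolve_weights(path_or_repo: str) -> Path:
    """Return a local weights directory, downloading only when necessary.

    Hypothetical helper: illustrates the optional-dependency pattern only.
    """
    local = Path(path_or_repo)
    if local.is_dir():
        # Local directory: huggingface_hub is never imported on this path.
        return local
    try:
        from huggingface_hub import snapshot_download  # optional dependency
    except ImportError as exc:
        raise RuntimeError(
            f"{path_or_repo!r} is not a local directory and "
            "huggingface_hub is not installed"
        ) from exc
    # Treat the argument as a Hub repo id and fetch a cached snapshot.
    return Path(snapshot_download(repo_id=path_or_repo))
```

With weights baked into an image, a call such as `resolve_weights("/opt/weights/infon-extract")` (an illustrative path) returns immediately and needs no network access.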

Design notes

Torch-only, parity-proven. The mDeBERTa encoder (disentangled attention) and the BERT SPLADE encoder are reimplemented in pure torch and gated against the reference implementation to within 9.5e-5 (tests/test_parity.py) and 1e-5 (tests/test_bert_parity.py) respectively. The full decode is also gated against the reference implementation (tests/test_decode_parity.py) and matches exactly on EN/JA/KO.
CJK grounding. A subword splitter plus character-offset grounding lets Japanese and Korean spans ground at 100%. The base span-extraction architecture splits on whitespace and therefore fails on unspaced CJK text.
Precision vs recall, separated. extract() applies an exact-span argmax filter (one canonical type per span, nesting preserved). The SPLADE head supplies the multi-label, implicit recall channel as splade_terms, which feed an inverted index, a SQL splade_terms side-table, and the term↔infon bipartite incidence that a Sheaf GNN expands.
Triple-level grounding filter (on by default). extract() and extract_doc() run a measured precision filter (decode_filter.filter_infons). It drops the predicates that an adversarial grounding audit found to be roughly 95–100% non-grounded (supplies, partner, competes_with; the relation head fires them from the document theme rather than from the connecting clause), as well as self-loops and infons whose object is a non-entity (quantity, currency, or spec). On the audited set this lifts grounded precision from 29.5% to 58% and cuts hallucination from 16.7% to 6%, while retaining 93.5% of truly grounded infons. Pass filter_triples=False for raw, unfiltered output.
fp16 only. int8 dynamic quantization was measured to hurt the extraction model (grounded spans fell from 14 to 5) and is larger on disk because the embedding table stays fp32. fp16 is the shipped artifact.
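
The two precision-side steps described in the notes above, the exact-span argmax filter and the triple-level grounding filter, can be sketched in a few lines. This is an illustration of the stated rules, not the code in decode_filter.filter_infons: the `Span` type, the function names, the `object_type` field, the type label strings, and the example predicates other than the three named above are all assumptions.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str
    score: float


def argmax_per_span(spans):
    """Keep the highest-scoring label for each exact (start, end) span.

    Nested spans have different offsets, so they are left untouched.
    """
    best = {}
    for s in spans:
        key = (s.start, s.end)
        if key not in best or s.score > best[key].score:
            best[key] = s
    return sorted(best.values(), key=lambda s: (s.start, s.end))


# Predicates the notes above report as mostly non-grounded.
DROPPED_PREDICATES = {"supplies", "partner", "competes_with"}
# Non-entity object types; these label strings are assumed for illustration.
NON_ENTITY_TYPES = {"quantity", "currency", "spec"}


def drop_ungrounded_triples(infons):
    """Drop blocked predicates, self-loops, and non-entity objects."""
    kept = []
    for inf in infons:
        if inf["predicate"] in DROPPED_PREDICATES:
            continue
        if inf["subject"] == inf["object"]:  # self-loop
            continue
        if inf.get("object_type") in NON_ENTITY_TYPES:
            continue
        kept.append(inf)
    return kept


spans = [
    Span(0, 11, "company", 0.91),
    Span(0, 11, "product", 0.40),  # same span, lower score: dropped
    Span(0, 7, "company", 0.75),   # nested span: kept
]
canonical = argmax_per_span(spans)

infons = [
    {"subject": "Samsung SDI", "predicate": "supplies", "object": "BMW", "object_type": "company"},
    {"subject": "BMW", "predicate": "acquires", "object": "BMW", "object_type": "company"},
    {"subject": "BMW", "predicate": "invests", "object": "2 billion euros", "object_type": "currency"},
    {"subject": "BMW", "predicate": "builds", "object": "iX3", "object_type": "product"},
]
kept = drop_ungrounded_triples(infons)
```

In this toy input only the last infon survives: the first hits the predicate blocklist, the second is a self-loop, and the third has a currency object.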

Consumer contract

extract_doc() returns the {"anchors","infons","stats"} envelope that the downstream pipeline already consumes. Each infon dict carries subject, predicate, object, polarity (int), mass {supports, refutes, theta}, span, confidence, relation_type, tense, and anchors, so it is a drop-in for the downstream ingest pipeline, the Memgraph writer, and the MCTS/IKL readers. See tests/test_consumer_contract.py.
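
A consumer that wants to guard against shape drift could check the envelope before ingesting it. The sketch below is a hypothetical validator built only from the key names listed above; it is not the contents of tests/test_consumer_contract.py. It assumes `mass` is a dict, and the example values and their types (such as `span` as a two-element list) are invented for illustration.

```python
ENVELOPE_KEYS = {"anchors", "infons", "stats"}
INFON_KEYS = {
    "subject", "predicate", "object", "polarity", "mass", "span",
    "confidence", "relation_type", "tense", "anchors",
}
MASS_KEYS = {"supports", "refutes", "theta"}


def contract_violations(doc):
    """Return a list of problems; an empty list means the shape is fine."""
    problems = []
    missing = ENVELOPE_KEYS - doc.keys()
    if missing:
        problems.append(f"envelope missing {sorted(missing)}")
    for i, infon in enumerate(doc.get("infons", [])):
        missing = INFON_KEYS - infon.keys()
        if missing:
            problems.append(f"infon {i} missing {sorted(missing)}")
        if not isinstance(infon.get("polarity"), int):
            problems.append(f"infon {i} polarity is not an int")
        mass = infon.get("mass")
        if not isinstance(mass, dict) or MASS_KEYS - mass.keys():
            problems.append(f"infon {i} mass lacks supports/refutes/theta")
    return problems


# Hypothetical documents; all values are placeholders.
good = {
    "anchors": [], "stats": {},
    "infons": [{
        "subject": "A", "predicate": "p", "object": "B", "polarity": 1,
        "mass": {"supports": 0.8, "refutes": 0.1, "theta": 0.1},
        "span": [0, 10], "confidence": 0.9, "relation_type": "r",
        "tense": "present", "anchors": [],
    }],
}
bad = {"anchors": [], "infons": [{"subject": "A", "polarity": "affirmed"}]}
```

Returning a list of problems rather than raising on the first one makes it easier to log every mismatch in a single pass.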

License

Apache-2.0.

Release history

0.1.3 (this version): Jun 17, 2026
0.1.2 (yanked): Jun 16, 2026
0.1.1 (yanked): Jun 16, 2026
0.1.0 (yanked): Jun 16, 2026

Download files

Source Distribution

hyper_gliner-0.1.3.tar.gz (35.1 kB)

Uploaded Jun 17, 2026 Source

Built Distribution

hyper_gliner-0.1.3-py3-none-any.whl (38.5 kB)

Uploaded Jun 17, 2026 Python 3

File details

Details for the file hyper_gliner-0.1.3.tar.gz.

File metadata

Download URL: hyper_gliner-0.1.3.tar.gz
Upload date: Jun 17, 2026
Size: 35.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for hyper_gliner-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`07891db3aac679c7d4b8931850cf83f4b15c2ec7479a8206f4c0d995bdb91413`
MD5	`6c850119c41e8defdc55d69c27edde13`
BLAKE2b-256	`b7f3d4e3385221db0abc2e5d5fc596d75450a20e01ce3c6c9ae176d7f7846514`

File details

Details for the file hyper_gliner-0.1.3-py3-none-any.whl.

File metadata

Download URL: hyper_gliner-0.1.3-py3-none-any.whl
Upload date: Jun 17, 2026
Size: 38.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for hyper_gliner-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`028fbb835df02c61089680ca7fcbb0f0b1c0c9bc1c3a547665e53e2a1a853dbd`
MD5	`d9756014423889a104c3b88e71be393b`
BLAKE2b-256	`f987061b22a5fb84256ffff313134dd5ed7a883e933cac1e64930bcbf386f796`
