Lightweight cross-lingual coreference resolution with spaCy using ONNX Runtime inference of transformer models.

spacy-coref

Lightweight, fast coreference resolution using a distilled version of AllenNLP's coreference model, exported to ONNX.

✨ Features

  • 🧠 Cross-lingual coreference resolution
  • 🪶 Lightweight model based on AllenNLP’s coreference model
  • ⚡ Fast inference via ONNX
  • 🔌 Easy integration with spaCy

📦 Installation

$ pip install spacy-coref

🚀 Quickstart

Usage as a standalone component

from spacy_coref import CoreferenceResolver, decode_clusters

resolver = CoreferenceResolver.from_pretrained("talmago/allennlp-coref-onnx-mMiniLMv2-L12-H384-distilled-from-XLMR-Large")

sentences = [
    ["Barack", "Obama", "was", "the", "44th", "President", "of", "the", "United", "States", "."],
    ["He", "was", "born", "in", "Hawaii", "."]
]

pred = resolver(sentences)

print(decode_clusters(sentences, pred["clusters"][0]))

# Output is:
# [['Barack Obama', 'He']]
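
To make the decoding step above concrete, here is a minimal sketch of how cluster spans could be turned back into mention strings. It assumes each cluster is a list of (start, end) token indices over the flattened sentences, with an inclusive end, as in AllenNLP's coreference output. The function name decode_spans and the hand-written example clusters are hypothetical; this is not the library's decode_clusters implementation.

```python
from typing import List, Sequence, Tuple

# Assumption: a mention is (start, end) over the flattened token list, end inclusive.
Span = Tuple[int, int]


def decode_spans(
    sentences: Sequence[Sequence[str]],
    clusters: Sequence[Sequence[Span]],
) -> List[List[str]]:
    """Map each span in each cluster to the text of the tokens it covers."""
    # Flatten the sentences so that span indices address one token list.
    tokens = [tok for sent in sentences for tok in sent]
    return [
        [" ".join(tokens[start : end + 1]) for start, end in cluster]
        for cluster in clusters
    ]


sentences = [
    ["Barack", "Obama", "was", "the", "44th", "President", "of", "the", "United", "States", "."],
    ["He", "was", "born", "in", "Hawaii", "."],
]

# Hand-written spans for illustration, not real model output:
# tokens 0..1 are "Barack Obama", token 11 is "He".
example_clusters = [[(0, 1), (11, 11)]]

print(decode_spans(sentences, example_clusters))
```

Flattening first keeps the span arithmetic simple, since document-level indices do not need to be mapped back to per-sentence offsets.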

Usage with spaCy

import spacy

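# Imported for its side effect: presumably registers the "coref_minilm" factory with spaCy.
from spacy_coref import create_coref_minilm_component  # noqa: F401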

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("coref_minilm")

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
print(doc._.coref_clusters[0])
print(doc._.cluster_heads)
print(doc._.resolved_text)

# Output is:
# [Barack Obama, He]
# {'Barack Obama': Barack Obama}
# Barack Obama was born in Hawaii. Barack Obama was elected president in 2008.
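
The resolved text shown above replaces later mentions with a representative mention of their cluster. A minimal sketch of that idea on a plain token list follows. It assumes inclusive (start, end) token spans, non-overlapping mentions, and that the first mention of each cluster serves as its head. The function resolve_tokens is hypothetical and is not how the component computes doc._.resolved_text.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: a mention is (start, end) over the token list, end inclusive.
Span = Tuple[int, int]


def resolve_tokens(tokens: Sequence[str], clusters: Sequence[Sequence[Span]]) -> List[str]:
    """Replace every non-head mention with the tokens of its cluster head.

    Assumes mentions do not overlap and that the first span in each
    cluster is the head (an illustrative choice, not the library's rule).
    """
    # Map the start index of each mention to (end index, head tokens).
    replacements: Dict[int, Tuple[int, List[str]]] = {}
    for cluster in clusters:
        head_start, head_end = cluster[0]
        head = list(tokens[head_start : head_end + 1])
        for start, end in cluster[1:]:
            replacements[start] = (end, head)

    resolved: List[str] = []
    i = 0
    while i < len(tokens):
        if i in replacements:
            end, head = replacements[i]
            resolved.extend(head)  # emit the head instead of the mention
            i = end + 1            # skip past the replaced mention
        else:
            resolved.append(tokens[i])
            i += 1
    return resolved


tokens = ["Barack", "Obama", "was", "born", "in", "Hawaii", ".",
          "He", "was", "elected", "president", "in", "2008", "."]

# Hand-written cluster for illustration: "Barack Obama" (0..1) and "He" (7).
clusters = [[(0, 1), (7, 7)]]

print(" ".join(resolve_tokens(tokens, clusters)))
```

The sketch works on tokens and joins them with spaces, so it does not restore the original whitespace; a spaCy-based version would more likely work with each token's trailing whitespace.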

🛠️ Development

Set up virtualenv

$ make env

Set PYTHONPATH (run from the repository root)

$ export PYTHONPATH=$PYTHONPATH:$(pwd)/src

Code formatting

$ make format

