State-of-the-art Information Extraction in PyTorch
🚀️ Quickstart
$ pip install pytorch-ie
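To confirm that the install succeeded, the installed version can be looked up from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of pytorch-ie, and the distribution name is assumed to match the `pip install` line above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not part of pytorch-ie.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("pytorch-ie")
if found is None:
    print("pytorch-ie is not installed in this environment")
else:
    print(f"pytorch-ie {found} is installed")
```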
⚡️ Examples
Span-classification-based Named Entity Recognition
from pytorch_ie.taskmodules import TransformerSpanClassificationTaskModule
from pytorch_ie.models import TransformerSpanClassificationModel
from pytorch_ie import Pipeline, Document
model_name_or_path = "pie/example-ner-spanclf-conll03"
ner_taskmodule = TransformerSpanClassificationTaskModule.from_pretrained(model_name_or_path)
ner_model = TransformerSpanClassificationModel.from_pretrained(model_name_or_path)
ner_pipeline = Pipeline(model=ner_model, taskmodule=ner_taskmodule, device=-1)
document = Document("“Making a super tasty alt-chicken wing is only half of it,” said Po Bronson, general partner at SOSV and managing director of IndieBio.")
ner_pipeline(document, predict_field="entities")
for entity in document.predictions("entities"):
    entity_text = document.text[entity.start: entity.end]
    label = entity.label
    print(f"{entity_text} -> {label}")
# Result:
# IndieBio -> ORG
# Po Bronson -> PER
# SOSV -> ORG
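The result above is not listed in the order in which the entities occur in the text. The following is a minimal sketch of sorting spans by their start offset before printing. `Span` is a hypothetical stand-in for the predicted entity annotations (it only assumes the `start`, `end`, and `label` attributes used in the loop above), so the snippet runs without downloading a model; the offsets are the ones used in the relation extraction example.

```python
from dataclasses import dataclass

TEXT = (
    "“Making a super tasty alt-chicken wing is only half of it,” said Po Bronson, "
    "general partner at SOSV and managing director of IndieBio."
)


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a predicted entity (not the library class)."""

    start: int  # inclusive character offset
    end: int  # exclusive character offset
    label: str


def in_text_order(spans):
    """Sort spans by where they start in the text, breaking ties by end offset."""
    return sorted(spans, key=lambda span: (span.start, span.end))


# Same three entities as in the result above, in the same (unsorted) order.
predicted = [Span(126, 134, "ORG"), Span(65, 75, "PER"), Span(96, 100, "ORG")]

ordered = [(TEXT[s.start:s.end], s.label) for s in in_text_order(predicted)]
for entity_text, label in ordered:
    print(f"{entity_text} -> {label}")
# Po Bronson -> PER
# SOSV -> ORG
# IndieBio -> ORG
```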
Text-classification-based Relation Extraction
from pytorch_ie.taskmodules import TransformerRETextClassificationTaskModule
from pytorch_ie.models import TransformerTextClassificationModel
from pytorch_ie import Pipeline
from pytorch_ie.data import Document, LabeledSpan
model_name_or_path = "pie/example-re-textclf-tacred"
re_taskmodule = TransformerRETextClassificationTaskModule.from_pretrained(model_name_or_path)
re_model = TransformerTextClassificationModel.from_pretrained(model_name_or_path)
re_pipeline = Pipeline(model=re_model, taskmodule=re_taskmodule, device=-1)
document = Document("“Making a super tasty alt-chicken wing is only half of it,” said Po Bronson, general partner at SOSV and managing director of IndieBio.")
# Gold entity mentions as character offsets (start inclusive, end exclusive).
for start, end, label in [(65, 75, "PER"), (96, 100, "ORG"), (126, 134, "ORG")]:
    document.add_annotation("entities", LabeledSpan(start, end, label))
re_pipeline(document, predict_field="relations")
for relation in document.predictions("relations"):
    head, tail = relation.head, relation.tail
    head_text = document.text[head.start: head.end]
    tail_text = document.text[tail.start: tail.end]
    label = relation.label
    print(f"({head_text} -> {tail_text}) -> {label}")
# Result:
# (Po Bronson -> SOSV) -> per:employee_of
# (Po Bronson -> IndieBio) -> per:employee_of
# (SOSV -> Po Bronson) -> org:top_members/employees
# (IndieBio -> Po Bronson) -> org:top_members/employees
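The entity offsets in the example above are hard-coded. As a minimal sketch, they can instead be derived from the mention strings with `str.find`, which avoids counting characters by hand. The helper `find_span` is hypothetical and not part of pytorch-ie; it assumes each mention occurs in the text and returns only the first occurrence.

```python
def find_span(text, mention, label, start_at=0):
    """Return (start, end, label) for the first occurrence of `mention` in `text`.

    Hypothetical helper for illustration. `end` is exclusive, so
    text[start:end] == mention.
    """
    start = text.find(mention, start_at)
    if start == -1:
        raise ValueError(f"mention {mention!r} not found in text")
    return start, start + len(mention), label


text = (
    "“Making a super tasty alt-chicken wing is only half of it,” said Po Bronson, "
    "general partner at SOSV and managing director of IndieBio."
)
mentions = [("Po Bronson", "PER"), ("SOSV", "ORG"), ("IndieBio", "ORG")]

spans = [find_span(text, mention, label) for mention, label in mentions]
print(spans)
# [(65, 75, 'PER'), (96, 100, 'ORG'), (126, 134, 'ORG')]
```

Each resulting tuple can then be passed to `LabeledSpan(start, end, label)` exactly as in the loop above.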
Development Setup
🏅 Acknowledgements
This package is based on the sourcery-ai/python-best-practices-cookiecutter and cjolowicz/cookiecutter-hypermodern-python project templates.
📃 Citation
If you want to cite the framework, feel free to use this BibTeX entry:
@misc{alt2022pytorchie,
    author = {Christoph Alt and Arne Binder},
    title = {PyTorch-IE: State-of-the-art Information Extraction in PyTorch},
    year = {2022},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/ChristophAlt/pytorch-ie}}
}