State-of-the-art Information Extraction in PyTorch
🚀️ Quickstart
$ pip install pytorch-ie
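After installing, you can check which release is present by using the standard library's `importlib.metadata` (Python 3.8+). This is a small sketch: `installed_version` is a hypothetical helper name, and the distribution name `pytorch-ie` is taken from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pytorch-ie") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())  # the version string, or None if pytorch-ie is not installed
```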
⚡️ Examples
Span-classification-based Named Entity Recognition
from pytorch_ie.taskmodules import TransformerSpanClassificationTaskModule
from pytorch_ie.models import TransformerSpanClassificationModel
from pytorch_ie import Pipeline, Document
model_name_or_path = "pie/example-ner-spanclf-conll03"
ner_taskmodule = TransformerSpanClassificationTaskModule.from_pretrained(model_name_or_path)
ner_model = TransformerSpanClassificationModel.from_pretrained(model_name_or_path)
ner_pipeline = Pipeline(model=ner_model, taskmodule=ner_taskmodule, device=-1)
document = Document("“Making a super tasty alt-chicken wing is only half of it,” said Po Bronson, general partner at SOSV and managing director of IndieBio.")
ner_pipeline(document, predict_field="entities")
for entity in document.predictions.spans["entities"]:
    entity_text = document.text[entity.start : entity.end]
    label = entity.label
    print(f"{entity_text} -> {label}")
# Result:
# IndieBio -> ORG
# Po Bronson -> PER
# SOSV -> ORG
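The entities in the result above are not printed in text order. If you need them sorted by position, a small helper can do that. This is a sketch: `Span` is a hypothetical stand-in for the prediction objects, assuming only the `start`, `end` and `label` attributes the example reads. With the real pipeline you would pass `document.predictions.spans["entities"]` instead of the hand-built list.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Span:
    # Hypothetical stand-in for a predicted span; it only mirrors the
    # start/end/label attributes used in the NER example.
    start: int
    end: int
    label: str


def spans_in_text_order(text: str, spans: Iterable[Span]) -> List[Tuple[str, str]]:
    """Return (surface text, label) pairs sorted by start offset, then end offset."""
    ordered = sorted(spans, key=lambda s: (s.start, s.end))
    return [(text[s.start : s.end], s.label) for s in ordered]


text = "“Making a super tasty alt-chicken wing is only half of it,” said Po Bronson, general partner at SOSV and managing director of IndieBio."
# The same three spans as in the result above, in the order they were printed.
predicted = [Span(126, 134, "ORG"), Span(65, 75, "PER"), Span(96, 100, "ORG")]

for entity_text, label in spans_in_text_order(text, predicted):
    print(f"{entity_text} -> {label}")
# Po Bronson -> PER
# SOSV -> ORG
# IndieBio -> ORG
```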
Text-classification-based Relation Extraction
from pytorch_ie.taskmodules import TransformerRETextClassificationTaskModule
from pytorch_ie.models import TransformerTextClassificationModel
from pytorch_ie import Pipeline
from pytorch_ie.data import Document, LabeledSpan
model_name_or_path = "pie/example-re-textclf-tacred"
re_taskmodule = TransformerRETextClassificationTaskModule.from_pretrained(model_name_or_path)
re_model = TransformerTextClassificationModel.from_pretrained(model_name_or_path)
re_pipeline = Pipeline(model=re_model, taskmodule=re_taskmodule, device=-1)
document = Document("“Making a super tasty alt-chicken wing is only half of it,” said Po Bronson, general partner at SOSV and managing director of IndieBio.")
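# The RE model classifies relations between given entities, so add them first
# (character offsets, end exclusive, so that document.text[start:end] is the mention).
for start, end, label in [(65, 75, "PER"), (96, 100, "ORG"), (126, 134, "ORG")]:
    document.add_annotation("entities", LabeledSpan(start, end, label))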
re_pipeline(document, predict_field="relations")
for relation in document.predictions.binary_relations["relations"]:
    head, tail = relation.head, relation.tail
    head_text = document.text[head.start : head.end]
    tail_text = document.text[tail.start : tail.end]
    label = relation.label
    print(f"({head_text} -> {tail_text}) -> {label}")
# Result:
# (Po Bronson -> SOSV) -> per:employee_of
# (Po Bronson -> IndieBio) -> per:employee_of
# (SOSV -> Po Bronson) -> org:top_members/employees
# (IndieBio -> Po Bronson) -> org:top_members/employees
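The relation example hard-codes the character offsets of the three entities. Rather than counting characters by hand, you can compute them from the text. The helper below is a hypothetical sketch (`find_span_offsets` is not part of pytorch-ie); it returns `(start, end, label)` triples in the same form as the list passed to `LabeledSpan`, with end-exclusive offsets. It searches left to right, so mentions must be listed in the order they appear in the text.

```python
from typing import List, Sequence, Tuple


def find_span_offsets(
    text: str, mentions: Sequence[Tuple[str, str]]
) -> List[Tuple[int, int, str]]:
    """Map (mention, label) pairs to end-exclusive (start, end, label) triples.

    Each mention is searched for starting right after the previous match,
    so repeated surface strings resolve to successive occurrences.
    Raises ValueError if a mention cannot be found.
    """
    triples = []
    search_from = 0
    for mention, label in mentions:
        start = text.find(mention, search_from)
        if start == -1:
            raise ValueError(f"mention {mention!r} not found in text")
        end = start + len(mention)
        triples.append((start, end, label))
        search_from = end
    return triples


text = "“Making a super tasty alt-chicken wing is only half of it,” said Po Bronson, general partner at SOSV and managing director of IndieBio."
offsets = find_span_offsets(text, [("Po Bronson", "PER"), ("SOSV", "ORG"), ("IndieBio", "ORG")])
print(offsets)  # [(65, 75, 'PER'), (96, 100, 'ORG'), (126, 134, 'ORG')]
```

The resulting triples can be fed to the same `for start, end, label in ...` loop that adds the `LabeledSpan` annotations.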
Development Setup
🏅 Acknowledgements
This package is based on the sourcery-ai/python-best-practices-cookiecutter and cjolowicz/cookiecutter-hypermodern-python project templates.
📃 Citation
If you want to cite the framework, feel free to use this:
@misc{alt2022pytorchie,
  author = {Christoph Alt and Arne Binder},
  title = {PyTorch-IE: State-of-the-art Information Extraction in PyTorch},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ChristophAlt/pytorch-ie}}
}
Download files
Source Distribution
pytorch-ie-0.3.0.tar.gz (70.4 kB)
Built Distribution
pytorch_ie-0.3.0-py3-none-any.whl (101.5 kB)
Hashes for pytorch_ie-0.3.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 1142d2d09e355e89e050e2c67d0d29f2b388be86f71df3e301f12c0c7ec4bd70
MD5 | 1abeadd5784386f960166481a4dbb68a
BLAKE2b-256 | 253f01fdb5b9070b9b6a4e9644d252e37eb02a7ac348189f7c225d94a7ad86cb
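To check a downloaded wheel against the SHA256 digest listed above, you can hash the file with Python's standard `hashlib` module. This is a minimal sketch; the wheel path used in the `__main__` block is an assumption about where you saved the download.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for pytorch_ie-0.3.0-py3-none-any.whl
EXPECTED_SHA256 = "1142d2d09e355e89e050e2c67d0d29f2b388be86f71df3e301f12c0c7ec4bd70"


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    wheel = Path("pytorch_ie-0.3.0-py3-none-any.whl")  # assumed download location
    if wheel.exists():
        print("SHA256 OK" if matches_expected(wheel) else "SHA256 MISMATCH")
    else:
        print(f"{wheel} not found")
```

Reading in chunks keeps memory use constant regardless of file size; for a 101.5 kB wheel it makes no practical difference, but the same function works for larger files.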