State-of-the-art Information Extraction in PyTorch
🚀️ Quickstart
```console
$ pip install pytorch-ie
```
⚡️ Examples
Note: Setting `num_workers=0` in the pipeline is only necessary when running an example in an interactive Python session, because multiprocessing does not play well with the interactive Python interpreter.
Span-classification-based Named Entity Recognition
```python
from dataclasses import dataclass

from pytorch_ie import AnnotationList, LabeledSpan, Pipeline, TextDocument, annotation_field
from pytorch_ie.models import TransformerSpanClassificationModel
from pytorch_ie.taskmodules import TransformerSpanClassificationTaskModule


@dataclass
class ExampleDocument(TextDocument):
    entities: AnnotationList[LabeledSpan] = annotation_field(target="text")


model_name_or_path = "pie/example-ner-spanclf-conll03"
ner_taskmodule = TransformerSpanClassificationTaskModule.from_pretrained(model_name_or_path)
ner_model = TransformerSpanClassificationModel.from_pretrained(model_name_or_path)

ner_pipeline = Pipeline(model=ner_model, taskmodule=ner_taskmodule, device=-1, num_workers=0)

document = ExampleDocument(
    "“Making a super tasty alt-chicken wing is only half of it,” said Po Bronson, general partner at SOSV and managing director of IndieBio."
)

ner_pipeline(document, predict_field="entities")

for entity in document.entities.predictions:
    print(f"{entity} -> {entity.label}")

# Result:
# IndieBio -> ORG
# Po Bronson -> PER
# SOSV -> ORG
```
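The predicted entities are character spans over the document text. As a plain-Python sketch (no pytorch-ie required; the offsets are taken from the relation-extraction example below), slicing the text with a span's start and end offsets recovers the entity mention:

```python
# Sketch: a LabeledSpan is a pair of character offsets into the document text,
# so the mention string is just a slice of that text.
text = "“Making a super tasty alt-chicken wing is only half of it,” said Po Bronson, general partner at SOSV and managing director of IndieBio."

# (start, end, label) triples as a span-classification model might predict them
predicted_spans = [(65, 75, "PER"), (96, 100, "ORG"), (126, 134, "ORG")]

for start, end, label in predicted_spans:
    print(f"{text[start:end]} -> {label}")
# Po Bronson -> PER
# SOSV -> ORG
# IndieBio -> ORG
```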
Text-classification-based Relation Extraction
```python
from dataclasses import dataclass

from pytorch_ie import AnnotationList, BinaryRelation, LabeledSpan, Pipeline, TextDocument, annotation_field
from pytorch_ie.models import TransformerTextClassificationModel
from pytorch_ie.taskmodules import TransformerRETextClassificationTaskModule


@dataclass
class ExampleDocument(TextDocument):
    entities: AnnotationList[LabeledSpan] = annotation_field(target="text")
    relations: AnnotationList[BinaryRelation] = annotation_field(target="entities")


model_name_or_path = "pie/example-re-textclf-tacred"
re_taskmodule = TransformerRETextClassificationTaskModule.from_pretrained(model_name_or_path)
re_model = TransformerTextClassificationModel.from_pretrained(model_name_or_path)

re_pipeline = Pipeline(model=re_model, taskmodule=re_taskmodule, device=-1, num_workers=0)

document = ExampleDocument(
    "“Making a super tasty alt-chicken wing is only half of it,” said Po Bronson, general partner at SOSV and managing director of IndieBio."
)

for start, end, label in [(65, 75, "PER"), (96, 100, "ORG"), (126, 134, "ORG")]:
    document.entities.append(LabeledSpan(start=start, end=end, label=label))

re_pipeline(document, predict_field="relations", batch_size=2)

for relation in document.relations.predictions:
    print(f"({relation.head} -> {relation.tail}) -> {relation.label}")

# Result:
# (Po Bronson -> SOSV) -> per:employee_of
# (Po Bronson -> IndieBio) -> per:employee_of
# (SOSV -> Po Bronson) -> org:top_members/employees
# (IndieBio -> Po Bronson) -> org:top_members/employees
```
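Note that the result contains a prediction for both directions of each entity pair: text-classification-based relation extraction scores ordered (head, tail) pairs, so `(Po Bronson -> SOSV)` and `(SOSV -> Po Bronson)` are separate candidates that can receive different labels. As a rough sketch of that idea (assumed behavior for illustration, not the pytorch-ie internals), enumerating the candidates looks like this:

```python
from itertools import permutations

# Sketch: with binary relations, every ordered pair of distinct annotated
# entities is a candidate (head, tail) for the relation classifier.
entities = [("Po Bronson", "PER"), ("SOSV", "ORG"), ("IndieBio", "ORG")]

candidate_pairs = list(permutations(entities, 2))
for (head, _), (tail, _) in candidate_pairs:
    print(f"{head} -> {tail}")

# 3 entities yield 3 * 2 = 6 ordered candidate pairs
print(len(candidate_pairs))
```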
Development Setup
🏅 Acknowledgements
This package is based on the sourcery-ai/python-best-practices-cookiecutter and cjolowicz/cookiecutter-hypermodern-python project templates.
📃 Citation
If you want to cite the framework, feel free to use this:

```bibtex
@misc{alt2022pytorchie,
  author = {Christoph Alt and Arne Binder},
  title = {PyTorch-IE: State-of-the-art Information Extraction in PyTorch},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ChristophAlt/pytorch-ie}}
}
```