Library to quickly build basic datasets for Named Entity Recognition (NER) and Relation Extraction (RE) Machine Learning tasks.
extr-ds
Library to programmatically build labeled datasets for Named-Entity Recognition (NER) and Relation Extraction (RE) Machine Learning tasks.
Install
pip install extr-ds
Command Line
See the instructions on how to use the command line utility to manage your project.
1. Init Project
extr-ds --init
2. Split and Annotate
extr-ds --split
3.a Annotate Entities or Relations Again?
extr-ds --annotate -ents
extr-ds --annotate -rels
3.b Change Relation Extraction Label
extr-ds --relate -label NO_RELATION=5,7,9
3.c Remove Relation Extraction Instance
extr-ds --relate -delete 5,6,7
3.d Recover Removed Relation Extraction Instances
extr-ds --relate -recover 5,6,7
4. Save
extr-ds --save -ents
extr-ds --save -rels
5. Reset "Gold Standard" datasets
extr-ds --reset
6. Help!?
extr-ds --help
API
Example
text = 'Ted Johnson is a pitcher.'
1. Label Entities for Named-Entity Recognition Task (NER)
import re

from extr import RegEx, RegExLabel
from extr.entities import EntityExtractor
from extr_ds.labelers import IOB

# Each RegExLabel pairs an entity label with the patterns that identify it.
entity_extractor = EntityExtractor([
    RegExLabel('PERSON', [
        RegEx([r'(ted\s+johnson|ted)'], re.IGNORECASE)
    ]),
    RegExLabel('POSITION', [
        RegEx([r'pitcher'], re.IGNORECASE)
    ]),
])

sentence_tokenizer = ...  ## 3rd party tokenizer ##

label = IOB(sentence_tokenizer, entity_extractor).label(text)

## label == <Label tokens=..., labels=['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O']>
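To make the IOB scheme in the output above concrete, here is a minimal standalone sketch that derives the same labels using only the standard library. It does not use extr or extr-ds; the function `iob_labels` and its simple regex tokenizer are assumptions made for illustration, not the library's implementation.

```python
import re


def iob_labels(text, entity_patterns):
    """Tag tokens with IOB labels based on regex entity matches (illustrative only)."""
    # Split into words and single punctuation marks, keeping character offsets.
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\w+|[^\w\s]", text)]
    labels = ["O"] * len(tokens)
    for label, pattern in entity_patterns:
        for match in re.finditer(pattern, text, re.IGNORECASE):
            first = True
            for i, (_, start, end) in enumerate(tokens):
                # A token belongs to the entity when it lies inside the match span.
                if start >= match.start() and end <= match.end():
                    labels[i] = ("B-" if first else "I-") + label
                    first = False
    return [tok for tok, _, _ in tokens], labels


tokens, labels = iob_labels(
    "Ted Johnson is a pitcher.",
    [("PERSON", r"ted\s+johnson|ted"), ("POSITION", r"pitcher")],
)
print(list(zip(tokens, labels)))
```

The first token of each entity receives the `B-` prefix, later tokens of the same entity receive `I-`, and everything else stays `O`.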
2. Annotate for Relation Extraction Task (RE)
import re

from extr import RegEx, RegExLabel
from extr.entities import EntityExtractor
from extr.relations import RegExRelationLabelBuilder, \
    RelationExtractor
from extr_ds.labelers import RelationClassification
from extr_ds.labelers.relation import RelationBuilder, BaseRelationLabeler, RuleBasedRelationLabeler

# Rule: a PERSON followed by " is a " and then a POSITION is labeled 'is_a'.
person_to_position_relationship = RegExRelationLabelBuilder('is_a') \
    .add_e1_to_e2(
        'PERSON',
        [
            r'\s+is\s+a\s+',
        ],
        'POSITION'
    ) \
    .build()

# Default label for every PERSON/POSITION pair.
base_relation_labeler = BaseRelationLabeler(
    RelationBuilder(relation_formats=[
        ('PERSON', 'POSITION', 'NO_RELATION')
    ])
)

rule_based_relation_labeler = RuleBasedRelationLabeler(
    RelationExtractor([person_to_position_relationship])
)

labeler = RelationClassification(
    EntityExtractor([
        RegExLabel('PERSON', [
            RegEx([r'(ted johnson|bob)'], re.IGNORECASE)
        ]),
        RegExLabel('POSITION', [
            RegEx([r'pitcher'], re.IGNORECASE)
        ]),
    ]),
    base_relation_labeler,
    relation_labelers=[
        rule_based_relation_labeler
    ]
)

results = labeler.label(text)

## results.relation_labels == [
##     <RelationLabel sentence="<e1>Ted Johnson</e1> is a <e2>pitcher</e2>." label="is_a">
## ]
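The annotated sentence format in the result above can be illustrated with a small standalone sketch. It assumes the two entity character spans are already known and do not overlap; `mark_entity_pair` is a hypothetical helper, and the rule check at the end is only one plausible reading of how a regex between two entities could select a label. It is not the library's code.

```python
import re


def mark_entity_pair(text, e1_span, e2_span):
    """Wrap two non-overlapping character spans in <e1>/<e2> tags (illustrative only)."""
    (s1, t1), (s2, t2) = e1_span, e2_span
    # Apply the rightmost edit first so the earlier offsets stay valid.
    edits = sorted([(s1, t1, "e1"), (s2, t2, "e2")], reverse=True)
    for start, end, tag in edits:
        text = f"{text[:start]}<{tag}>{text[start:end]}</{tag}>{text[end:]}"
    return text


text = "Ted Johnson is a pitcher."
marked = mark_entity_pair(text, (0, 11), (17, 24))

# Fall back to the default label unless the text between the entities matches the rule.
between = text[11:17]
label = "is_a" if re.fullmatch(r"\s+is\s+a\s+", between) else "NO_RELATION"

print(marked, label)
```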
3. Find and define the type of difference between labels
from extr_ds.validators import check_for_differences

differences_in_labels = check_for_differences(
    ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O'],
    ['B-PERSON', 'O', 'O', 'O', 'B-POSITION', 'O']
)

## differences_in_labels.has_diffs == True
## differences_in_labels.diffs_between_labels == [
##     <Difference index=1, diff_type=DifferenceTypes.S2_MISSING>
## ]

differences_in_labels = check_for_differences(
    ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O'],
    ['B-PERSON', 'B-PERSON', 'O', 'O', 'B-POSITION', 'O']
)

## differences_in_labels.has_diffs == True
## differences_in_labels.diffs_between_labels == [
##     <Difference index=1, diff_type=DifferenceTypes.MISMATCH>
## ]
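The two difference types shown above can be reproduced with a short standalone sketch. It assumes that "missing" means one sequence has `O` where the other has an entity label, and that any other disagreement is a mismatch. `DiffKind` and `diff_labels` are hypothetical names, and `S1_MISSING` is an assumed counterpart to `S2_MISSING` that does not appear in the examples above.

```python
from enum import Enum


class DiffKind(Enum):
    """Hypothetical stand-in for the library's DifferenceTypes."""
    S1_MISSING = "s1_missing"  # assumed counterpart, not shown in the examples
    S2_MISSING = "s2_missing"
    MISMATCH = "mismatch"


def diff_labels(s1, s2):
    """Return (index, kind) for each position where two label sequences disagree."""
    if len(s1) != len(s2):
        raise ValueError("label sequences must have the same length")
    diffs = []
    for index, (a, b) in enumerate(zip(s1, s2)):
        if a == b:
            continue
        if b == "O":
            kind = DiffKind.S2_MISSING  # s1 has an entity label, s2 has none
        elif a == "O":
            kind = DiffKind.S1_MISSING  # s2 has an entity label, s1 has none
        else:
            kind = DiffKind.MISMATCH    # both are labeled, but differently
        diffs.append((index, kind))
    return diffs


gold = ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O']
print(diff_labels(gold, ['B-PERSON', 'O', 'O', 'O', 'B-POSITION', 'O']))
print(diff_labels(gold, ['B-PERSON', 'B-PERSON', 'O', 'O', 'B-POSITION', 'O']))
```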