Library to quickly build basic datasets for Named Entity Recognition (NER) and Relation Extraction (RE) Machine Learning tasks.
extr-ds
Extension of the extr library.
Install
pip install extr-ds
Example
text = 'Ted Johnson is a pitcher. Ted went to my school.'
1. Label Entities for Named-Entity Recognition Task (NER)
import re

from extr import RegEx, RegExLabel, EntityExtractor
from extr_ds import IOB

entity_extractor = EntityExtractor([
    RegExLabel('PERSON', [
        RegEx([r'(ted\s+johnson|ted)'], re.IGNORECASE)
    ]),
    RegExLabel('POSITION', [
        RegEx([r'pitcher'], re.IGNORECASE)
    ]),
])

sentence_tokenizer = ...  ## 3rd party tokenizer ##
labels = IOB(sentence_tokenizer, entity_extractor).label(text)
## labels == [
## ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O'],
## ['B-PERSON', 'O', 'O', 'O', 'O', 'O']
## ]
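To make the IOB scheme behind those labels concrete, here is a minimal, library-free sketch. It does not use the extr or extr-ds API: `find_entities` and `iob_labels` are hypothetical helper names, and the regex-based sentence and word splitting is only an assumed stand-in for the 3rd party tokenizer. The first token inside an entity span gets `B-<LABEL>`, later tokens in the same span get `I-<LABEL>`, and everything else gets `O`.

```python
import re
from typing import List, Tuple

# (start, end, label) character span; end is exclusive.
Span = Tuple[int, int, str]


def find_entities(text: str) -> List[Span]:
    """Hypothetical stand-in for an entity extractor: regex matches -> spans."""
    patterns = [
        ('PERSON', r'ted\s+johnson|ted'),
        ('POSITION', r'pitcher'),
    ]
    spans = []
    for label, pattern in patterns:
        for match in re.finditer(pattern, text, re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return spans


def iob_labels(text: str) -> List[List[str]]:
    """Tag each token of each sentence with B-/I-/O (naive regex tokenization)."""
    entities = find_entities(text)
    labeled = []
    # Assumption: a sentence is a run of characters up to '.', '!' or '?'.
    for sentence in re.finditer(r'[^.!?]+[.!?]?', text):
        tags = []
        # Assumption: a token is a word or a single punctuation mark.
        for token in re.finditer(r'\w+|[^\w\s]', sentence.group()):
            start = sentence.start() + token.start()
            end = sentence.start() + token.end()
            tag = 'O'
            for ent_start, ent_end, label in entities:
                if ent_start <= start and end <= ent_end:
                    # First token of the span is B-, the rest are I-.
                    tag = ('B-' if start == ent_start else 'I-') + label
                    break
            tags.append(tag)
        if tags:
            labeled.append(tags)
    return labeled


text = 'Ted Johnson is a pitcher. Ted went to my school.'
labels = iob_labels(text)
```

With the example text, `labels` comes out as one tag list per sentence, matching the output shown above. A real tokenizer will split text differently from these regexes, so token counts can vary on other inputs.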
Download files
Source Distribution
extr-ds-0.0.1.tar.gz (2.8 kB)