Named Entity Recognition (NER) and Relation Extraction (RE) library using Regular Expressions
Extr
Install
pip install extr
Example
text = 'Ted is a Pitcher.'
1. Entity Extraction
Find named entities in text.
import re
from extr import RegEx, RegExLabel
from extr.entities import EntityExtractor
## each label maps to one or more regular expressions
entity_extractor = EntityExtractor([
    RegExLabel('PERSON', [
        RegEx([r'ted'], re.IGNORECASE)
    ]),
    RegExLabel('POSITION', [
        RegEx([r'pitcher'], re.IGNORECASE)
    ]),
])
entities = entity_extractor.get_entities(text)
## entities == [
## <Entity label="POSITION" text="Pitcher" span=(9, 16)>,
## <Entity label="PERSON" text="Ted" span=(0, 3)>
## ]
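To make the idea behind regex-based entity extraction concrete, here is a minimal standard-library sketch. It is not extr's implementation: `SimpleEntity` and `find_entities` are hypothetical names invented for illustration, and the sketch sorts results by start offset, which differs from the ordering shown in the library output above.

```python
import re
from typing import NamedTuple


class SimpleEntity(NamedTuple):
    """Hypothetical stand-in for an entity; not extr's Entity class."""
    label: str
    text: str
    span: tuple


def find_entities(text, labeled_patterns, flags=re.IGNORECASE):
    """Return every match of every pattern, tagged with its label."""
    found = []
    for label, patterns in labeled_patterns.items():
        for pattern in patterns:
            for match in re.finditer(pattern, text, flags):
                found.append(SimpleEntity(label, match.group(), match.span()))
    # Sort by start offset so results read left to right.
    return sorted(found, key=lambda entity: entity.span[0])


sample = 'Ted is a Pitcher.'
simple_entities = find_entities(
    sample,
    {'PERSON': [r'ted'], 'POSITION': [r'pitcher']},
)
for entity in simple_entities:
    print(entity.label, entity.text, entity.span)
# PERSON Ted (0, 3)
# POSITION Pitcher (9, 16)
```

The spans are half-open character offsets, so `sample[9:16]` recovers `'Pitcher'`.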
Alternatively, supply a knowledge base (`kb`) that maps each label to a list of known terms:
import re
from extr import RegEx, RegExLabel
from extr.entities import create_entity_extractor
entity_extractor = create_entity_extractor(
    [
        RegExLabel('POSITION', [
            RegEx([r'pitcher'], re.IGNORECASE)
        ]),
    ],
    ## known terms for each label
    kb={
        'PERSON': ['Ted']
    }
)
entities = entity_extractor.get_entities(text)
## entities == [
## <Entity label="POSITION" text="Pitcher" span=(9, 16)>,
## <Entity label="PERSON" text="Ted" span=(0, 3)>
## ]
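One plausible way to fold a knowledge base of literal terms into a regex-driven extractor is to escape each term and join the terms into a single alternation. The sketch below shows that idea with the standard library only. It is an assumption about the general technique, not a description of what `create_entity_extractor` does internally, and `kb_to_patterns` is a hypothetical helper.

```python
import re


def kb_to_patterns(kb):
    """Turn {label: [terms]} into {label: [one regex pattern]}."""
    patterns = {}
    for label, terms in kb.items():
        # Longest terms first so 'Ted Williams' wins over 'Ted'.
        ordered = sorted(terms, key=len, reverse=True)
        # re.escape keeps literal terms from being read as regex syntax.
        alternation = '|'.join(re.escape(term) for term in ordered)
        patterns[label] = [r'\b(?:' + alternation + r')\b']
    return patterns


kb_patterns = kb_to_patterns({'PERSON': ['Ted', 'Ted Williams']})
kb_match = re.search(kb_patterns['PERSON'][0], 'Ted Williams is a hitter.')
print(kb_match.group())
# Ted Williams
```

The word boundaries (`\b`) keep a short term such as `Ted` from matching inside a longer word such as `Teddy`.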
2. Visualize Entities in HTML
Annotate text to display in HTML.
from extr.entities.viewers import HtmlViewer
viewer = HtmlViewer()
viewer.append(text, entities)
html = viewer.create_view(custom_styles="""
    .lb-PERSON {
        background-color: orange;
    }
    .lb-POSITION {
        background-color: yellow;
    }
""")
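The custom styles above target classes named `lb-<LABEL>`. As a rough illustration of how entity spans could be turned into markup carrying such classes, here is a standard-library sketch. The exact HTML that `HtmlViewer` emits is not shown in this document, so the `<span>` structure below is an assumption, and `render_entities` is a hypothetical helper that expects non-overlapping spans.

```python
from html import escape


def render_entities(text, entities):
    """Wrap each (label, start, end) span in a <span class="lb-LABEL"> tag."""
    parts = []
    cursor = 0
    # Walk the spans left to right, escaping the plain text between them.
    for label, start, end in sorted(entities, key=lambda item: item[1]):
        parts.append(escape(text[cursor:start]))
        parts.append(
            f'<span class="lb-{label}">{escape(text[start:end])}</span>'
        )
        cursor = end
    parts.append(escape(text[cursor:]))
    return ''.join(parts)


markup = render_entities(
    'Ted is a Pitcher.',
    [('POSITION', 9, 16), ('PERSON', 0, 3)],
)
print(markup)
# <span class="lb-PERSON">Ted</span> is a <span class="lb-POSITION">Pitcher</span>.
```

Escaping the text before wrapping it keeps characters such as `<` or `&` in the source from breaking the generated page.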
3. Relation Extraction
Annotate the text, then extract relationships between entities.
from extr.entities import EntityAnnotator
from extr.relations import RelationExtractor, \
    RegExRelationLabelBuilder
## define relationship between PERSON and POSITION
relationship = RegExRelationLabelBuilder('is_a') \
    .add_e1_to_e2(
        'PERSON', ## e1
        [
            ## patterns for the text that links e1 to e2
            r'\s+is\s+a\s+',
        ],
        'POSITION' ## e2
    ) \
    .build()
relations_to_extract = [relationship]
## `entities` comes from 'Entity Extraction' above
annotated_text = EntityAnnotator().annotate(text, entities)
relations = RelationExtractor(relations_to_extract).extract(annotated_text, entities)
## relations == [
## <Relation e1="Ted" r="is_a" e2="Pitcher">
## ]
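The annotate-then-match flow can be sketched with the standard library alone: replace each entity with a placeholder token, then look for `e1 token + connector pattern + e2 token`. This is only an illustration of the general approach. The `<<LABEL_n>>` placeholder format and the `annotate` and `extract_relations` helpers are hypothetical, and the format `EntityAnnotator` actually produces may differ.

```python
import re


def annotate(text, entities):
    """Replace each (label, start, end) span with a numbered placeholder."""
    lookup = {}
    # Work right to left so earlier offsets stay valid after each replacement.
    ordered = sorted(entities, key=lambda item: item[1], reverse=True)
    for index, (label, start, end) in enumerate(ordered, start=1):
        token = f'<<{label}_{index}>>'
        lookup[token] = text[start:end]
        text = text[:start] + token + text[end:]
    return text, lookup


def extract_relations(annotated, lookup, name, e1_label, connector, e2_label):
    """Find e1 and e2 placeholders joined by the connector pattern."""
    pattern = (
        rf'(<<{re.escape(e1_label)}_\d+>>)'
        rf'{connector}'
        rf'(<<{re.escape(e2_label)}_\d+>>)'
    )
    return [
        (lookup[match.group(1)], name, lookup[match.group(2)])
        for match in re.finditer(pattern, annotated)
    ]


annotated, lookup = annotate(
    'Ted is a Pitcher.',
    [('PERSON', 0, 3), ('POSITION', 9, 16)],
)
print(annotated)
# <<PERSON_2>> is a <<POSITION_1>>.
found = extract_relations(
    annotated, lookup, 'is_a', 'PERSON', r'\s+is\s+a\s+', 'POSITION'
)
print(found)
# [('Ted', 'is_a', 'Pitcher')]
```

Matching against placeholders rather than raw text means the connector pattern only has to describe the words between the two entities, not the entities themselves.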