Named Entity Recognition (NER) and Relation Extraction (RE) library using Regular Expressions
Extr
Install
pip install extr
Example
text = 'Ted is a Pitcher.'
1. Entity Extraction
Find named entities in text.
import re
from extr import RegEx, RegExLabel
from extr.entities import EntityExtractor
## one RegExLabel per entity type; each holds the patterns that identify it
entity_extractor = EntityExtractor([
    RegExLabel('PERSON', [
        RegEx([r'ted'], re.IGNORECASE)
    ]),
    RegExLabel('POSITION', [
        RegEx([r'pitcher'], re.IGNORECASE)
    ]),
])
entities = entity_extractor.get_entities(text)
## entities == [
## <Entity label="POSITION" text="Pitcher" span=(9, 16)>,
## <Entity label="PERSON" text="Ted" span=(0, 3)>
## ]
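To make the label, text, and span fields in that output concrete, here is a minimal standard-library sketch of regex-based entity extraction. It is not extr's implementation: `find_entities` and the plain `(label, text, span)` tuples are hypothetical names used only for illustration, and the ordering of results is an assumption.

```python
import re

def find_entities(text, labeled_patterns):
    """Return (label, matched_text, (start, end)) for every pattern match."""
    found = []
    for label, patterns in labeled_patterns.items():
        for pattern in patterns:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                found.append((label, match.group(), match.span()))
    # Sort by position in the text; extr's own ordering may differ.
    return sorted(found, key=lambda entity: entity[2])

sample = 'Ted is a Pitcher.'
found = find_entities(sample, {'PERSON': [r'ted'], 'POSITION': [r'pitcher']})
print(found)
# [('PERSON', 'Ted', (0, 3)), ('POSITION', 'Pitcher', (9, 16))]
```

The spans are half-open character offsets, so `sample[9:16]` is `'Pitcher'`.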
The same entities can also be found by supplying a knowledge base of known terms:
import re
from extr import RegEx, RegExLabel
from extr.entities import create_entity_extractor
entity_extractor = create_entity_extractor(
    [
        RegExLabel('POSITION', [
            RegEx([r'pitcher'], re.IGNORECASE)
        ]),
    ],
    ## literal terms, keyed by entity label
    kb={
        'PERSON': ['Ted']
    }
)
entities = entity_extractor.get_entities(text)
## entities == [
## <Entity label="POSITION" text="Pitcher" span=(9, 16)>,
## <Entity label="PERSON" text="Ted" span=(0, 3)>
## ]
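One way to think about a knowledge base is as a list of literal terms compiled into patterns. The sketch below shows that idea using only the standard library. How extr actually handles `kb` (case sensitivity, word boundaries, term ordering) is not stated here, so `kb_to_patterns` and every choice in it are assumptions.

```python
import re

def kb_to_patterns(kb):
    """Turn {label: [term, ...]} into {label: compiled_regex}."""
    patterns = {}
    for label, terms in kb.items():
        # Longest terms first so 'Ted Williams' wins over 'Ted'.
        ordered = sorted(terms, key=len, reverse=True)
        # re.escape keeps characters such as '.' literal.
        escaped = [re.escape(term) for term in ordered]
        patterns[label] = re.compile(r'\b(?:' + '|'.join(escaped) + r')\b')
    return patterns

patterns = kb_to_patterns({'PERSON': ['Ted', 'Ted Williams']})
long_match = patterns['PERSON'].search('Ted Williams is a Pitcher.')
short_match = patterns['PERSON'].search('Ted is a Pitcher.')
print(long_match.group(), long_match.span())    # Ted Williams (0, 12)
print(short_match.group(), short_match.span())  # Ted (0, 3)
```

The word boundaries keep `Ted` from matching inside a longer word such as `Teddy`.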
2. Visualize Entities in HTML
Annotate text to display in HTML.
from extr.entities.viewers import HtmlViewer
viewer = HtmlViewer()
viewer.append(text, entities)
html = viewer.create_view(custom_styles="""
    .lb-PERSON {
        background-color: orange;
    }
    .lb-POSITION {
        background-color: yellow;
    }
""")
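The custom styles above target one CSS class per label (`lb-PERSON`, `lb-POSITION`). As a rough illustration of how annotated markup could carry those classes, here is a standard-library sketch. The markup `HtmlViewer` really produces is not shown in this document, so `annotate_html` and its output format are assumptions; it also assumes the spans do not overlap.

```python
from html import escape

def annotate_html(text, entities):
    """Wrap each (label, start, end) span in <span class="lb-LABEL">."""
    parts, cursor = [], 0
    for label, start, end in sorted(entities, key=lambda e: e[1]):
        parts.append(escape(text[cursor:start]))  # plain text before the entity
        parts.append(f'<span class="lb-{label}">{escape(text[start:end])}</span>')
        cursor = end
    parts.append(escape(text[cursor:]))  # plain text after the last entity
    return ''.join(parts)

markup = annotate_html('Ted is a Pitcher.', [('POSITION', 9, 16), ('PERSON', 0, 3)])
print(markup)
# <span class="lb-PERSON">Ted</span> is a <span class="lb-POSITION">Pitcher</span>.
```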
3. Relation Extraction
Annotate and Extract Relationships between Entities
from extr.entities import EntityAnnotator
from extr.relations import RelationExtractor, \
                           RegExRelationLabelBuilder
## define relationship between PERSON and POSITION
relationship = RegExRelationLabelBuilder('is_a') \
    .add_e1_to_e2(
        'PERSON', ## e1
        [
            ## define how the relationship exists in nature
            r'\s+is\s+a\s+',
        ],
        'POSITION' ## e2
    ) \
    .build()
relations_to_extract = [relationship]
## `entities` see 'Entity Extraction' above
annotation_results = EntityAnnotator().annotate(text, entities)
relations = RelationExtractor(relations_to_extract).extract(annotation_results)
## relations == [
## <Relation e1="Ted" r="is_a" e2="Pitcher">
## ]
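The annotate-then-extract flow can be sketched with the standard library: replace each entity with a placeholder token, then search for an e1 token, the connecting pattern, and an e2 token. The token format `<LABEL:index>` and the helpers `annotate` and `extract_relations` are hypothetical; extr's real annotation format may differ. Labels are assumed to contain no regex metacharacters.

```python
import re

def annotate(text, entities):
    """Replace each (label, start, end) span with a token such as <PERSON:0>."""
    lookup, parts, cursor = {}, [], 0
    ordered = sorted(entities, key=lambda e: e[1])
    for index, (label, start, end) in enumerate(ordered):
        token = f'<{label}:{index}>'
        lookup[token] = text[start:end]  # remember the original entity text
        parts.append(text[cursor:start])
        parts.append(token)
        cursor = end
    parts.append(text[cursor:])
    return ''.join(parts), lookup

def extract_relations(annotated, lookup, name, e1_label, between, e2_label):
    """Match an e1 token, the `between` pattern, then an e2 token."""
    pattern = rf'(<{e1_label}:\d+>)(?:{between})(<{e2_label}:\d+>)'
    return [(lookup[m.group(1)], name, lookup[m.group(2)])
            for m in re.finditer(pattern, annotated)]

annotated, lookup = annotate('Ted is a Pitcher.', [('PERSON', 0, 3), ('POSITION', 9, 16)])
found_relations = extract_relations(
    annotated, lookup, 'is_a', 'PERSON', r'\s+is\s+a\s+', 'POSITION')
print(annotated)        # <PERSON:0> is a <POSITION:1>.
print(found_relations)  # [('Ted', 'is_a', 'Pitcher')]
```

Matching against tokens rather than raw text means the connecting pattern only has to describe the words between the two entities.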
4. Apply Attributes to Entities
Mark entities based on the text that surrounds them.
from extr.entities import EntityAttributor, \
                          AttributeToApply, \
                          AttributeApplications, \
                          AttributeSetup
negative_text = 'Ted is not a Pitcher.'
## `entity_extractor` see 'Entity Extraction' above
entities = entity_extractor.get_entities(negative_text)
attributor = EntityAttributor(
    attribute_name='ctypes',
    settings=[
        AttributeToApply(
            'NEGATIVE',
            applications=[
                AttributeApplications(
                    entities=['POSITION'],
                    setups=[
                        ## text that must appear directly before the entity
                        AttributeSetup(before=r' not a ')
                    ]
                )
            ]
        )
    ]
)
entities = attributor.set_attributes(negative_text, entities)
## entities == [
## <Entity label="POSITION" text="Pitcher" span=(13, 20) attributes={"ctypes": ["NEGATIVE"]}>,
## <Entity label="PERSON" text="Ted" span=(0, 3)>
## ]
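The `before` setup above can be read as a check on the text immediately preceding an entity's span. A minimal standard-library sketch of that check follows; `preceded_by` is a hypothetical helper, not part of extr, and extr's exact matching rules are assumed rather than confirmed.

```python
import re

def preceded_by(text, span, before):
    """True when the text right before `span` ends with the regex `before`."""
    start, _ = span
    # \Z anchors the pattern to the very end of the preceding text.
    return re.search(rf'(?:{before})\Z', text[:start]) is not None

negated = preceded_by('Ted is not a Pitcher.', (13, 20), r' not a ')
plain = preceded_by('Ted is a Pitcher.', (9, 16), r' not a ')
print(negated, plain)  # True False
```

Only the `POSITION` span in the negated sentence passes the check, which mirrors why only that entity carries the `NEGATIVE` attribute in the output above.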
Source Distribution: extr-0.0.31.tar.gz (9.9 kB)
Built Distribution: extr-0.0.31-py3-none-any.whl (13.4 kB)