Named Entity Recognition (NER) and Relation Extraction (RE) library using Regular Expressions

Extr


Install

pip install extr

Example

text = 'Ted is a Pitcher.'

1. Entity Extraction

Find named entities in text.

import re

from extr import RegEx, RegExLabel
from extr.entities import EntityExtractor

entity_extractor = EntityExtractor([
    RegExLabel('PERSON', [
        RegEx([r'ted'], re.IGNORECASE)
    ]),
    RegExLabel('POSITION', [
        RegEx([r'pitcher'], re.IGNORECASE)
    ]),
])

entities = entity_extractor.get_entities(text)

## entities == [
##      <Entity label="POSITION" text="Pitcher" span=(9, 16)>,
##      <Entity label="PERSON" text="Ted" span=(0, 3)>
## ]
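To make the idea behind label-based regex extraction concrete, here is a minimal sketch using only the standard library. It is not extr's implementation: the `Match` tuple, the `PATTERNS` table, and `find_entities` are hypothetical names introduced for illustration, and the result order (sorted by start offset) is an assumption that may differ from what extr returns.

```python
import re
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    """Hypothetical stand-in for an entity: label, matched text, span."""
    label: str
    text: str
    span: Tuple[int, int]


# Hypothetical label -> compiled patterns table mirroring the example above.
PATTERNS = {
    'PERSON': [re.compile(r'ted', re.IGNORECASE)],
    'POSITION': [re.compile(r'pitcher', re.IGNORECASE)],
}


def find_entities(text: str) -> List[Match]:
    """Run every pattern of every label and collect the matches."""
    found = []
    for label, patterns in PATTERNS.items():
        for pattern in patterns:
            for m in pattern.finditer(text):
                found.append(Match(label, m.group(), m.span()))
    # Sorting by start offset is a choice made for this sketch only.
    return sorted(found, key=lambda e: e.span[0])


found = find_entities('Ted is a Pitcher.')
```

Each span is a half-open `(start, end)` character range into the original text, which is the same convention the `span` values in the output above follow.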

Or add a knowledge base:

import re

from extr import RegEx, RegExLabel
from extr.entities import create_entity_extractor

entity_extractor = create_entity_extractor(
    [
        RegExLabel('POSITION', [
            RegEx([r'pitcher'], re.IGNORECASE)
        ]),
    ],
    kb={
        'PERSON': ['Ted']
    }
)

entities = entity_extractor.get_entities(text)

## entities == [
##      <Entity label="POSITION" text="Pitcher" span=(9, 16)>,
##      <Entity label="PERSON" text="Ted" span=(0, 3)>
## ]
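One plausible way a knowledge base of literal terms can be turned into regular expressions is sketched below. How extr handles `kb` internally is not documented here, so treat this purely as an illustration: `kb_to_patterns` is a hypothetical helper, and the word-boundary, case-sensitive matching is an assumption.

```python
import re
from typing import Dict, List


def kb_to_patterns(kb: Dict[str, List[str]]) -> Dict[str, re.Pattern]:
    """Compile each label's literal terms into one alternation pattern."""
    compiled = {}
    for label, terms in kb.items():
        # Longest terms first so 'Ted Williams' is tried before 'Ted'.
        ordered = sorted(terms, key=len, reverse=True)
        # re.escape keeps characters such as '.' or '+' literal.
        alternation = '|'.join(re.escape(term) for term in ordered)
        compiled[label] = re.compile(r'\b(?:' + alternation + r')\b')
    return compiled


kb_patterns = kb_to_patterns({'PERSON': ['Ted', 'Ted Williams']})
kb_match = kb_patterns['PERSON'].search('Ted Williams is a Pitcher.')
```

Escaping the terms matters because knowledge-base entries are plain strings rather than patterns; without it, a name containing punctuation could match unintended text.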

2. Visualize Entities in HTML

Annotate text to display in HTML.

from extr.entities.viewers import HtmlViewer

viewer = HtmlViewer()
viewer.append(text, entities)

html = viewer.create_view(custom_styles="""
    .lb-PERSON {
        background-color: orange;
    }

    .lb-POSITION {
        background-color: yellow;
    }
""")

3. Relation Extraction

Annotate text and extract relationships between entities.

from extr.entities import EntityAnnotator
from extr.relations import RelationExtractor, \
                           RegExRelationLabelBuilder

## define relationship between PERSON and POSITION
relationship = RegExRelationLabelBuilder('is_a') \
    .add_e1_to_e2(
        'PERSON', ## e1
        [
            ## regex pattern(s) for the text that connects e1 to e2
            r'\s+is\s+a\s+',
        ],
        'POSITION' ## e2
    ) \
    .build()

relations_to_extract = [relationship]

## `entities` comes from 'Entity Extraction' above
annotated_text = EntityAnnotator().annotate(text, entities)
relations = RelationExtractor(relations_to_extract).extract(annotated_text, entities)

## relations == [
##      <Relation e1="Ted" r="is_a" e2="Pitcher">
## ]
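The annotate-then-match approach can be sketched with the standard library as follows. This is an illustration under stated assumptions, not extr's implementation: the `<LABEL#id>` placeholder format, the entity tuples, and the `annotate` and `extract` helpers are all hypothetical, and the format extr's `EntityAnnotator` actually produces may differ.

```python
import re
from typing import List, Tuple

# Hypothetical entity record: (id, label, text, start, end)
Entity = Tuple[int, str, str, int, int]
Triple = Tuple[str, str, str]


def annotate(text: str, entities: List[Entity]) -> str:
    """Replace each entity's text with a '<LABEL#id>' placeholder."""
    # Work from right to left so earlier offsets stay valid.
    for eid, label, _, start, end in sorted(
        entities, key=lambda e: e[3], reverse=True
    ):
        text = text[:start] + f'<{label}#{eid}>' + text[end:]
    return text


def extract(annotated: str, entities: List[Entity], e1_label: str,
            between: str, e2_label: str, name: str) -> List[Triple]:
    """Find e1 placeholder + connecting pattern + e2 placeholder."""
    text_by_id = {e[0]: e[2] for e in entities}
    pattern = re.compile(
        rf'<{e1_label}#(\d+)>(?:{between})<{e2_label}#(\d+)>'
    )
    return [
        (text_by_id[int(m.group(1))], name, text_by_id[int(m.group(2))])
        for m in pattern.finditer(annotated)
    ]


ents = [(1, 'POSITION', 'Pitcher', 9, 16), (2, 'PERSON', 'Ted', 0, 3)]
marked = annotate('Ted is a Pitcher.', ents)
triples = extract(marked, ents, 'PERSON', r'\s+is\s+a\s+', 'POSITION', 'is_a')
```

Replacing entities with typed placeholders lets the relation pattern describe only the connecting text (`\s+is\s+a\s+`), independent of how each entity was originally written.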
