Rule-based Named Entity Recognition tool.

funer

funer is a rule-based Named Entity Recognition (NER) tool.

With funer, you can do the following:

  • Create a rule-based NER model.
  • Improve the rules and the labeled data by comparing them against each other.
  • Label data with labeling functions.

Example

Install

How to use

Create Document.

from funer.document import Document
import spacy

# Create documents
nlp = spacy.load("en_core_web_sm")

labeled_document_1 = Document.from_spacy_doc(
    nlp("Donald John Trump was born in New York."),
    gold_entities=[
        (0, 17, "PER"), # Donald John Trump
        (30, 38, "LOC") # New York
    ]
)
labeled_document_2 = Document.from_spacy_doc(
    nlp("Abe Rosenthal was editor-in-chief of the New York Times in 1998."),
    gold_entities=[
        (0, 13, "PER"),   # Abe Rosenthal
        (41, 55, "ORG"),  # New York Times
        (59, 63, "DATE"), # 1998
    ]
)
nolabeled_document = Document.from_spacy_doc(
    nlp("I want to go to New York."),
)
documents = [labeled_document_1, labeled_document_2, nolabeled_document]

# Optional: build a Document directly from pre-tokenized text
tokenized_labeled_document_1 = Document(
    tokens=['Donald', 'John', 'Trump', 'was', 'born', 'in', 'New', 'York', '.'],
    spaces=[True, True, True, True, True, True, True, False, False],
    gold_label=['B-PER', 'I-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC', 'I-LOC', 'O'],
)
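
The two ways of creating a document above describe the same labels: character-offset entities in one case, one BIO tag per token in the other. As a minimal sketch of how the two relate, the helper below converts character spans into BIO tags. `spans_to_bio` is a hypothetical name used only for illustration; it is not part of funer, and funer may do this conversion differently.

```python
from typing import List, Tuple


def spans_to_bio(
    tokens: List[str],
    spaces: List[bool],
    entities: List[Tuple[int, int, str]],
) -> List[str]:
    """Convert (start, end, label) character spans to one BIO tag per token.

    Hypothetical helper for illustration only; not part of funer.
    """
    # Rebuild each token's character span from the token text and the
    # "is followed by a space" flags.
    offsets = []
    pos = 0
    for token, space in zip(tokens, spaces):
        offsets.append((pos, pos + len(token)))
        pos += len(token) + (1 if space else 0)

    labels = ["O"] * len(tokens)
    for start, end, label in entities:
        # Tokens that lie entirely inside the entity span.
        inside = [i for i, (s, e) in enumerate(offsets) if s >= start and e <= end]
        for n, i in enumerate(inside):
            labels[i] = ("B-" if n == 0 else "I-") + label
    return labels


tokens = ['Donald', 'John', 'Trump', 'was', 'born', 'in', 'New', 'York', '.']
spaces = [True, True, True, True, True, True, True, False, False]
print(spans_to_bio(tokens, spaces, [(0, 17, "PER"), (30, 38, "LOC")]))
# ['B-PER', 'I-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC', 'I-LOC', 'O']
```

The end offset is exclusive, so (30, 38) covers "New York" but not the final period.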

Define Labeling Functions.

import re

from funer.annotators.dictionary_annotator import DictionaryAnnotator
from funer.annotators.token_condition_annotator import (
    TokensConditionAnnotator, generate_token_conditions_function)
from funer.annotators.span_condition_annotator import SpanConditionAnnotator

# Labeling functions

# f1: Per-token labeling function
def detect_name(tokens):
    # Slide a 3-token window over the tokens; "- 2" so the last window is checked too
    for i in range(len(tokens) - 2):
        if tokens[i:i + 3] == ["Donald", "John", "Trump"]:
            yield i, i + 3
f1 = TokensConditionAnnotator(
    name="person_f",
    f=detect_name,
    label="PER"
)

# f2: Per-token labeling function using generate_token_conditions_function
f2 = TokensConditionAnnotator(
    name="year_f",
    f=generate_token_conditions_function([
        # fullmatch so that only a token that is exactly a 4-digit year matches
        lambda token_1: re.fullmatch(r"(19|20)\d{2}", token_1) is not None,
    ]),
    label="DATE"
)

# f3: Per-character labeling function (yields character offsets)
def span_condition_function(text: str):
    for m in re.finditer(r"Abe Rosenthal", text):
        yield m.start(), m.end()
f3 = SpanConditionAnnotator(
    name="person_f2",
    f=span_condition_function,
    label="PER"
)


# f4: Dictionary-based labeling function
# Note: this also matches the "New York" in "New York Times" and mislabels it as LOC
loc_dictionary = ["New York", "Minneapolis"]
f4 = DictionaryAnnotator(
    name="city_f",
    words=loc_dictionary,
    label="LOC"
)
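
This page does not show how generate_token_conditions_function works internally. One plausible reading is that it takes one predicate per consecutive token and yields every token span where all predicates hold. The sketch below implements that reading in plain Python; `make_token_matcher` is a hypothetical name, and the behavior is an assumption rather than funer's actual code.

```python
import re
from typing import Callable, Iterator, List, Tuple


def make_token_matcher(
    conditions: List[Callable[[str], bool]],
) -> Callable[[List[str]], Iterator[Tuple[int, int]]]:
    """Build a function that yields (start, end) token spans.

    Hypothetical stand-in for illustration; the i-th condition is tested
    against the i-th token of each window of len(conditions) tokens.
    """
    if not conditions:
        raise ValueError("at least one condition is required")
    n = len(conditions)

    def matcher(tokens: List[str]) -> Iterator[Tuple[int, int]]:
        # "+ 1" so that a window ending on the last token is included.
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if all(cond(tok) for cond, tok in zip(conditions, window)):
                yield i, i + n  # end index is exclusive

    return matcher


find_year = make_token_matcher(
    [lambda t: re.fullmatch(r"(19|20)\d{2}", t) is not None]
)
find_new_york = make_token_matcher([lambda t: t == "New", lambda t: t == "York"])

print(list(find_year(["Abe", "Rosenthal", "was", "editor", "in", "1998", "."])))
# [(5, 6)]
print(list(find_new_york(["I", "want", "to", "go", "to", "New", "York"])))
# [(5, 7)]
```

The returned function has the same shape as detect_name above: it takes a list of tokens and yields token index pairs.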

Apply labeling functions to documents.

from funer.labeling_function_applier import LabelingFunctionApplier
from funer.aggregators.majority_voting_aggregators import MajorityVotingAggregator
from funer.utils import show_labels

# Apply the labeling functions
lf_applier = LabelingFunctionApplier(lfs=[f1, f2, f3, f4])
documents = lf_applier.apply(documents)

# Aggregate the labeling results
aggregator = MajorityVotingAggregator()
documents = aggregator.aggregate(documents)

# Output Results
print(show_labels(documents[0]))
# > tokens       Donald   John    Trump   was   born   in   New     York    .
# > =========================================================================
# > gold_label   B-PER    I-PER   I-PER   O     O      O    B-LOC   I-LOC   O
# > -------------------------------------------------------------------------
# > person_f     B-PER    I-PER   I-PER   O     O      O    O       O       O
# > city_f       O        O       O       O     O      O    B-LOC   I-LOC   O
# > -------------------------------------------------------------------------
# > aggregate    B-PER    I-PER   I-PER   O     O      O    B-LOC   I-LOC   O

# Show stats
print(lf_applier.show_stats())
# > f_name    | pos | neg | hit
# > ==========+=====+=====+====
# > person_f  | 1   | 0   | 1  
# > year_f    | 1   | 0   | 1  
# > person_f2 | 1   | 0   | 1  
# > city_f    | 1   | 1   | 2  

# Get label
print(documents[0].export_bio_label())
# > ['B-PER', 'I-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC', 'I-LOC', 'O']
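
To make the aggregation step concrete, here is a minimal sketch of per-token majority voting over the BIO tags that each labeling function produced. It illustrates the general idea only: `majority_vote` is a hypothetical name, and MajorityVotingAggregator may handle abstentions, ties, and span boundaries differently.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(votes: Dict[str, List[str]]) -> List[str]:
    """Pick, for each token, the most frequent non-"O" tag.

    Hypothetical sketch, not funer's implementation. "O" is treated as an
    abstention, so a token is "O" only if no labeling function tagged it.
    Ties go to the tag seen first (Counter.most_common keeps first-seen order).
    """
    n_tokens = len(next(iter(votes.values())))
    result = []
    for i in range(n_tokens):
        counts = Counter(tags[i] for tags in votes.values() if tags[i] != "O")
        result.append(counts.most_common(1)[0][0] if counts else "O")
    return result


# The per-function rows from the show_labels output above.
votes = {
    "person_f": ['B-PER', 'I-PER', 'I-PER', 'O', 'O', 'O', 'O', 'O', 'O'],
    "city_f":   ['O', 'O', 'O', 'O', 'O', 'O', 'B-LOC', 'I-LOC', 'O'],
}
print(majority_vote(votes))
# ['B-PER', 'I-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC', 'I-LOC', 'O']
```

Voting token by token is simple, but it can produce an inconsistent tag sequence (for example an I- tag with no preceding B- tag) when functions disagree on span boundaries; voting on whole spans avoids that.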
