
Python package for detecting entities in text based on a dictionary and fuzzy similarity


LexiFuzz NER: Named Entity Recognition Based on Dictionary and Fuzzy Matching


About

LexiFuzz NER is a Named Entity Recognition (NER) package for identifying and extracting named entities from unstructured text. By combining dictionary-based lookup with fuzzy matching, it recognizes entities reliably even when the text contains misspellings or variant forms, making it a useful tool for information extraction, natural language understanding, and text analytics.

Requirements

  • Python 3.7 or higher
  • NLTK
  • TheFuzz

Key Features

  1. Dictionary-Based Recognition: LexiFuzz NER matches text against a dictionary of named entities that you supply, covering any categories you define (for example, person names, organizations, locations, or product names). Because matches are anchored to known terms, precision on in-dictionary entities is high.

  2. Fuzzy Matching: The package uses fuzzy string matching (via TheFuzz) to identify named entities even in cases of typographical errors, misspellings, or variations in naming conventions. This makes recognition robust to varying textual representations of the same entity.

  3. Customization: LexiFuzz NER allows users to easily customize and expand the entity dictionary to suit specific domain or application requirements. This flexibility makes it adaptable to a wide array of use cases.
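The dictionary-plus-fuzzy idea above can be sketched in a few lines. This is an illustrative toy, not LexiFuzz's actual implementation: it scores similarity with the standard library's `difflib`, whereas LexiFuzz relies on TheFuzz, which produces comparable 0-100 ratios. The `fuzzy_lookup` helper and its signature are hypothetical.

```python
from difflib import SequenceMatcher

# Toy dictionary in the same shape LexiFuzz expects: category -> entries.
dictionary = {
    "brand": ["bca", "bank central asia"],
    "individual_product": ["tahapan", "xpresi", "gold", "berjangka"],
}

def similarity(a, b):
    """0-100 similarity score, comparable in spirit to TheFuzz's fuzz.ratio."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def fuzzy_lookup(token, dictionary, threshold=90):
    """Return (entry, category, score) for every dictionary entry the token
    matches at or above the threshold, despite small misspellings."""
    return [
        (entry, category, similarity(token.lower(), entry))
        for category, entries in dictionary.items()
        for entry in entries
        if similarity(token.lower(), entry) >= threshold
    ]

# "tahapn" is a misspelling of "tahapan", yet still matches at score 92.
print(fuzzy_lookup("tahapn", dictionary))
# [('tahapan', 'individual_product', 92)]
```

Exact hits score 100 and near-misses score slightly lower, so the threshold controls how much misspelling you tolerate: raise it toward 100 for stricter matching, lower it to catch noisier input at the cost of false positives.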

Usage

Manual Installation via GitHub

  1. Clone Repository
    git clone https://github.com/hanifabd/lexifuzz-ner.git
    
  2. Installation
    cd lexifuzz-ner && pip install .
    

Installation Using Pip

  1. Installation
    pip install lexifuzz-ner
    

Inference

  1. Usage

    from lexifuzz_ner.ner import find_entity
    
    dictionary = {
        'individual_product' : ['tahapan', 'xpresi', 'gold', 'berjangka'],
        'brand' : ["bca", "bank central asia"]
    }
    
    # "tahapn" is a deliberate misspelling of "tahapan"
    text = "i wanna ask about bca tahapn savings product"
    # 90 is the minimum similarity score (0-100) for a fuzzy match
    entities = find_entity(text, dictionary, 90)
    print(entities)
    
  2. Result

    {
        'entities': [
            {
                'id': '55a20c6b-bd4a-43ee-8853-b961ac537ca8',
                'entity': 'bca',
                'category': 'brand',
                'score': 100,
                'index': {'start': 18, 'end': 20}},
            {
                'id': '08917da5-ed51-44bb-9be9-52f17df2640a',
                'entity': 'tahapn',
                'category': 'individual_product',
                'score': 92,
                'index': {'start': 22, 'end': 28}
            }
        ],
        'text': 'i wanna ask about bca tahapn savings product',
        'text_annotated': 'i wanna ask about [bca]{55a20c6b-bd4a-43ee-8853-b961ac537ca8} [tahapn]{08917da5-ed51-44bb-9be9-52f17df2640a} savings product'
    }
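The result is a plain dict, so downstream code can consume it directly. A small hypothetical post-processing sketch (the `result` literal simply mirrors the output shown above, with the long `id` fields omitted):

```python
# Mirrors the structure of find_entity's output shown above ('id' omitted).
result = {
    "entities": [
        {"entity": "bca", "category": "brand", "score": 100,
         "index": {"start": 18, "end": 20}},
        {"entity": "tahapn", "category": "individual_product", "score": 92,
         "index": {"start": 22, "end": 28}},
    ],
    "text": "i wanna ask about bca tahapn savings product",
}

# Group matched surface forms by category.
by_category = {}
for ent in result["entities"]:
    by_category.setdefault(ent["category"], []).append(ent["entity"])
print(by_category)  # {'brand': ['bca'], 'individual_product': ['tahapn']}

# Keep only high-confidence matches, e.g. to route exact hits differently
# from fuzzy ones.
high_conf = [e["entity"] for e in result["entities"] if e["score"] >= 95]
print(high_conf)  # ['bca']
```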
    
