Rule-based Named Entity Recognition

simple_NER

Lightweight named-entity recognition library with pluggable annotators, multi-language support, and an async pipeline.

Installation

pip install simple_NER
pip install "simple_NER[dev]"   # + testing tools

Quick Start

from simple_NER import create_pipeline

pipe = create_pipeline(["email", "phone", "url", "temporal", "currency"])
for entity in pipe.process("Call +1-800-555-0100 or email info@example.com by 2025-06-01"):
    print(entity.entity_type, entity.value, entity.confidence)
# phone    +1-800-555-0100   0.9
# email    info@example.com  1.0
# date     2025-06-01        0.85

Annotators

| Factory key(s) | Class | Detects | Language |
| --- | --- | --- | --- |
| email, email_regex | EmailAnnotator / EmailNER | Email addresses | Any |
| names | NamesNER | Person names (noun heuristic, confidence 0.65–0.8) | English / Latin |
| locations, countries, cities | LocationNER | Countries, capitals, cities | All (wordlist) |
| temporal, datetime, duration | TemporalNER | Dates, times, durations | lang param |
| numbers, written_numbers | NumberNER | Numeric and written numbers | lang param |
| lookup, wordlist | LookUpNER | Custom wordlists | lang param |
| url, urls | URLAnnotator | HTTP/HTTPS URLs | Any |
| phone, phone_number | PhoneAnnotator | Phone numbers | Any |
| currency, money | CurrencyAnnotator | Amounts + currency symbol/code | Any |
| organization, org, company | OrganizationAnnotator | Org/company names | lang param |
| hashtag, hashtags, tag | HashtagAnnotator | #hashtags | Any |
| date, dates | DateAnnotator | Structured date strings | lang param |

Key annotator parameters

LocationNER: include_countries=True, include_capitals=True, include_cities=False, label_confidence={"City": 0.7, "Country": 0.95}

PhoneAnnotator: require_country_code=False, min_length=7

OrganizationAnnotator: strict_mode=False (when True, requires corporate suffix like Inc./GmbH)

TemporalNER / NumberNER / DateAnnotator / LookUpNER: lang="en-us"; TemporalNER also accepts an optional anchor_date

Entity Data Fields

Each Entity carries a data dict with annotator-specific fields:

| Annotator | Extra fields in data |
| --- | --- |
| EmailAnnotator | local_part, domain, start, end |
| URLAnnotator | protocol, start, end |
| PhoneAnnotator | digits, digit_count, type (international/us_national/local/other), has_country_code, start, end |
| CurrencyAnnotator | amount (float), currency (ISO code), currency_symbol, start, end |
| LocationNER | country_code, label, start, end |
| HashtagAnnotator | tag_type (shouting/lowercase/CamelCase/underscored/alphanumeric/mixed), start, end |
| OrganizationAnnotator | org_type (company/educational/medical/other), start, end |
| NumberNER | number (str, digit form), start, end |
| DateAnnotator | year, month, day, format, start, end |
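
To make the shape of such a data dict concrete, here is a minimal standalone sketch of how the EmailAnnotator fields listed above could be derived from a regex match. It does not import simple_NER: the pattern `EMAIL_RX` and the helper `email_data` are hypothetical, the regex is deliberately simplified, and treating start/end as character offsets with an exclusive end is an assumption.

```python
import re

# Deliberately simplified pattern (an assumption, not the one shipped with the library)
EMAIL_RX = re.compile(r"(?P<local_part>[\w.+-]+)@(?P<domain>[\w-]+(?:\.[\w-]+)+)")


def email_data(text: str) -> list[dict]:
    """Build one dict per email match, mirroring the fields in the table above."""
    records = []
    for m in EMAIL_RX.finditer(text):
        records.append({
            "value": m.group(0),
            "local_part": m.group("local_part"),
            "domain": m.group("domain"),
            "start": m.start(),  # offset of the first character
            "end": m.end(),      # offset one past the last character (assumed exclusive)
        })
    return records


text = "email info@example.com by 2025-06-01"
records = email_data(text)
for rec in records:
    # Slicing the input with start/end recovers the matched value
    print(rec["local_part"], rec["domain"], text[rec["start"]:rec["end"]])
```

With offsets stored this way, a caller can highlight or redact the original text without searching for the value again.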

Pipeline Dedup Strategies

NERPipeline and AsyncNERPipeline accept a dedup_strategy argument:

| Strategy | Behaviour |
| --- | --- |
| keep_all | Return every entity span, including overlaps |
| keep_longest | When spans overlap, keep the longer one |
| keep_higher_confidence | When spans overlap, keep the higher-confidence one |
| keep_first | When spans overlap, keep the first one encountered |

pipe = create_pipeline(["currency", "numbers"], dedup_strategy="keep_longest")
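
The four strategies can be illustrated with a small standalone sketch. This is not the library's implementation: the `Span` class and the `dedup` function are hypothetical, overlap is resolved greedily, and "first encountered" is assumed to mean input order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for an entity: offsets, confidence, and a label."""
    start: int
    end: int  # exclusive
    confidence: float
    label: str


def overlaps(a: Span, b: Span) -> bool:
    """True when the two half-open ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def dedup(spans: list[Span], strategy: str = "keep_all") -> list[Span]:
    """Greedily drop spans that overlap an already-kept, higher-priority span."""
    if strategy == "keep_all":
        return list(spans)
    if strategy == "keep_longest":
        ranked = sorted(spans, key=lambda s: s.end - s.start, reverse=True)
    elif strategy == "keep_higher_confidence":
        ranked = sorted(spans, key=lambda s: s.confidence, reverse=True)
    elif strategy == "keep_first":
        ranked = list(spans)  # input order decides
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    kept: list[Span] = []
    for span in ranked:
        if not any(overlaps(span, k) for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


# In "Pay $20.50 now", "$20.50" is [4, 10) and the bare number "20.50" is [5, 10).
# The confidence values are made up for illustration.
spans = [
    Span(4, 10, 0.8, "currency"),
    Span(5, 10, 0.9, "number"),
]
for strategy in ("keep_all", "keep_longest", "keep_higher_confidence", "keep_first"):
    print(strategy, [s.label for s in dedup(spans, strategy)])
```

The example shows why the choice matters for a currency-plus-numbers pipeline: keep_longest favours the full "$20.50" span, while keep_higher_confidence can favour the bare number if its annotator reports a higher score.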

Locale / i18n System

Annotators load language-specific patterns from simple_NER/locale/<lang>/:

| Extension | Content | Loader |
| --- | --- | --- |
| .rx | One raw regex per line | load_rx(name, lang) |
| .intent | Natural-language templates; each {var} becomes a named capture group | load_intents(name, lang) |
| .txt | Plain wordlist, one entry per line | load_wordlist(name, lang) |

All loaders fall back to en-us when no language-specific file exists. intent_to_regex("{amount} dollars") converts an intent template to a compiled re.Pattern.
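
The template-to-regex idea can be sketched with the standard library alone. The function `template_to_regex` below is a hypothetical re-implementation for illustration, not the library's intent_to_regex; the lazy `.+?` capture and case-insensitive matching are assumptions.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Turn "{amount} dollars" into a pattern with one named group per {var}."""
    # re.split with a capture group alternates: literal, var, literal, var, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)        # literal text is matched verbatim
        else:
            pattern += f"(?P<{part}>.+?)"     # {var} becomes a named capture
    return re.compile(pattern, re.IGNORECASE)


rx = template_to_regex("{amount} dollars")
match = rx.fullmatch("twenty dollars")
if match:
    print(match.groupdict())
```

Using fullmatch keeps the lazy capture from stopping early; with search, the capture would extend leftward to the start of the first possible match instead.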

Adding a new language: create simple_NER/locale/<lang>/ and place .rx, .intent, or .txt files that override the en-us defaults. Only the files you add are used; everything else falls back automatically. Inside a BaseAnnotator subclass, self._load_rx("name") and self._load_intents("name") resolve to self.lang automatically.
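
The fallback rule can be sketched as a file lookup that tries the requested language first and en-us second. The helpers `resolve_locale_file` and `read_lines` are hypothetical and do not reflect the library's loader internals; the locale tree is built in a temporary directory with made-up file contents.

```python
import tempfile
from pathlib import Path

DEFAULT_LANG = "en-us"


def resolve_locale_file(root: Path, name: str, ext: str, lang: str) -> Path:
    """Return root/<lang>/<name><ext> if it exists, else the en-us copy."""
    candidate = root / lang / f"{name}{ext}"
    if candidate.is_file():
        return candidate
    fallback = root / DEFAULT_LANG / f"{name}{ext}"
    if fallback.is_file():
        return fallback
    raise FileNotFoundError(f"no {name}{ext} for {lang} or {DEFAULT_LANG}")


def read_lines(path: Path) -> list[str]:
    """One entry per line; blank lines are skipped."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "en-us").mkdir()
    (root / "pt").mkdir()
    (root / "en-us" / "currency.rx").write_text(r"\$\d+" + "\n", encoding="utf-8")
    (root / "en-us" / "date_months.txt").write_text("january\nfebruary\n", encoding="utf-8")
    (root / "pt" / "date_months.txt").write_text("janeiro\nfevereiro\n", encoding="utf-8")

    # "pt" overrides date_months but has no currency.rx, so that one falls back
    months = read_lines(resolve_locale_file(root, "date_months", ".txt", "pt"))
    currency = read_lines(resolve_locale_file(root, "currency", ".rx", "pt"))

print(months)
print(currency)
```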

Existing locale data: en-us (phone, email, url, hashtag, currency, organization, date_months), de-de (currency, organization, date_months), es/fr/it/nl/pt (date_months).

Async Batch Processing

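import asyncio
from simple_NER.annotators.async_pipeline import AsyncNERPipeline

pipe = AsyncNERPipeline(dedup_strategy="keep_longest")
pipe.add_annotator(...)  # register each annotator here (arguments elided)

# Example input batch
sentences = ["Email info@example.com", "Call +1-800-555-0100"]

async def run():
    # Annotate the whole batch; max_concurrency caps the parallel work
    results = await pipe.process_batch_async(sentences, max_concurrency=10)
    return results

print(asyncio.run(run()))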
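
One common way to bound batch concurrency in asyncio is a semaphore around each task. The sketch below shows that general technique with the standard library only; it is an assumption about what max_concurrency means, not a description of how AsyncNERPipeline is implemented. The `annotate` coroutine is a hypothetical stand-in that merely picks out hashtag-like tokens.

```python
import asyncio


async def annotate(sentence: str) -> list[str]:
    """Hypothetical stand-in for per-sentence annotation work."""
    await asyncio.sleep(0)  # yield control, as real I/O or executor work would
    return [tok for tok in sentence.split() if tok.startswith("#")]


async def process_batch(sentences: list[str], max_concurrency: int = 10) -> list[list[str]]:
    """Annotate every sentence with at most max_concurrency tasks running at once."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(sentence: str) -> list[str]:
        async with sem:
            return await annotate(sentence)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(s) for s in sentences))


sentences = ["Loving #python today", "no tags here", "#NER and #regex"]
results = asyncio.run(process_batch(sentences, max_concurrency=2))
print(results)
```

Because gather preserves input order, each result list lines up with its sentence by index even though the tasks may finish in a different order.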

OVOS Plugin

simple_NER ships an intent-transformer plugin for the OpenVoiceOS / OVOS ecosystem. Entry-point group: opm.transformer.intent, key: simple-ner-transformer, priority 50, class: SimpleNERIntentTransformer.

{
  "intent_transformers": {
    "simple-ner-transformer": {
      "annotators": ["email", "phone", "temporal", "currency"],
      "confidence_threshold": 0.6,
      "lang": "en-us"
    }
  }
}

The transformer runs the configured pipeline on every utterance and injects recognized entities into match_data before intent handling proceeds.
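
The filter-then-inject step can be pictured with plain dicts. This sketch does not use the OVOS plugin API: `inject_entities` is a hypothetical helper, entities are modelled as dicts, the threshold is assumed to be inclusive, and not overwriting existing keys is a design choice made for this example only.

```python
def inject_entities(match_data: dict, entities: list[dict],
                    confidence_threshold: float = 0.6) -> dict:
    """Return a copy of match_data extended with sufficiently confident entities."""
    merged = dict(match_data)
    for ent in entities:
        if ent["confidence"] < confidence_threshold:
            continue  # below the configured threshold: ignore
        # setdefault never overwrites a slot that is already filled
        merged.setdefault(ent["entity_type"], ent["value"])
    return merged


# Made-up entities for illustration
entities = [
    {"entity_type": "phone", "value": "+1-800-555-0100", "confidence": 0.9},
    {"entity_type": "email", "value": "info@example.com", "confidence": 1.0},
    {"entity_type": "names", "value": "Call", "confidence": 0.5},
]
match_data = {"email": "already@set.example"}
result = inject_entities(match_data, entities, confidence_threshold=0.6)
print(result)
```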

Documentation

Learn by example:

  • examples/README.md — 15+ runnable scripts (quick start, all annotators, async, custom types, OVOS plugin, etc.)
