
Rule-based Named Entity Recognition

simple_NER

Lightweight named-entity recognition library with pluggable annotators, multi-language support, and an async pipeline.


Installation

pip install simple_NER
pip install "simple_NER[dev]"   # + testing tools

Quick Start

from simple_NER import create_pipeline

pipe = create_pipeline(["email", "phone", "url", "temporal", "currency"])
for entity in pipe.process("Call +1-800-555-0100 or email info@example.com by 2025-06-01"):
    print(entity.entity_type, entity.value, entity.confidence)
# phone    +1-800-555-0100   0.9
# email    info@example.com  1.0
# date     2025-06-01        0.85

Annotators

| Factory key(s) | Class | Detects | Language |
|---|---|---|---|
| email, email_regex | EmailAnnotator / EmailNER | Email addresses | Any |
| names | NamesNER | Person names (noun heuristic, confidence 0.65–0.8) | English / Latin |
| locations, countries, cities | LocationNER | Countries, capitals, cities | All (wordlist) |
| temporal, datetime, duration | TemporalNER | Dates, times, durations | lang param |
| numbers, written_numbers | NumberNER | Numeric and written numbers | lang param |
| lookup, wordlist | LookUpNER | Custom wordlists | lang param |
| url, urls | URLAnnotator | HTTP/HTTPS URLs | Any |
| phone, phone_number | PhoneAnnotator | Phone numbers | Any |
| currency, money | CurrencyAnnotator | Amounts + currency symbol/code | Any |
| organization, org, company | OrganizationAnnotator | Org/company names | lang param |
| hashtag, hashtags, tag | HashtagAnnotator | #hashtags | Any |
| date, dates | DateAnnotator | Structured date strings | lang param |

Key annotator parameters

LocationNER: include_countries=True, include_capitals=True, include_cities=False, label_confidence={"City": 0.7, "Country": 0.95}

PhoneAnnotator: require_country_code=False, min_length=7

OrganizationAnnotator: strict_mode=False (when True, requires corporate suffix like Inc./GmbH)

TemporalNER / NumberNER / DateAnnotator / LookUpNER: lang="en-us". TemporalNER also accepts an optional anchor_date.
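To make the PhoneAnnotator parameters above more concrete, here is a minimal standalone sketch of how require_country_code and min_length could act as filters on regex candidates. It does not use simple_NER; the regex, the find_phones helper, and the returned dict layout are assumptions made for illustration, not the library's implementation.

```python
import re

# Hypothetical candidate pattern: an optional "+", then digits mixed with
# common separators, ending on a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def find_phones(text, require_country_code=False, min_length=7):
    """Return phone-like matches, filtered the way the two parameters suggest."""
    results = []
    for match in PHONE_RE.finditer(text):
        raw = match.group(0)
        digits = re.sub(r"\D", "", raw)      # keep digits only
        has_country_code = raw.startswith("+")
        if len(digits) < min_length:
            continue                          # too short to count as a phone number
        if require_country_code and not has_country_code:
            continue                          # strict mode: "+" prefix required
        results.append({
            "value": raw,
            "digits": digits,
            "digit_count": len(digits),
            "has_country_code": has_country_code,
        })
    return results


text = "Call +1-800-555-0100 or 555-0100"
all_phones = find_phones(text)
international_only = find_phones(text, require_country_code=True)
long_only = find_phones(text, min_length=8)
```

With the defaults both candidates are kept; either stricter setting leaves only the international number.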

Entity Data Fields

Each Entity carries a data dict with annotator-specific fields:

| Annotator | Extra fields in data |
|---|---|
| EmailAnnotator | local_part, domain, start, end |
| URLAnnotator | protocol, start, end |
| PhoneAnnotator | digits, digit_count, type (international/us_national/local/other), has_country_code, start, end |
| CurrencyAnnotator | amount (float), currency (ISO code), currency_symbol, start, end |
| LocationNER | country_code, label, start, end |
| HashtagAnnotator | tag_type (shouting/lowercase/CamelCase/underscored/alphanumeric/mixed), start, end |
| OrganizationAnnotator | org_type (company/educational/medical/other), start, end |
| NumberNER | number (str, digit form), start, end |
| DateAnnotator | year, month, day, format, start, end |
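As an illustration of the data dict, the sketch below builds an entity carrying the email fields listed above (local_part, domain, start, end). The Entity dataclass, the regex, and the find_emails helper are stand-ins written for this example; the real classes may differ, and treating end as an exclusive offset is an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Stand-in for the library's entity object (assumed shape)."""
    value: str
    entity_type: str
    confidence: float = 1.0
    data: dict = field(default_factory=dict)


# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(
    r"(?P<local_part>[A-Za-z0-9._%+-]+)@(?P<domain>[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)


def find_emails(text):
    """Yield one Entity per email match, with annotator-specific fields in data."""
    for match in EMAIL_RE.finditer(text):
        yield Entity(
            value=match.group(0),
            entity_type="email",
            data={
                "local_part": match.group("local_part"),
                "domain": match.group("domain"),
                "start": match.start(),   # character offsets into the input
                "end": match.end(),       # exclusive here (assumption)
            },
        )


text = "email info@example.com by Friday"
entity = next(find_emails(text))
print(entity.data["local_part"], entity.data["domain"])
```

Keeping start and end in data lets a caller slice the original text to recover the matched span.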

Pipeline Dedup Strategies

NERPipeline and AsyncNERPipeline accept a dedup_strategy argument:

| Strategy | Behaviour |
|---|---|
| keep_all | Return every entity span, including overlaps |
| keep_longest | When spans overlap, keep the longer one |
| keep_higher_confidence | When spans overlap, keep the higher-confidence one |
| keep_first | When spans overlap, keep the first one encountered |

pipe = create_pipeline(["currency", "numbers"], dedup_strategy="keep_longest")
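The four strategies can be sketched in plain Python as a ranking step followed by a greedy pass that drops any span overlapping one already kept. This is a minimal illustration of the behaviour described in the table, not the pipeline's actual code; Span, overlaps, and dedup are hypothetical names, and the tie-breaking rule (input order) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int          # inclusive character offset
    end: int            # exclusive character offset
    value: str
    confidence: float


def overlaps(a, b):
    """True when the two half-open ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def dedup(spans, strategy="keep_longest"):
    if strategy == "keep_all":
        return list(spans)
    if strategy == "keep_longest":
        ranked = sorted(spans, key=lambda s: s.end - s.start, reverse=True)
    elif strategy == "keep_higher_confidence":
        ranked = sorted(spans, key=lambda s: s.confidence, reverse=True)
    elif strategy == "keep_first":
        ranked = list(spans)        # input order is the priority order
    else:
        raise ValueError(f"unknown dedup strategy: {strategy}")
    kept = []
    for span in ranked:             # sorted() is stable, so ties keep input order
        if not any(overlaps(span, other) for other in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


# "pay $20 now": a currency span and a number span overlap on "20".
spans = [Span(4, 7, "$20", 0.9), Span(5, 7, "20", 0.95)]
for name in ("keep_all", "keep_longest", "keep_higher_confidence", "keep_first"):
    print(name, [s.value for s in dedup(spans, name)])
```

With the currency and numbers annotators combined, keep_longest favours the full "$20" amount over the bare number inside it.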

Locale / i18n System

Annotators load language-specific patterns from simple_NER/locale/<lang>/:

| Extension | Content | Loader |
|---|---|---|
| .rx | One raw regex per line | load_rx(name, lang) |
| .intent | NL templates, {var} → named capture | load_intents(name, lang) |
| .txt | Plain wordlist, one entry per line | load_wordlist(name, lang) |

All loaders fall back to en-us when no language-specific file exists. intent_to_regex("{amount} dollars") converts an intent template to a compiled re.Pattern.
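One way such a template-to-regex conversion could work is sketched below: literal text is escaped and each {var} placeholder becomes a named capture group. The function template_to_regex is a hypothetical stand-in, not the library's intent_to_regex, and the choice to capture a single whitespace-free token per placeholder is an assumption.

```python
import re

# Placeholders look like {name}; the name must be a valid group identifier.
PLACEHOLDER_RE = re.compile(r"\{([A-Za-z_]\w*)\}")


def template_to_regex(template):
    """Compile a '{var}' template into a pattern with named capture groups."""
    parts = []
    pos = 0
    for match in PLACEHOLDER_RE.finditer(template):
        parts.append(re.escape(template[pos:match.start()]))  # literal text
        parts.append(f"(?P<{match.group(1)}>\\S+)")            # one token per var
        pos = match.end()
    parts.append(re.escape(template[pos:]))                    # trailing literal
    return re.compile("".join(parts), re.IGNORECASE)


price = template_to_regex("{amount} dollars").search("that costs 20 dollars")
route = template_to_regex("from {src} to {dst}").search("Train from Lisbon to Porto")
print(price.group("amount"), route.groupdict())
```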

Adding a new language: create simple_NER/locale/<lang>/ and place .rx, .intent, or .txt files there to override the en-us defaults. Only the files you add are overridden; any file you omit falls back to its en-us version automatically. Inside a BaseAnnotator subclass, self._load_rx("name") and self._load_intents("name") resolve against self.lang automatically.
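The per-file fallback can be pictured as a two-step lookup: try the requested language directory first, then en-us. The sketch below reproduces that idea against a temporary directory; resolve_locale_file and load_lines are hypothetical helpers, and the directory contents are made up for the demonstration.

```python
import tempfile
from pathlib import Path

DEFAULT_LANG = "en-us"


def resolve_locale_file(root, name, ext, lang):
    """Return the path for <lang>/<name>.<ext>, falling back to en-us, else None."""
    for candidate in (lang, DEFAULT_LANG):
        path = Path(root) / candidate / f"{name}.{ext}"
        if path.is_file():
            return path
    return None


def load_lines(root, name, ext, lang):
    """Read non-empty lines from the resolved file; empty list if nothing exists."""
    path = resolve_locale_file(root, name, ext, lang)
    if path is None:
        return []
    text = path.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "en-us").mkdir()
    (root / "de-de").mkdir()
    (root / "en-us" / "currency.txt").write_text("dollar\n", encoding="utf-8")
    (root / "en-us" / "phone.txt").write_text("call\n", encoding="utf-8")
    (root / "de-de" / "currency.txt").write_text("euro\n", encoding="utf-8")

    de_currency = load_lines(root, "currency", "txt", "de-de")  # de-de override
    de_phone = load_lines(root, "phone", "txt", "de-de")        # en-us fallback
    de_missing = load_lines(root, "hashtag", "txt", "de-de")    # no file anywhere

print(de_currency, de_phone, de_missing)
```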

Existing locale data: en-us (phone, email, url, hashtag, currency, organization, date_months), de-de (currency, organization, date_months), es/fr/it/nl/pt (date_months).

Async Batch Processing

import asyncio
from simple_NER.annotators.async_pipeline import AsyncNERPipeline

pipe = AsyncNERPipeline(dedup_strategy="keep_longest")
pipe.add_annotator(...)

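sentences = ["Email info@example.com", "Call +1-800-555-0100"]

async def run():
    # Annotate the whole batch, with at most 10 sentences in flight at once
    results = await pipe.process_batch_async(sentences, max_concurrency=10)
    print(results)

asyncio.run(run())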
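A common way to implement a max_concurrency limit like the one above is an asyncio.Semaphore wrapped around each task. The following standalone sketch shows that pattern with a fake annotator; it is not AsyncNERPipeline's code, and process_batch and fake_annotate are names invented for this example.

```python
import asyncio


async def process_batch(texts, worker, max_concurrency=10):
    """Run worker(text) for every text, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text):
        async with semaphore:          # wait here while the limit is reached
            return await worker(text)

    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(text) for text in texts))


async def fake_annotate(text):
    """Stand-in annotator: returns the hashtags found in the text."""
    await asyncio.sleep(0)             # simulate yielding to the event loop
    return [word for word in text.split() if word.startswith("#")]


batch = ["launch #today", "#one and #two", "nothing here"]
results = asyncio.run(process_batch(batch, fake_annotate, max_concurrency=2))
print(results)
```

Because gather preserves input order, each result list lines up with its sentence even when tasks finish out of order.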

OVOS Plugin

simple_NER ships an intent-transformer plugin for the OpenVoiceOS / OVOS ecosystem. Entry-point group: opm.transformer.intent, key: simple-ner-transformer, priority 50, class: SimpleNERIntentTransformer.

{
  "intent_transformers": {
    "simple-ner-transformer": {
      "annotators": ["email", "phone", "temporal", "currency"],
      "confidence_threshold": 0.6,
      "lang": "en-us"
    }
  }
}

The transformer runs the configured pipeline on every utterance and injects recognized entities into match_data before intent handling proceeds.
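The effect of confidence_threshold on what reaches match_data can be sketched as a simple filter-and-merge step. This is only an illustration under assumptions: the inject_entities helper, the tuple layout of the entities, and the rule that existing match_data keys are not overwritten are all hypothetical and not taken from the plugin.

```python
def inject_entities(match_data, entities, confidence_threshold=0.6):
    """Return a copy of match_data enriched with sufficiently confident entities.

    entities is an iterable of (entity_type, value, confidence) tuples
    (an assumed layout for this sketch).
    """
    enriched = dict(match_data)
    for entity_type, value, confidence in entities:
        if confidence < confidence_threshold:
            continue                       # drop low-confidence entities
        # Assumption: values already supplied by the intent are kept as-is.
        enriched.setdefault(entity_type, value)
    return enriched


entities = [
    ("email", "info@example.com", 1.0),
    ("phone", "+1-800-555-0100", 0.9),
    ("names", "Call", 0.5),                # below the threshold, ignored
]
match_data = inject_entities({"phone": "from-intent"}, entities)
print(match_data)
```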


