Rule-based Named Entity Recognition
simple_NER
Lightweight named-entity recognition library with pluggable annotators, multi-language support, and an async pipeline.
Installation
```shell
pip install simple_NER
pip install "simple_NER[dev]"  # + testing tools
```
Quick Start
```python
from simple_NER import create_pipeline

pipe = create_pipeline(["email", "phone", "url", "temporal", "currency"])
for entity in pipe.process("Call +1-800-555-0100 or email info@example.com by 2025-06-01"):
    print(entity.entity_type, entity.value, entity.confidence)
# phone +1-800-555-0100 0.9
# email info@example.com 1.0
# date 2025-06-01 0.85
```
Annotators
| Factory key(s) | Class | Detects | Language |
|---|---|---|---|
| email, email_regex | EmailAnnotator / EmailNER | Email addresses | Any |
| names | NamesNER | Person names (noun heuristic, confidence 0.65–0.8) | English / Latin |
| locations, countries, cities | LocationNER | Countries, capitals, cities | All (wordlist) |
| temporal, datetime, duration | TemporalNER | Dates, times, durations | lang param |
| numbers, written_numbers | NumberNER | Numeric and written numbers | lang param |
| lookup, wordlist | LookUpNER | Custom wordlists | lang param |
| url, urls | URLAnnotator | HTTP/HTTPS URLs | Any |
| phone, phone_number | PhoneAnnotator | Phone numbers | Any |
| currency, money | CurrencyAnnotator | Amounts + currency symbol/code | Any |
| organization, org, company | OrganizationAnnotator | Org/company names | lang param |
| hashtag, hashtags, tag | HashtagAnnotator | #hashtags | Any |
| date, dates | DateAnnotator | Structured date strings | lang param |
Key annotator parameters
- LocationNER: include_countries=True, include_capitals=True, include_cities=False, label_confidence={"City": 0.7, "Country": 0.95}
- PhoneAnnotator: require_country_code=False, min_length=7
- OrganizationAnnotator: strict_mode=False (when True, requires a corporate suffix such as Inc. or GmbH)
- TemporalNER / NumberNER / DateAnnotator / LookUpNER: lang="en-us"; TemporalNER also accepts an optional anchor_date
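To make the two PhoneAnnotator parameters concrete, here is a minimal standalone sketch of how a digit-count floor and a country-code requirement could filter regex candidates. It does not use simple_NER; the `find_phones` function and its candidate regex are assumptions for illustration, and the library's real matching rules may differ.

```python
import re

# Hypothetical candidate pattern: optional "+", then digits mixed with
# common separators, ending on a digit. Not the library's actual regex.
PHONE_CANDIDATE = re.compile(r"\+?\d[\d\-\s().]{5,}\d")


def find_phones(text, require_country_code=False, min_length=7):
    """Return phone-like matches that pass both filters."""
    hits = []
    for match in PHONE_CANDIDATE.finditer(text):
        raw = match.group()
        digits = re.sub(r"\D", "", raw)      # keep digits only
        has_country_code = raw.startswith("+")
        if len(digits) < min_length:
            continue                          # too short to be a phone number
        if require_country_code and not has_country_code:
            continue                          # caller demanded a "+" prefix
        hits.append({
            "value": raw,
            "digits": digits,
            "digit_count": len(digits),
            "has_country_code": has_country_code,
        })
    return hits


text = "Call +1-800-555-0100 or 555-0100"
lenient = find_phones(text)
strict = find_phones(text, require_country_code=True)
```

With the defaults both numbers pass; with `require_country_code=True` only the number starting with `+` survives.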
Entity Data Fields
Each Entity carries a data dict with annotator-specific fields:
| Annotator | Extra fields in data |
|---|---|
| EmailAnnotator | local_part, domain, start, end |
| URLAnnotator | protocol, start, end |
| PhoneAnnotator | digits, digit_count, type (international/us_national/local/other), has_country_code, start, end |
| CurrencyAnnotator | amount (float), currency (ISO code), currency_symbol, start, end |
| LocationNER | country_code, label, start, end |
| HashtagAnnotator | tag_type (shouting/lowercase/CamelCase/underscored/alphanumeric/mixed), start, end |
| OrganizationAnnotator | org_type (company/educational/medical/other), start, end |
| NumberNER | number (str, digit form), start, end |
| DateAnnotator | year, month, day, format, start, end |
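As a rough illustration of how the data fields relate to the input text, the sketch below uses a stand-in `Entity` dataclass rather than the library class. The attribute names entity_type, value, confidence, and data come from this README; treating `end` as an exclusive offset is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for the library's entity object."""
    entity_type: str
    value: str
    confidence: float
    data: dict = field(default_factory=dict)


text = "Write to info@example.com today"
start = text.index("info@example.com")

email = Entity(
    entity_type="email",
    value="info@example.com",
    confidence=1.0,
    data={
        "local_part": "info",
        "domain": "example.com",
        "start": start,
        "end": start + len("info@example.com"),  # assumed exclusive
    },
)

# Under the exclusive-end assumption, slicing recovers the matched value.
recovered = text[email.data["start"]:email.data["end"]]
```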
Pipeline Dedup Strategies
NERPipeline and AsyncNERPipeline accept a dedup_strategy argument:
| Strategy | Behaviour |
|---|---|
| keep_all | Return every entity span, including overlaps |
| keep_longest | When spans overlap, keep the longer one |
| keep_higher_confidence | When spans overlap, keep the higher-confidence one |
| keep_first | When spans overlap, keep the first one encountered |
```python
pipe = create_pipeline(["currency", "numbers"], dedup_strategy="keep_longest")
```
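The four strategies can be pictured with a small greedy resolver. This is a standalone sketch, not the library's implementation: the `Span` class, the `dedup` function, and the greedy tie-breaking are assumptions chosen to match the table's descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int          # inclusive character offset
    end: int            # exclusive character offset
    confidence: float
    label: str


def overlaps(a, b):
    return a.start < b.end and b.start < a.end


def dedup(spans, strategy="keep_longest"):
    """Greedily keep preferred spans and drop any span overlapping a kept one."""
    if strategy == "keep_all":
        return list(spans)
    if strategy == "keep_longest":
        rank = lambda item: -(item[1].end - item[1].start)
    elif strategy == "keep_higher_confidence":
        rank = lambda item: -item[1].confidence
    elif strategy == "keep_first":
        rank = lambda item: item[0]          # original encounter order
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    kept = []
    # sorted() is stable, so ties keep their encounter order.
    for _, span in sorted(enumerate(spans), key=rank):
        if not any(overlaps(span, other) for other in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


# "It costs $25.50 today": one currency span and two number spans inside it.
spans = [
    Span(9, 15, 0.90, "currency"),
    Span(10, 12, 0.95, "number"),
    Span(13, 15, 0.95, "number"),
]
longest = dedup(spans, "keep_longest")
confident = dedup(spans, "keep_higher_confidence")
```

Here `keep_longest` returns only the currency span, while `keep_higher_confidence` returns the two number spans, which shows why the strategy matters when annotators such as currency and numbers fire on the same text.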
Locale / i18n System
Annotators load language-specific patterns from simple_NER/locale/<lang>/:
| Extension | Content | Loader |
|---|---|---|
| .rx | One raw regex per line | load_rx(name, lang) |
| .intent | NL templates {var} → named capture | load_intents(name, lang) |
| .txt | Plain wordlist, one entry per line | load_wordlist(name, lang) |
All loaders fall back to en-us when no language-specific file exists.
intent_to_regex("{amount} dollars") converts an intent template to a compiled re.Pattern.
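The template-to-regex idea can be sketched in a few lines of standard-library Python. The `template_to_regex` function below is a hypothetical re-implementation for illustration only; the library's intent_to_regex may handle whitespace, optional words, or anchoring differently.

```python
import re

PLACEHOLDER = re.compile(r"\{([A-Za-z_]\w*)\}")


def template_to_regex(template):
    """Turn "{amount} dollars" into a pattern with a named group "amount"."""
    # re.split with one capturing group alternates literal text and names:
    # "{amount} dollars" -> ["", "amount", " dollars"]
    parts = PLACEHOLDER.split(template)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2:                        # odd positions are placeholder names
            pieces.append(f"(?P<{part}>.+?)")
        else:                                # even positions are literal text
            pieces.append(re.escape(part))
    return re.compile("".join(pieces), re.IGNORECASE)


pattern = template_to_regex("{amount} dollars")
match = pattern.fullmatch("20 dollars")
amount = match.group("amount") if match else None
```

Using `fullmatch` keeps the lazy `.+?` group from matching a partial utterance.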
Adding a new language: create simple_NER/locale/<lang>/ and place .rx, .intent, or .txt files there to override the en-us defaults. Only the files you add take effect for that language; any file you omit falls back to en-us automatically.
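The per-file fallback described above can be sketched as a two-step path lookup. This is a standalone illustration under assumptions: `resolve_locale_file` and `read_wordlist` are hypothetical helpers, the demo builds its own temporary directory instead of touching the package, and `greetings.txt` is an invented file name.

```python
import tempfile
from pathlib import Path

DEFAULT_LANG = "en-us"


def resolve_locale_file(root, name, ext, lang):
    """Return <root>/<lang>/<name>.<ext> if present, else the en-us copy, else None."""
    for candidate in (lang, DEFAULT_LANG):
        path = Path(root) / candidate / f"{name}.{ext}"
        if path.is_file():
            return path
    return None


def read_wordlist(root, name, lang):
    """Load a one-entry-per-line wordlist, skipping blank lines."""
    path = resolve_locale_file(root, name, "txt", lang)
    if path is None:
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "en-us").mkdir()
    (root / "de-de").mkdir()
    (root / "en-us" / "date_months.txt").write_text("January\nFebruary\n", encoding="utf-8")
    (root / "en-us" / "greetings.txt").write_text("hello\n", encoding="utf-8")
    (root / "de-de" / "date_months.txt").write_text("Januar\nFebruar\n", encoding="utf-8")

    german_months = read_wordlist(root, "date_months", "de-de")    # override found
    german_greetings = read_wordlist(root, "greetings", "de-de")   # falls back to en-us
```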
Inside a BaseAnnotator subclass, self._load_rx("name") and self._load_intents("name") resolve
to self.lang automatically.
Existing locale data: en-us (phone, email, url, hashtag, currency, organization, date_months),
de-de (currency, organization, date_months), es/fr/it/nl/pt (date_months).
Async Batch Processing
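```python
import asyncio
from simple_NER.annotators.async_pipeline import AsyncNERPipeline

pipe = AsyncNERPipeline(dedup_strategy="keep_longest")
pipe.add_annotator(...)  # register one or more annotators here

# Illustrative input batch
sentences = ["Email info@example.com", "Call +1-800-555-0100"]

async def run():
    results = await pipe.process_batch_async(sentences, max_concurrency=10)
    return results

asyncio.run(run())
```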
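A max_concurrency limit of this kind is commonly built on asyncio.Semaphore. The sketch below shows that general pattern with a fake annotator; it is an assumption about the approach, not a description of how process_batch_async is implemented, and `process_batch` and `fake_annotate` are hypothetical names.

```python
import asyncio


async def process_batch(texts, process_one, max_concurrency=10):
    """Run process_one over texts with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(text):
        async with semaphore:                # wait for a free slot
            return await process_one(text)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(text) for text in texts))


async def fake_annotate(text):
    """Stand-in annotator: yields control, then returns tokens containing '@'."""
    await asyncio.sleep(0)
    return [token for token in text.split() if "@" in token]


batch = ["mail a@b.com now", "no entities here"]
results = asyncio.run(process_batch(batch, fake_annotate, max_concurrency=2))
```

Each input yields one result list, so callers can zip the batch and the results back together.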
OVOS Plugin
simple_NER ships an intent-transformer plugin for the OpenVoiceOS / OVOS ecosystem.
Entry-point group: opm.transformer.intent, key: simple-ner-transformer, priority 50,
class: SimpleNERIntentTransformer.
```json
{
  "intent_transformers": {
    "simple-ner-transformer": {
      "annotators": ["email", "phone", "temporal", "currency"],
      "confidence_threshold": 0.6,
      "lang": "en-us"
    }
  }
}
```
The transformer runs the configured pipeline on every utterance and injects recognized entities
into match_data before intent handling proceeds.
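The effect of confidence_threshold can be pictured as a filter applied before entities are merged into match_data. The sketch below is standalone and hypothetical: the `inject_entities` function, the dict-shaped entities, and the choice to key match_data by entity type are all assumptions, since the plugin's actual data layout is not described here.

```python
def inject_entities(match_data, entities, confidence_threshold=0.6):
    """Return a copy of match_data enriched with sufficiently confident entities."""
    enriched = dict(match_data)              # leave the caller's dict untouched
    for entity in entities:
        if entity["confidence"] < confidence_threshold:
            continue                         # below threshold: ignore
        # Assumed policy: do not overwrite keys that are already present.
        enriched.setdefault(entity["entity_type"], entity["value"])
    return enriched


entities = [
    {"entity_type": "email", "value": "info@example.com", "confidence": 1.0},
    {"entity_type": "names", "value": "Call", "confidence": 0.5},
]
enriched = inject_entities({"utterance": "email info@example.com"}, entities)
```

With the threshold at 0.6, the low-confidence name is dropped and only the email is added.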
Links
- docs/index.md — full API reference and architecture
- docs/TUTORIALS.md — step-by-step tutorials
- docs/API.md — detailed class and method docs
- examples/README.md — runnable example index
- GitHub