Entity Recognition Parser for Swarmauri.

Swarmauri Parser Entityrecognition

Named-entity recognition (NER) parser for Swarmauri built on spaCy. Extracts entities (PERSON, ORG, GPE, etc.) from unstructured text and returns Document objects with entity metadata.

Features

  • Uses spaCy's en_core_web_sm model by default (downloads automatically if missing).
  • Falls back to a blank English pipeline with minimal regex-based tagging when the full model is unavailable (best-effort mode).
  • Emits Document instances containing the entity text and metadata (entity_type, entity_id).

Prerequisites

  • Python 3.10 or newer.
  • spaCy and its English model. The parser attempts to download en_core_web_sm if it is missing; for production deployments, set SPACY_HOME or pre-install the model instead of relying on a runtime download.
  • If running without internet access, install the model ahead of time: python -m spacy download en_core_web_sm.
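Before deploying, it can help to confirm that the model is already present so the parser never needs to download it. The following is a minimal sketch, assuming only that spaCy models are installed as regular Python packages; `model_installed` is a hypothetical helper name and is not part of this package or of spaCy.

```python
import importlib.util


def model_installed(package_name: str = "en_core_web_sm") -> bool:
    """Return True if a package with this name can be imported.

    Hypothetical helper: spaCy models installed via
    `python -m spacy download ...` are ordinary Python packages,
    so a plain import lookup is enough to detect them.
    """
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if model_installed():
        print("en_core_web_sm is installed")
    else:
        print("en_core_web_sm is missing; run: python -m spacy download en_core_web_sm")
```

Running a check like this in a container build or startup script surfaces a missing model early instead of at the first parse call.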

Installation

# pip
pip install swarmauri_parser_entityrecognition

# poetry
poetry add swarmauri_parser_entityrecognition

# uv (pyproject-based projects)
uv add swarmauri_parser_entityrecognition

Quickstart

from swarmauri_parser_entityrecognition import EntityRecognitionParser

text = "Barack Obama was born in Hawaii and served as President of the United States."
parser = EntityRecognitionParser()
entities = parser.parse(text)

for entity_doc in entities:
    print(entity_doc.content, entity_doc.metadata["entity_type"])

Batch Processing

from swarmauri_parser_entityrecognition import EntityRecognitionParser

texts = [
    "Apple Inc. unveiled new MacBooks in California.",
    "Tim Cook met investors in New York City.",
]

parser = EntityRecognitionParser()
results = [parser.parse(t) for t in texts]

for doc_set in results:
    for doc in doc_set:
        print(doc.content, doc.metadata["entity_type"])
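Batch results are often easier to work with once grouped by entity type. The sketch below assumes only what the examples above show: each returned document exposes `content` and a `metadata` dict with an `entity_type` key. `FakeDoc` and `group_by_entity_type` are hypothetical names used for illustration so the snippet runs without spaCy installed; in practice you would pass the documents returned by `parser.parse(...)`.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for the Document objects returned by the parser."""
    content: str
    metadata: dict = field(default_factory=dict)


def group_by_entity_type(docs):
    """Map each entity_type to the list of entity texts carrying that label."""
    grouped = defaultdict(list)
    for doc in docs:
        # Fall back to "UNKNOWN" if a document carries no entity_type.
        label = doc.metadata.get("entity_type", "UNKNOWN")
        grouped[label].append(doc.content)
    return dict(grouped)


# Sample data written by hand for the demo, not real parser output.
sample = [
    FakeDoc("Apple Inc.", {"entity_type": "ORG"}),
    FakeDoc("California", {"entity_type": "GPE"}),
    FakeDoc("Tim Cook", {"entity_type": "PERSON"}),
    FakeDoc("New York City", {"entity_type": "GPE"}),
]
grouped = group_by_entity_type(sample)
print(grouped)
```

To apply this to the batch example, flatten `results` first, for instance with `[doc for doc_set in results for doc in doc_set]`.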

Handling Fallback Mode

When spaCy's English model is unavailable, the parser performs best-effort matching using a blank pipeline and simple regex patterns. Inspect the entity_type and entity_id metadata on the returned documents to judge which mode produced the result.

from swarmauri_parser_entityrecognition import EntityRecognitionParser

parser = EntityRecognitionParser()
entities = parser.parse("Tim Cook announced new products in New York City for Apple Inc.")
print([d.metadata for d in entities])

Install spaCy models before production use to avoid fallback accuracy losses.
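To give a sense of why regex-based tagging is only best-effort, here is an illustrative sketch of one very simple approach: treating runs of capitalized words as candidate entities. This is an assumption made for illustration, not the parser's actual patterns; `CAPITALIZED_RUN` and `candidate_entities` are hypothetical names.

```python
import re

# Illustrative pattern only: one or more consecutive capitalized words.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def candidate_entities(text: str) -> list[str]:
    """Return runs of capitalized words as untyped entity candidates."""
    return CAPITALIZED_RUN.findall(text)


candidates = candidate_entities(
    "Tim Cook announced new products in New York City for Apple Inc."
)
print(candidates)  # ['Tim Cook', 'New York City', 'Apple Inc']
```

A pattern like this cannot tell a person from a place or an organization, and it also picks up any capitalized word at the start of a sentence, which is the kind of accuracy loss a trained model avoids.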

Tips

  • For languages beyond English, load a different spaCy model by changing the initialization logic (e.g., subclass the parser and load es_core_news_sm).
  • Preprocess text to remove noise (HTML tags, markup) before parsing to improve NER accuracy.
  • Combine with Swarmauri middleware or pipelines to fuse entity data with downstream tasks (e.g., knowledge graph enrichment, anonymization).
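The preprocessing tip can be sketched with the standard library alone. The example below is a minimal illustration that strips HTML tags and collapses whitespace before text is handed to the parser; `strip_html` is a hypothetical helper name, and the sketch does not try to drop script or style content.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self) -> str:
        # Join the chunks, then collapse any run of whitespace to one space.
        return " ".join(" ".join(self._chunks).split())


def strip_html(raw: str) -> str:
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    extractor = _TextExtractor()
    extractor.feed(raw)
    extractor.close()
    return extractor.text()


clean = strip_html("<p>Apple Inc. unveiled new <b>MacBooks</b> in California.</p>")
print(clean)  # Apple Inc. unveiled new MacBooks in California.
```

The cleaned string can then be passed to `parser.parse(clean)` as in the Quickstart.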

Want to help?

If you want to contribute to swarmauri-sdk, read our contributing guidelines to get started.
