spaCy-based named-entity recognition parser for Swarmauri with structured entity document output.

Swarmauri Parser Entity Recognition

swarmauri_parser_entityrecognition is the Swarmauri named-entity recognition parser built on spaCy. It extracts named entities such as people, organizations, and geopolitical entities from unstructured text and returns Swarmauri Document objects containing the entity text and entity metadata.

Why Use Swarmauri Parser Entity Recognition

  • Turn raw text into structured entity objects inside a Swarmauri parser workflow.
  • Preserve entity labels and entity ids in a predictable Document shape for downstream enrichment, filtering, or indexing.
  • Use spaCy's English NER pipeline when it is available, with a minimal fallback path for constrained environments.
  • Fit entity extraction into larger ingestion, retrieval, anonymization, and knowledge-graph workflows.

FAQ

What does this parser return?
A list of Swarmauri Document objects, usually one per detected entity.

Which metadata fields are included?
entity_type, entity_id, and text.
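
As a rough illustration of how these metadata fields might be consumed downstream, the sketch below groups entities by entity_type. EntityDoc is a hypothetical stand-in for a Swarmauri Document (only content and metadata are assumed), and the labels and ids in the sample are illustrative values, not output captured from the parser.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EntityDoc:
    """Hypothetical stand-in for a Swarmauri Document; not the real class."""
    content: str
    metadata: dict = field(default_factory=dict)


def group_by_entity_type(docs):
    """Map each entity_type label to the entity texts that carry it."""
    grouped = defaultdict(list)
    for doc in docs:
        # Fall back to a placeholder label if the field is absent.
        label = doc.metadata.get("entity_type", "UNKNOWN")
        grouped[label].append(doc.content)
    return dict(grouped)


# Illustrative values only; real labels and ids depend on the loaded model.
sample = [
    EntityDoc("Tim Cook", {"entity_type": "PERSON", "entity_id": 0, "text": "Tim Cook"}),
    EntityDoc("Apple Inc.", {"entity_type": "ORG", "entity_id": 1, "text": "Apple Inc."}),
    EntityDoc("New York City", {"entity_type": "GPE", "entity_id": 2, "text": "New York City"}),
    EntityDoc("Barack Obama", {"entity_type": "PERSON", "entity_id": 3, "text": "Barack Obama"}),
]

grouped = group_by_entity_type(sample)
print(grouped)
```

The same function should work on the list returned by parser.parse, provided each returned object exposes content and metadata as in the Usage section.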

What spaCy model does it use?
It tries to load en_core_web_sm.

What happens if the model is unavailable?
The parser attempts to download the model. If the download fails, it falls back to a blank English spaCy pipeline combined with a small regex-based matcher. This is a best-effort compatibility path, not a replacement for the trained model.
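
To give a feel for why a regex-based path is only best-effort, here is a minimal sketch of what such a matcher could look like. The pattern and the naive_entities helper are assumptions made for illustration; they are not the package's actual fallback.

```python
import re

# Hypothetical pattern: a run of two or more capitalized words.
# The package's real fallback pattern may differ.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def naive_entities(text):
    """Return entity-like dicts for each capitalized run found in text."""
    results = []
    for index, match in enumerate(CAPITALIZED_RUN.finditer(text)):
        span = match.group(0)
        results.append(
            {
                "content": span,
                # No trained model here, so the label is a placeholder.
                "metadata": {"entity_type": "UNKNOWN", "entity_id": index, "text": span},
            }
        )
    return results


found = naive_entities("Tim Cook announced new products in New York City for Apple Inc.")
print([item["content"] for item in found])
```

A matcher like this cannot assign real labels, misses single-word entities such as "Hawaii", and picks up any capitalized phrase, which is why it suits compatibility testing better than production extraction.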

Features

  • Named-entity extraction via spaCy's English NER model.
  • Automatic attempt to download en_core_web_sm if the model is missing.
  • Best-effort fallback behavior for environments where the full model cannot be loaded.
  • Returns Swarmauri Document objects with entity label metadata.
  • Supports Python 3.10, 3.11, 3.12, 3.13, and 3.14.
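
The load, download, then fall back sequence described above can be sketched as an ordered chain of loaders. This is an assumption about the general shape, not the package's code: first_working and the stand-in loaders are hypothetical, and in a real setup the callables might wrap spacy.load("en_core_web_sm") and spacy.blank("en").

```python
def first_working(loaders):
    """Try each (name, loader) pair in order; return the first success."""
    errors = []
    for name, loader in loaders:
        try:
            return name, loader()
        except Exception as exc:  # keep going, but remember why it failed
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all loaders failed: " + "; ".join(errors))


# Stand-in loaders so the sketch runs without spaCy installed.
def load_full_model():
    # spacy.load raises OSError when a model package is missing.
    raise OSError("en_core_web_sm is not installed")


def load_blank_pipeline():
    return "blank-en"  # placeholder for a blank English pipeline object


chosen = first_working(
    [
        ("full-model", load_full_model),
        ("blank-pipeline", load_blank_pipeline),
    ]
)
print(chosen)
```

Returning the loader name alongside the result lets callers log which path was taken, which helps when the fallback silently degrades entity quality.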

Installation

uv add swarmauri_parser_entityrecognition
# or, with pip:
pip install swarmauri_parser_entityrecognition

Optional model bootstrap:

python -m spacy download en_core_web_sm
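
A CI job or startup script could check that the model package is importable before the parser is constructed, so a missing model fails fast instead of triggering a runtime download. The sketch below uses only the standard library; model_installed is a hypothetical helper name, and it relies on spaCy models being installed as regular Python packages.

```python
import importlib.util


def model_installed(package_name="en_core_web_sm"):
    """Return True if the named model package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if model_installed():
        print("en_core_web_sm is available")
    else:
        print("en_core_web_sm is missing; run: python -m spacy download en_core_web_sm")
```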

Usage

from swarmauri_parser_entityrecognition import EntityRecognitionParser

text = "Barack Obama was born in Hawaii and served as President of the United States."
parser = EntityRecognitionParser()
entities = parser.parse(text)

for entity in entities:
    print(entity.content, entity.metadata["entity_type"])

Examples

Parse organizations, places, and people

from swarmauri_parser_entityrecognition import EntityRecognitionParser

parser = EntityRecognitionParser()
docs = parser.parse(
    "Apple Inc. is planning to open a new office in New York City, according to CEO Tim Cook."
)

for doc in docs:
    print(doc.content, doc.metadata)

Handle non-string input

from swarmauri_parser_entityrecognition import EntityRecognitionParser

parser = EntityRecognitionParser()
print(parser.parse(42))

Inspect fallback-compatible metadata

from swarmauri_parser_entityrecognition import EntityRecognitionParser

parser = EntityRecognitionParser()
entities = parser.parse("Tim Cook announced new products in New York City for Apple Inc.")
print([entity.metadata for entity in entities])

Related Packages

Swarmauri Foundations

More Documentation

Best Practices

  • Preinstall en_core_web_sm in CI and production environments to avoid runtime downloads.
  • Treat the regex fallback as a compatibility path, not as production-quality entity recognition.
  • Strip markup or noisy boilerplate before parsing to improve entity quality.
  • Persist entity spans or link them to downstream IDs if you need durable knowledge-graph or indexing workflows.
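
For the markup-stripping advice above, one possible preprocessing step is sketched here with the standard library's html.parser. strip_markup is a hypothetical helper, not part of this package, and it assumes the input is HTML-like text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse runs of whitespace.
        # Note: a word split by an inline tag would gain a space.
        return " ".join(" ".join(self._chunks).split())


def strip_markup(html):
    """Return the visible text of an HTML fragment as a single line."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text()


clean = strip_markup("<p>Tim Cook leads <b>Apple Inc.</b></p><script>var x = 1;</script>")
print(clean)
```

The cleaned string can then be passed to parser.parse in place of the raw markup.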

License

This project is licensed under the Apache-2.0 License.
