Entity database for organizations, people, roles, and locations with embedding search

corp-entity-db

Entity database library and search engine for organizations, people, roles, and locations. Provides embedding-based semantic search over entities imported from GLEIF, SEC Edgar, Wikidata, and Companies House.
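
To make "embedding-based semantic search" concrete, here is a minimal sketch of the general idea: each entity name is mapped to a vector, and a query is ranked against those vectors by cosine similarity. This is an illustration only, not the library's implementation. The 3-dimensional vectors are made up (a real model produces much longer ones), and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, entity_vecs, limit=10):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in entity_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Made-up toy vectors standing in for real model embeddings (assumption).
toy_entities = {
    "Microsoft Corporation": [0.9, 0.1, 0.0],
    "Micro Focus": [0.6, 0.5, 0.1],
    "Tim Cook": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.0, 0.0]

for name, score in rank_by_similarity(toy_query, toy_entities, limit=2):
    print(f"{name}: {score:.3f}")
```

At the scale of tens of millions of entities, comparing the query against every vector is too slow, which is why approximate nearest-neighbour indexes such as HNSW are typically used instead of the exhaustive loop shown here.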

Installation

# Default: search and resolve (no build dependencies)
pip install corp-entity-db

# With database build/import support
pip install "corp-entity-db[build]"

# With HTTP server (corp-entity-db serve)
pip install "corp-entity-db[serve]"

# With remote client (EntityDBClient)
pip install "corp-entity-db[client]"

# Everything
pip install "corp-entity-db[all]"

The default install includes sentence-transformers, USearch, and huggingface_hub for searching and downloading pre-built databases. The embedding model (google/embeddinggemma-300m, 300M params) is downloaded automatically on first use.

Quick Start

# Download the lite database + USearch indexes
corp-entity-db download

# Search organizations
corp-entity-db search "Microsoft"
corp-entity-db search "Microsoft" --hybrid

# Search people
corp-entity-db search-people "Tim Cook"

# Show database statistics
corp-entity-db status
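
If you want to drive these commands from a script, one option is to wrap them with Python's standard `subprocess` module. The sketch below uses only the subcommands and the `--hybrid` flag shown above. The helper names `build_search_command` and `run_search` are hypothetical, and because `--hybrid` is only shown with `search`, the sketch refuses to combine it with `search-people`.

```python
import shutil
import subprocess


def build_search_command(query, people=False, hybrid=False):
    """Build the argv list for a corp-entity-db search invocation."""
    if people and hybrid:
        # Assumption: --hybrid is only documented for organization search.
        raise ValueError("--hybrid is only shown for 'search' in this README")
    command = ["corp-entity-db", "search-people" if people else "search", query]
    if hybrid:
        command.append("--hybrid")
    return command


def run_search(query, **options):
    """Run the CLI and return its stdout, or None if it is not on PATH."""
    command = build_search_command(query, **options)
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


print(build_search_command("Microsoft", hybrid=True))
print(build_search_command("Tim Cook", people=True))
```

Passing the query as a separate list element avoids shell quoting problems with names that contain spaces or punctuation.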

Python API

from corp_entity_db import OrganizationDatabase, get_database_path

db = OrganizationDatabase(get_database_path())
matches = db.search("Microsoft", limit=10)
for match in matches:
    print(f"{match.record.name} ({match.record.entity_type}) - score: {match.score:.3f}")
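
A common follow-up is to discard weak matches before using the results. The sketch below assumes that a higher `score` means a better match, which this README does not state, and uses stand-in objects exposing only the `record.name` and `score` attributes shown above so that it runs without a downloaded database. The helper `filter_matches` and the threshold value are hypothetical.

```python
from types import SimpleNamespace


def filter_matches(matches, min_score):
    """Keep matches whose score is at least min_score (assumes higher is better)."""
    return [m for m in matches if m.score >= min_score]


# Stand-ins for the objects returned by db.search(...); the scores are invented.
fake_matches = [
    SimpleNamespace(record=SimpleNamespace(name="Microsoft Corporation"), score=0.93),
    SimpleNamespace(record=SimpleNamespace(name="Micro Focus"), score=0.41),
]

for match in filter_matches(fake_matches, min_score=0.5):
    print(f"{match.record.name} - score: {match.score:.3f}")
```

With a real database you would pass the list returned by `db.search(...)` in place of `fake_matches`; a sensible threshold depends on the data and is best chosen empirically.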

Server Mode

Keep models warm in memory for low-latency repeated searches (requires [serve] extra):

corp-entity-db serve                  # Start on localhost:8222
corp-entity-db serve --port 9000      # Custom port
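
Because the server loads models before it can answer, a script that starts it may want to wait until the port accepts connections. The following is a minimal sketch using only the Python standard library. The default port 8222 comes from the command above, while the helper name `wait_for_port` and the timeout values are assumptions. It checks only that something is listening on the port, not that the models have finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=8222, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, `wait_for_port(port=9000)` would match the custom-port invocation shown above.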

Data Sources

| Source | Description | Scale |
|---|---|---|
| Companies House | UK registered companies + officers | ~5.5M orgs, ~27.5M people |
| Wikidata | Organizations & notable people | ~1.7M orgs, ~39.4M people |
| GLEIF | Legal Entity Identifier records | ~2.6M orgs |
| SEC Edgar | US public company filers & officers | ~73K orgs |
| Total | | ~9.9M orgs, ~66.9M people |

Database Variants

  • Lite (default download): No embedding tables; search uses the USearch HNSW indexes (~7GB)
  • Full: Includes float32 and int8 embedding tables (~32GB)
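
To illustrate why an int8 embedding table is smaller than a float32 one, the sketch below applies a generic symmetric quantization scheme to a small vector and compares the byte sizes. The README does not describe how the library quantizes its embeddings, so the scheme, the helper names `quantize_int8` and `dequantize_int8`, and the example vector are all assumptions.

```python
import struct


def quantize_int8(vector):
    """Scale floats so the largest magnitude maps to 127, then round to integers."""
    max_abs = max(abs(x) for x in vector) or 1.0  # avoid dividing by zero
    scale = max_abs / 127.0
    codes = [round(x / scale) for x in vector]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximate the original floats from the int8 codes."""
    return [c * scale for c in codes]


vec = [0.5, -0.25, 0.125, 1.0]  # made-up example values
codes, scale = quantize_int8(vec)
restored = dequantize_int8(codes, scale)

# Pack both representations to compare their storage footprint.
float32_bytes = len(struct.pack(f"{len(vec)}f", *vec))
int8_bytes = len(struct.pack(f"{len(codes)}b", *codes))

print(f"float32: {float32_bytes} bytes, int8: {int8_bytes} bytes")
print("max reconstruction error:", max(abs(a - b) for a, b in zip(vec, restored)))
```

Each value takes one byte instead of four, at the cost of a rounding error of at most half the scale step per component.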

HuggingFace dataset: Corp-o-Rate-Community/entity-references
