Entity database for organizations, people, roles, and locations with embedding search
corp-entity-db
Entity database library and search engine for organizations, people, roles, and locations. Provides embedding-based semantic search over entities imported from GLEIF, SEC Edgar, Wikidata, and Companies House.
Installation
# Default: search and resolve (no build dependencies)
pip install corp-entity-db
# With database build/import support
pip install "corp-entity-db[build]"
# With HTTP server (corp-entity-db serve)
pip install "corp-entity-db[serve]"
# With remote client (EntityDBClient)
pip install "corp-entity-db[client]"
# Everything
pip install "corp-entity-db[all]"
The default install includes sentence-transformers, USearch, and huggingface_hub for searching and downloading pre-built databases. The embedding model (google/embeddinggemma-300m, 300M params) is downloaded automatically on first use.
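To make "embedding-based semantic search" concrete, here is a minimal sketch of the underlying idea, using toy 3-dimensional vectors in place of real model output. The vectors, the `index` dictionary, and the `rank` helper are hypothetical and not part of corp-entity-db; the library relies on sentence-transformers embeddings and USearch indexes rather than this brute-force loop.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy "embeddings" (hypothetical values); a real model would derive
# these from entity names.
index = {
    "Microsoft Corporation": [0.9, 0.1, 0.0],
    "Apple Inc.": [0.1, 0.9, 0.0],
    "Microsoft Ireland Operations Ltd": [0.8, 0.2, 0.1],
}

def rank(query_vec, limit=2):
    """Return the `limit` closest names by cosine similarity, best first."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    return sorted(scored, reverse=True)[:limit]

for score, name in rank([1.0, 0.0, 0.0]):
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this is linear in the number of entities, which is why an approximate nearest-neighbour index is the usual choice at the scale of tens of millions of records.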
Quick Start
# Download the lite database + USearch indexes
corp-entity-db download
# Search organizations
corp-entity-db search "Microsoft"
corp-entity-db search "Microsoft" --hybrid
# Search people
corp-entity-db search-people "Tim Cook"
# Show database statistics
corp-entity-db status
Python API
from corp_entity_db import OrganizationDatabase, get_database_path
db = OrganizationDatabase(get_database_path())
matches = db.search("Microsoft", limit=10)
for match in matches:
    print(f"{match.record.name} ({match.record.entity_type}) - score: {match.score:.3f}")
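A common follow-up to the example above is to keep only a confident top result. The sketch below assumes nothing about the library beyond the three attributes already shown (`match.record.name`, `match.record.entity_type`, `match.score`). The `Record` and `Match` classes are stand-ins so the snippet runs without a database, and `best_match`, the `0.8` threshold, and the `"business"` type value are hypothetical.

```python
from dataclasses import dataclass

# Stand-ins that mirror only the attributes used in the example above.
@dataclass
class Record:
    name: str
    entity_type: str

@dataclass
class Match:
    record: Record
    score: float

def best_match(matches, min_score=0.8):
    """Return the highest-scoring match at or above min_score, else None."""
    candidates = [m for m in matches if m.score >= min_score]
    return max(candidates, key=lambda m: m.score, default=None)

# Hypothetical results; real ones would come from db.search(...).
matches = [
    Match(Record("Microsoft Corporation", "business"), 0.93),
    Match(Record("Microsoft Ireland", "business"), 0.81),
]
top = best_match(matches)
print(top.record.name if top else "no confident match")
```

The right threshold depends on the score scale the database returns, so it is worth inspecting a few real queries before fixing a value.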
Server Mode
Keep models warm in memory for low-latency repeated searches (requires the [serve] extra):
corp-entity-db serve # Start on localhost:8222
corp-entity-db serve --port 9000 # Custom port
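Before pointing a client at the server, it can help to confirm that something is listening on the port. The sketch below uses only the Python standard library and the default address given above (localhost:8222). The `server_is_listening` helper is hypothetical, and it checks TCP reachability only; it makes no assumption about the server's HTTP routes.

```python
import socket

def server_is_listening(host="localhost", port=8222, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False

print("server up" if server_is_listening() else "server not reachable")
```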
Data Sources
| Source | Description | Scale |
|---|---|---|
| Companies House | UK registered companies + officers | ~5.5M orgs, ~27.5M people |
| Wikidata | Organizations & notable people | ~1.7M orgs, ~39.4M people |
| GLEIF | Legal Entity Identifier records | ~2.6M orgs |
| SEC Edgar | US public company filers & officers | ~73K orgs |
| Total | | ~9.9M orgs, ~66.9M people |
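The Total row matches a plain sum of the per-source figures, which the short check below reproduces. Counts are in millions and copied from the table; sources with no people figure listed are simply left out of the people sum.

```python
# Approximate counts in millions, copied from the table above.
orgs = {"Companies House": 5.5, "Wikidata": 1.7, "GLEIF": 2.6, "SEC Edgar": 0.073}
people = {"Companies House": 27.5, "Wikidata": 39.4}

total_orgs = round(sum(orgs.values()), 1)
total_people = round(sum(people.values()), 1)
print(f"~{total_orgs}M orgs, ~{total_people}M people")
# prints "~9.9M orgs, ~66.9M people"
```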
Database Variants
- Lite (default download): No embedding tables; search uses USearch HNSW indexes (~7GB)
- Full: Includes float32 and int8 embedding tables (~32GB)
HuggingFace dataset: Corp-o-Rate-Community/entity-references
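The Full variant stores embeddings as both float32 and int8. As a rough illustration of the trade-off, the sketch below applies generic symmetric scalar quantization to a toy 4-dimensional vector. This is an assumption for illustration only: the source does not describe the quantization scheme corp-entity-db uses, and `quantize_int8` and `dequantize` are hypothetical helpers.

```python
import array

def quantize_int8(vec):
    """Symmetric scalar quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid a zero scale
    return array.array("b", (round(x / scale) for x in vec)), scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in q]

vec = array.array("f", [0.12, -0.50, 0.33, 0.04])  # toy float32 embedding
q, scale = quantize_int8(vec)

print(vec.itemsize * len(vec), "bytes as float32")  # 4 bytes per dimension
print(q.itemsize * len(q), "bytes as int8")         # 1 byte per dimension
print([round(x, 2) for x in dequantize(q, scale)])
```

The int8 form takes a quarter of the space per vector at the cost of a small rounding error, bounded here by half the scale factor per dimension.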
File details
Details for the file corp_entity_db-0.2.0.tar.gz.
File metadata
- Download URL: corp_entity_db-0.2.0.tar.gz
- Upload date:
- Size: 171.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f5e90e15d299337615edde6ceea6099bcd10bed59feba56b5a2a9d2e1d55eeb |
| MD5 | d534d0e3e07832e707d244553867f160 |
| BLAKE2b-256 | eaee5f07b2326563da2a62e883b400dfb1fb1d7c3bbc17568a8b854a02de008f |
File details
Details for the file corp_entity_db-0.2.0-py3-none-any.whl.
File metadata
- Download URL: corp_entity_db-0.2.0-py3-none-any.whl
- Upload date:
- Size: 190.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 68b6aed0f2ab8544f870fba94a7b2767e3d8a47b7b6cb7160772ce360e7b285d |
| MD5 | 7b8accf38cce28f5cb12b345195d487e |
| BLAKE2b-256 | c07a1407b1a694966e68ecb9e869611fd6482243f4cb8311450d41ed40cc4b49 |