Lightweight entity resolution - from company_tickers.json to enterprise
EntitySpine
Zero-Dependency Entity Resolution for SEC EDGAR Data
From company_tickers.json to enterprise-grade Knowledge Graph — without forcing dependencies.
Quick Start • Why EntitySpine? • Examples • Architecture • API • Contributing
🎯 What is EntitySpine?
EntitySpine solves the entity resolution problem for SEC EDGAR data:
"Is CIK 0000320193 the same company as ticker AAPL on NASDAQ?"
It provides:
- 🔍 Entity Resolution — Resolve tickers, CIKs, CUSIPs to canonical entities
- 📊 Knowledge Graph — Model companies, people, relationships, events
- 🗃️ Tiered Storage — JSON → SQLite → DuckDB → PostgreSQL
- ⚡ Zero Core Dependencies — stdlib-only for Tier 0-1
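To make the resolution question concrete, here is a minimal stdlib-only sketch of the idea. It is not EntitySpine's implementation. It indexes records shaped like SEC's company_tickers.json, where each value carries `cik_str`, `ticker`, and `title`. The names `normalize_cik` and `build_index` are hypothetical, and the sample record is hand-written for illustration.

```python
from typing import Dict, Tuple


def normalize_cik(cik) -> str:
    """Zero-pad a CIK to the 10-digit form used throughout this README."""
    return str(int(cik)).zfill(10)


def build_index(sec_data: dict) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Return (records keyed by padded CIK, ticker -> padded CIK)."""
    by_cik: Dict[str, dict] = {}
    ticker_to_cik: Dict[str, str] = {}
    for row in sec_data.values():
        cik = normalize_cik(row["cik_str"])
        record = by_cik.setdefault(
            cik, {"cik": cik, "name": row["title"], "tickers": []}
        )
        record["tickers"].append(row["ticker"])
        ticker_to_cik[row["ticker"].upper()] = cik
    return by_cik, ticker_to_cik


# Hand-written sample in the company_tickers.json layout (illustrative only).
sample = {"0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}}
by_cik, ticker_to_cik = build_index(sample)

# "Is CIK 0000320193 the same company as ticker AAPL?"
same_company = ticker_to_cik["AAPL"] == normalize_cik("0000320193")
print(same_company)
```

A flat index like this answers the ticker-to-CIK question but has no place for provenance, securities, or history, which is the gap the entity model below is meant to fill.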
⚡ Quick Start
Installation
# Core (zero dependencies)
pip install entityspine
# With optional features
pip install "entityspine[pydantic]" # Validation wrappers
pip install "entityspine[orm]" # SQLModel/SQLAlchemy
pip install "entityspine[duckdb]" # Analytics tier
pip install "entityspine[full]" # Everything
30-Second Example
from entityspine import SqliteStore
# Create store and load ~14,000 SEC companies (auto-downloads)
store = SqliteStore(":memory:")
store.initialize()
store.load_sec_data() # Fetches from SEC automatically
# Resolve by ticker
results = store.search_entities("AAPL")
entity, score = results[0]
print(f"{entity.primary_name} (CIK: {entity.source_id})")
# Apple Inc. (CIK: 0000320193)
# Resolve by CIK
entities = store.get_entities_by_cik("0000320193")
print(entities[0].primary_name)
# Apple Inc.
🤔 Why EntitySpine?
The Problem
SEC EDGAR data uses multiple identifiers that don't naturally connect:
| Identifier | Example | What It Identifies |
|---|---|---|
| CIK | 0000320193 | Legal filing entity |
| Ticker | AAPL | Exchange listing |
| CUSIP | 037833100 | Security instrument |
| LEI | HWUPKR0MPOU8FGXBT394 | Global legal entity |
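Some of these identifiers carry their own integrity check, which helps catch typos before any lookup happens. The sketch below validates a CUSIP using the standard modulus-10 "double-add-double" check digit. The helper names are hypothetical and are not part of EntitySpine's API.

```python
# Values the CUSIP scheme assigns to its three special characters.
_SPECIAL = {"*": 36, "@": 37, "#": 38}


def cusip_check_digit(base8: str) -> int:
    """Compute the ninth-character check digit for an 8-character CUSIP base."""
    if len(base8) != 8:
        raise ValueError("expected the first 8 characters of a CUSIP")
    total = 0
    for position, char in enumerate(base8.upper(), start=1):
        if "0" <= char <= "9":
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10  # A=10 ... Z=35
        elif char in _SPECIAL:
            value = _SPECIAL[char]
        else:
            raise ValueError(f"invalid CUSIP character: {char!r}")
        if position % 2 == 0:  # every second character is doubled
            value *= 2
        total += value // 10 + value % 10  # add the digits of the value
    return (10 - total % 10) % 10


def is_valid_cusip(cusip: str) -> bool:
    """True when the string is 9 characters long and its check digit matches."""
    cusip = cusip.strip().upper()
    if len(cusip) != 9 or not ("0" <= cusip[8] <= "9"):
        return False
    try:
        return cusip_check_digit(cusip[:8]) == int(cusip[8])
    except ValueError:
        return False


print(is_valid_cusip("037833100"))  # the CUSIP from the table above
```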
Questions that are hard to answer:
- Is AAPL and 0000320193 the same company? ✅ Yes
- Did FB become META? How do I track that? 🤔
- Which company is GOOG vs GOOGL? Same entity, different securities
- What's the LEI for CIK 0001018724? 🤷
The Solution
EntitySpine provides a canonical entity model with identifier claims:
┌─────────────────────────────────────────────────────────────────┐
│ Entity │
│ "Apple Inc." │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ CIK Claim │ │ LEI Claim │ │ EIN Claim │ │
│ │ 0000320193 │ │ HWUPKR... │ │ 94-2404110 │ │
│ │ source: SEC │ │ source:GLEIF│ │ source: IRS │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Security │ │
│ │ "Apple Common Stock" │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │CUSIP Claim │ │ISIN Claim │ │FIGI Claim │ │ │
│ │ │ 037833100 │ │US037833... │ │BBG000B9XRY4│ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────────────────────────────────────┐ │ │
│ │ │ Listing (NASDAQ) │ │ │
│ │ │ Ticker: AAPL | MIC: XNAS | Status: ACTIVE │ │ │
│ │ └────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
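The nesting in the diagram can be sketched with plain stdlib dataclasses. This is an illustrative model only: the class and field names below are simplified assumptions rather than EntitySpine's actual domain classes, and the `source` default is a placeholder for claims whose source the diagram does not show.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Claim:
    scheme: str                  # e.g. "CIK", "LEI", "CUSIP"
    value: str
    source: str = "unspecified"  # provenance of the claim


@dataclass
class Listing:
    ticker: str
    mic: str
    status: str = "ACTIVE"


@dataclass
class Security:
    description: str
    claims: List[Claim] = field(default_factory=list)
    listings: List[Listing] = field(default_factory=list)


@dataclass
class Entity:
    primary_name: str
    claims: List[Claim] = field(default_factory=list)
    securities: List[Security] = field(default_factory=list)


def find_entity(entities: Iterable[Entity], scheme: str, value: str) -> Optional[Entity]:
    """Walk entity claims, then security claims, then listing tickers."""
    for entity in entities:
        if any(c.scheme == scheme and c.value == value for c in entity.claims):
            return entity
        for security in entity.securities:
            if any(c.scheme == scheme and c.value == value for c in security.claims):
                return entity
            if scheme == "TICKER" and any(l.ticker == value for l in security.listings):
                return entity
    return None


apple = Entity(
    primary_name="Apple Inc.",
    claims=[Claim("CIK", "0000320193", source="SEC")],
    securities=[
        Security(
            description="Apple Common Stock",
            claims=[Claim("CUSIP", "037833100")],
            listings=[Listing(ticker="AAPL", mic="XNAS")],
        )
    ],
)

for scheme, value in [("CIK", "0000320193"), ("CUSIP", "037833100"), ("TICKER", "AAPL")]:
    match = find_entity([apple], scheme, value)
    print(scheme, "->", match.primary_name if match else None)
```

Keeping identifiers as claims attached to the level they describe (entity, security, or listing) is what lets one company own several securities and several tickers without ambiguity.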
📚 Examples
Example 1: Load SEC Company Tickers
"""Load SEC's company_tickers.json into EntitySpine."""
from entityspine import SqliteStore
import httpx  # Third-party HTTP client; install it separately from the zero-dependency core
# Download SEC data
url = "https://www.sec.gov/files/company_tickers.json"
headers = {"User-Agent": "MyApp/1.0 (contact@example.com)"}
response = httpx.get(url, headers=headers)
response.raise_for_status()  # Fail fast on HTTP errors instead of parsing an error page
sec_data = response.json()
# Load into EntitySpine
store = SqliteStore("./sec_entities.db")
store.initialize()
loaded = store.load_sec_json(sec_data)
print(f"Loaded {loaded:,} entities")
# Loaded 10,847 entities
# Query
results = store.search_entities("Microsoft")
for entity, score in results[:5]:
    print(f" {score:.2f} | {entity.primary_name}")
# 1.00 | Microsoft Corporation
# 0.70 | Microsoft Payments Inc
Example 2: Build Knowledge Graph from Filing
"""Extract entities and relationships from a 10-K filing."""
from entityspine import (
    Entity, EntityType, EntityStatus,
    IdentifierClaim, IdentifierScheme, VendorNamespace, ClaimStatus,
    Relationship, NodeRef, NodeKind, RelationshipType,
    SqliteStore,
)
store = SqliteStore("./knowledge_graph.db")
store.initialize()
# Create NVIDIA entity
nvidia = Entity(
    primary_name="NVIDIA Corporation",
    entity_type=EntityType.ORGANIZATION,
    status=EntityStatus.ACTIVE,
    jurisdiction="DE",
    sic_code="3674",
    source_system="sec-edgar",
    source_id="0001045810",
)
store.save_entity(nvidia)
# Attach CIK with SEC filing evidence
cik_claim = IdentifierClaim(
    entity_id=nvidia.entity_id,
    scheme=IdentifierScheme.CIK,
    value="0001045810",
    namespace=VendorNamespace.SEC,
    source="sec-edgar",
    source_ref="0001045810-24-000029",  # Accession number
    confidence=1.0,
)
store.save_claim(cik_claim)
# Create supplier entity
tsmc = Entity(
    primary_name="Taiwan Semiconductor Manufacturing Company",
    entity_type=EntityType.ORGANIZATION,
    source_system="sec-edgar",
)
store.save_entity(tsmc)
# Create supplier relationship with evidence
relationship = Relationship(
    source_ref=NodeRef(NodeKind.ENTITY, nvidia.entity_id),
    target_ref=NodeRef(NodeKind.ENTITY, tsmc.entity_id),
    relationship_type=RelationshipType.SUPPLIER,
    confidence=0.95,
    evidence_filing_id="0001045810-24-000029",
    evidence_snippet="TSMC manufactures substantially all of our GPUs...",
    source_system="sec-edgar",
)
store.save_relationship(relationship)
print(f"Entities: {store.entity_count()}")
print(f"Relationships: {store.relationship_count()}")
Example 3: py-sec-edgar Integration
"""Ingest SEC filings using the integration module."""
from datetime import date
from entityspine.integration import (
    FilingFacts,
    FilingEvidence,
    ingest_filing_facts,
)
from entityspine.integration.contracts import (
    ExtractedEntity,
    ExtractedRelationship,
)
from entityspine import SqliteStore
store = SqliteStore("./filings.db")
store.initialize()
# Build facts from a 10-K (this would come from py-sec-edgar)
facts = FilingFacts(
    evidence=FilingEvidence(
        accession_number="0001045810-24-000029",
        form_type="10-K",
        filed_date=date(2024, 2, 21),
        cik="0001045810",
    ),
    registrant_name="NVIDIA Corporation",
    registrant_cik="0001045810",
    registrant_ticker="NVDA",
    registrant_exchange="NASDAQ",
    registrant_sic="3674",
    registrant_state="DE",
    entities=[
        ExtractedEntity(name="Jensen Huang", entity_type="person"),
        ExtractedEntity(name="TSMC", entity_type="organization"),
        ExtractedEntity(name="Microsoft", entity_type="organization"),
    ],
    relationships=[
        ExtractedRelationship(
            source_name="NVIDIA Corporation",
            target_name="TSMC",
            relationship_type="SUPPLIER",
            evidence_snippet="TSMC manufactures our GPUs",
        ),
        ExtractedRelationship(
            source_name="NVIDIA Corporation",
            target_name="Microsoft",
            relationship_type="CUSTOMER",
            evidence_snippet="Microsoft is a major customer for datacenter",
        ),
    ],
)
# Ingest into knowledge graph
result = ingest_filing_facts(store, facts)
print(f"Created {result.entities_created} entities")
print(f"Created {result.relationships_created} relationships")
print(f"Created {result.claims_created} identifier claims")
# Created 4 entities
# Created 2 relationships
# Created 2 identifier claims
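Accession numbers such as the one used as evidence above appear both with and without dashes in EDGAR paths, so it helps to normalize them before storing them. The sketch below is a hypothetical helper, not the function shipped in the integration module. It relies on the standard 10-2-6 digit layout (submitter CIK, two-digit year, sequence number).

```python
import re

# 10-digit submitter CIK, 2-digit year, 6-digit sequence; dashes are optional.
_ACCESSION = re.compile(r"^(\d{10})-?(\d{2})-?(\d{6})$")


def normalize_accession(raw: str) -> str:
    """Return the dashed form, e.g. 0001045810-24-000029."""
    match = _ACCESSION.match(raw.strip())
    if match is None:
        raise ValueError(f"not an accession number: {raw!r}")
    return "-".join(match.groups())


print(normalize_accession("000104581024000029"))
```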
Example 4: Multi-Identifier Resolution
"""Resolve entities across multiple identifier schemes."""
from entityspine import SqliteStore
store = SqliteStore("./entities.db")
store.initialize()
# Search works across CIK, ticker, and name
queries = ["AAPL", "0000320193", "Apple Inc", "Apple"]
for query in queries:
    results = store.search_entities(query, limit=1)
    if results:
        entity, score = results[0]
        print(f"'{query}' → {entity.primary_name} (score: {score:.2f})")
# 'AAPL' → Apple Inc. (score: 1.00)
# '0000320193' → Apple Inc. (score: 1.00)
# 'Apple Inc' → Apple Inc. (score: 1.00)
# 'Apple' → Apple Inc. (score: 0.70)
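One way a single search call can serve several identifier schemes is to classify the query by its shape before choosing a lookup path. The heuristic below is an assumption made for illustration. It is not how `search_entities` is documented to work, and the patterns are deliberately simple.

```python
import re

_CIK = re.compile(r"\d{1,10}")                   # all digits, up to 10 of them
_TICKER = re.compile(r"[A-Z]{1,5}(?:[.-][A-Z])?")  # short, all-caps symbol


def classify_query(query: str) -> str:
    """Guess which lookup a query should be routed to (hypothetical helper)."""
    q = query.strip()
    if _CIK.fullmatch(q):
        return "cik"
    if _TICKER.fullmatch(q):
        return "ticker"
    return "name"


for query in ["AAPL", "0000320193", "Apple Inc", "Apple"]:
    print(f"{query!r} -> {classify_query(query)}")
```

A short all-caps company name would be routed to the ticker path by this heuristic, so a real resolver would likely fall back to a name search when the ticker lookup finds nothing.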
Example 5: Track Corporate Actions
"""Track mergers, name changes, and ticker changes."""
from entityspine import Entity, EntityStatus, EntityType, SqliteStore
from entityspine.domain.timestamps import utc_now
store = SqliteStore("./corporate_actions.db")
store.initialize()
# Original entity
facebook = Entity(
    primary_name="Facebook, Inc.",
    entity_type=EntityType.ORGANIZATION,
    source_system="sec-edgar",
    source_id="0001326801",
)
store.save_entity(facebook)
# After rebranding - create redirect
meta = Entity(
    primary_name="Meta Platforms, Inc.",
    entity_type=EntityType.ORGANIZATION,
    source_system="sec-edgar",
    source_id="0001326801",  # Same CIK
)
store.save_entity(meta)
# Mark old entity as merged
facebook_merged = facebook.with_update(
    status=EntityStatus.MERGED,
    redirect_to=meta.entity_id,
    redirect_reason="Rebranded to Meta Platforms, Inc.",
    merged_at=utc_now(),
)
store.save_entity(facebook_merged)
# Lookups automatically follow redirect
results = store.search_entities("Facebook")
entity, _ = results[0]
print(f"Facebook resolved to: {entity.primary_name}")
# Facebook resolved to: Meta Platforms, Inc.
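Redirect following can be pictured as walking a chain of `redirect_to` pointers until reaching a record that has none. The sketch below is a stdlib illustration under assumed names (`Record`, `follow_redirects`, `max_hops`). It does not show EntitySpine's internal logic, but it does include the cycle guard such a walk needs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    entity_id: str
    primary_name: str
    redirect_to: Optional[str] = None


def follow_redirects(records: Dict[str, Record], entity_id: str, max_hops: int = 10) -> Record:
    """Return the final record in a redirect chain, refusing cycles and long chains."""
    seen = set()
    current = records[entity_id]
    while current.redirect_to is not None:
        if current.entity_id in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect cycle or chain too long at {current.entity_id}")
        seen.add(current.entity_id)
        current = records[current.redirect_to]
    return current


records = {
    "fb": Record("fb", "Facebook, Inc.", redirect_to="meta"),
    "meta": Record("meta", "Meta Platforms, Inc."),
}
print(follow_redirects(records, "fb").primary_name)
```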
🏗️ Architecture
Domain is Canonical
┌─────────────────────────────────────────────────────────────────┐
│ entityspine.domain │
│ (stdlib dataclasses only) │
│ ┌─────────┐ ┌──────────┐ ┌─────────┐ ┌─────────────────┐ │
│ │ Entity │ │ Security │ │ Listing │ │IdentifierClaim │ │
│ └─────────┘ └──────────┘ └─────────┘ └─────────────────┘ │
│ ┌─────────┐ ┌──────────┐ ┌─────────┐ ┌─────────────────┐ │
│ │ Asset │ │ Contract │ │ Product │ │ Relationship │ │
│ └─────────┘ └──────────┘ └─────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
↑ returns domain dataclasses
┌─────────────────────────────────────────────────────────────────┐
│ entityspine.stores │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ JsonStore │ │ SqliteStore │ (Tier 0-1, stdlib) │
│ └───────────────┘ └───────────────┘ │
└─────────────────────────────────────────────────────────────────┘
↑ optional extras
┌─────────────────────────────────────────────────────────────────┐
│ entityspine.adapters │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ pydantic/ wrappers │ │ orm/ SqlModelStore │ │
│ │ to_domain/from_dom │ │ returns domain │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ pip install .[pydantic] pip install .[orm] │
└─────────────────────────────────────────────────────────────────┘
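The layering above implies that every backend converts its own rows into the same domain dataclasses before returning them. A minimal sketch of that pattern follows, using a simplified stand-in class (`DomainEntity`) and hypothetical mapper names rather than the library's real `mappers.py` functions.

```python
from dataclasses import asdict, dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class DomainEntity:
    """Simplified stand-in for a canonical domain dataclass."""
    entity_id: str
    primary_name: str
    jurisdiction: Optional[str] = None


def to_row(entity: DomainEntity) -> dict:
    """Domain -> plain dict, ready for JSON or a SQL parameter mapping."""
    return asdict(entity)


def from_row(row: dict) -> DomainEntity:
    """Plain dict -> domain, ignoring columns the domain model does not know."""
    known = {f.name for f in fields(DomainEntity)}
    return DomainEntity(**{k: v for k, v in row.items() if k in known})


original = DomainEntity("e1", "NVIDIA Corporation", jurisdiction="DE")
row = to_row(original)
row["backend_only_column"] = 123  # e.g. a rowid added by a storage backend
restored = from_row(row)

# Frozen dataclasses are updated by copying, never by mutating in place.
renamed = replace(restored, primary_name="NVIDIA Corp")
print(restored == original, renamed.primary_name)
```

Because callers only ever see the domain type, swapping a JSON store for a SQLite store does not change the objects they work with.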
Storage Tiers
| Tier | Backend | Dependencies | Use Case | Temporal? |
|---|---|---|---|---|
| 0 | JSON file | None | Scripts, CLI | ❌ |
| 1 | SQLite | None | Local dev | ❌ |
| 2 | DuckDB | [duckdb] | Analytics | ⏳ Planned |
| 3 | PostgreSQL | [postgres] | Production | ✅ |
Tier Honesty
Lower tiers warn when they can't fulfill advanced queries:
result = store.resolve("AAPL", as_of="2015-01-01")
if not result.as_of_honored:
    for warning in result.warnings:
        print(f"⚠️ {warning}")
# ⚠️ as_of parameter ignored: temporal resolution requires Tier 2+
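The pattern behind this behavior is a result object that reports what the backend could not honor instead of failing or silently ignoring the request. Here is a small stdlib sketch of that idea. The class names and the in-memory lookup are assumptions for illustration, and only `as_of_honored`, `warnings`, and the warning text come from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResolutionResult:
    entity_name: Optional[str]
    as_of_honored: bool = True
    warnings: List[str] = field(default_factory=list)


class NonTemporalStore:
    """Stand-in for a lower-tier backend with no historical data."""

    def __init__(self, names_by_ticker: Dict[str, str]) -> None:
        self._names = names_by_ticker

    def resolve(self, query: str, as_of: Optional[str] = None) -> ResolutionResult:
        result = ResolutionResult(entity_name=self._names.get(query.upper()))
        if as_of is not None:
            # Answer with current data, but say so explicitly.
            result.as_of_honored = False
            result.warnings.append(
                "as_of parameter ignored: temporal resolution requires Tier 2+"
            )
        return result


store = NonTemporalStore({"AAPL": "Apple Inc."})
result = store.resolve("AAPL", as_of="2015-01-01")
print(result.entity_name, result.as_of_honored, result.warnings)
```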
📖 Domain Models
Core Models
| Model | Purpose | Key Fields |
|---|---|---|
| Entity | Legal/organizational identity | primary_name, entity_type, jurisdiction |
| Security | Tradeable instrument | security_type, entity_id, description |
| Listing | Exchange-specific ticker | ticker, exchange, mic, security_id |
| IdentifierClaim | Identifier with provenance | scheme, value, entity_id, confidence |
Knowledge Graph Nodes
| Model | Purpose |
|---|---|
| Person | Natural persons (executives, directors) |
| Asset | Physical/tangible assets |
| Contract | Material agreements |
| Product | Products/services |
| Brand | Brand identities |
| Event | Discrete business events |
| Case | Legal proceedings |
| Geo | Geographic locations |
| Address | Physical addresses |
Edge Models
| Model | Purpose |
|---|---|
| RoleAssignment | Person→Org roles (CEO, CFO, Director) |
| Relationship | Generic node→node edges with evidence |
| EntityRelationship | Entity→Entity relationships |
🔧 Configuration
Environment Variables
# Storage path (default: ./entityspine.db)
ENTITYSPINE_DB_PATH=./data/entities.db
# Log level
ENTITYSPINE_LOG_LEVEL=INFO
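If an application wants to read these variables itself, a small settings loader is enough. The sketch below is an assumption about usage, not a description of how EntitySpine consumes them. `Settings` and `load_settings` are hypothetical names, and the `INFO` fallback for the log level is a guess because only the storage path default is documented above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_path: str
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        db_path=env.get("ENTITYSPINE_DB_PATH", "./entityspine.db"),
        # Assumed fallback; the README does not state the default log level.
        log_level=env.get("ENTITYSPINE_LOG_LEVEL", "INFO").upper(),
    )


settings = load_settings({"ENTITYSPINE_DB_PATH": "./data/entities.db"})
print(settings)
```

Passing the mapping explicitly keeps the loader easy to test without touching the real environment.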
Programmatic Configuration
from entityspine import SqliteStore
# In-memory for testing
store = SqliteStore(":memory:")
# File-based
store = SqliteStore("./entities.db")
# With explicit WAL mode (better concurrency)
store = SqliteStore("./entities.db", wal_mode=True)
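For context on what WAL mode means at the SQLite level, the stdlib `sqlite3` sketch below switches a file-based database to write-ahead logging and reads the mode back. It illustrates the underlying pragma only. Whether `wal_mode=True` does exactly this inside `SqliteStore` is an assumption, and `open_with_wal` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def open_with_wal(path: str) -> sqlite3.Connection:
    """Open a SQLite database and request write-ahead logging."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_with_wal(os.path.join(tmp, "entities.db"))
    file_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()  # close before the temporary directory is removed

# In-memory databases cannot use WAL; SQLite keeps them in "memory" mode.
mem = sqlite3.connect(":memory:")
memory_mode = mem.execute("PRAGMA journal_mode=WAL").fetchone()[0]
mem.close()

print(file_mode, memory_mode)
```

WAL lets readers proceed while a writer is active, which is why it is usually described as the better choice for concurrent access to a file-based store.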
🧪 Testing
# Run all tests
pytest
# With coverage
pytest --cov=entityspine --cov-report=html
# Only unit tests
pytest tests/unit -v
# Only integration tests
pytest tests/integration -v
📁 Project Structure
entityspine/
├── src/entityspine/
│ ├── __init__.py # Package exports
│ ├── domain/ # Canonical stdlib dataclasses
│ │ ├── entity.py # Entity model
│ │ ├── security.py # Security model
│ │ ├── listing.py # Listing model
│ │ ├── claim.py # IdentifierClaim model
│ │ ├── graph.py # KG nodes (Asset, Contract, etc.)
│ │ ├── enums.py # All enumerations
│ │ └── validators.py # Normalization + validation
│ ├── stores/ # Storage backends
│ │ ├── sqlite_store.py # Tier 1 (stdlib sqlite3)
│ │ ├── json_store.py # Tier 0 (JSON file)
│ │ └── mappers.py # Domain ↔ dict conversion
│ ├── adapters/ # Optional adapters
│ │ ├── pydantic/ # Pydantic validation wrappers
│ │ └── orm/ # SQLModel/SQLAlchemy layer
│ ├── integration/ # py-sec-edgar integration
│ │ ├── contracts.py # FilingFacts schema
│ │ ├── ingest.py # Ingestion functions
│ │ └── normalize.py # SEC identifier normalizers
│ └── core/ # Utilities
│ ├── ulid.py # ULID generation
│ └── timestamps.py # UTC timestamp utilities
├── tests/ # 303 tests
├── examples/ # Usage examples
└── docs/ # Documentation
🚀 Roadmap
- v0.3.x — Core entity resolution, KG nodes, integration module
- v0.4.x — DuckDB Tier 2, temporal queries
- v0.5.x — PostgreSQL Tier 3, full temporal support
- v0.6.x — FastAPI service, graph traversal API
- v1.0.0 — Production-ready, comprehensive documentation
🤝 Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
# Setup development environment
git clone https://github.com/ryansmccoy/entity-spine.git
cd entity-spine
pip install -e ".[dev]"
pre-commit install
# Run tests
pytest
# Run linting
ruff check src tests
mypy src/entityspine
📄 License
MIT License — see LICENSE for details.
🔗 Links
- Repository: https://github.com/ryansmccoy/entity-spine
- PyPI: https://pypi.org/project/entityspine/
- Documentation: https://github.com/ryansmccoy/entity-spine/tree/main/docs
- Issues: https://github.com/ryansmccoy/entity-spine/issues
Built with ❤️ for the SEC EDGAR community