Lightweight entity resolution - from company_tickers.json to enterprise

EntitySpine

Zero-Dependency Entity Resolution for SEC EDGAR Data

From company_tickers.json to enterprise-grade Knowledge Graph — without forcing dependencies.

Quick Start | Why EntitySpine? | Examples | Architecture | API | Contributing


🎯 What is EntitySpine?

EntitySpine solves the entity resolution problem for SEC EDGAR data:

"Is CIK 0000320193 the same company as ticker AAPL on NASDAQ?"

It provides:

  • 🔍 Entity Resolution — Resolve tickers, CIKs, CUSIPs to canonical entities
  • 📊 Knowledge Graph — Model companies, people, relationships, events
  • 🗃️ Tiered Storage — JSON → SQLite → DuckDB → PostgreSQL
  • ⚡ Zero Core Dependencies — stdlib-only for Tier 0-1

⚡ Quick Start

Installation

# Core (zero dependencies)
pip install entityspine

# With optional features
pip install "entityspine[pydantic]"  # Validation wrappers
pip install "entityspine[orm]"       # SQLModel/SQLAlchemy
pip install "entityspine[duckdb]"    # Analytics tier
pip install "entityspine[full]"      # Everything

30-Second Example

from entityspine import SqliteStore

# Create store and load ~14,000 SEC companies (auto-downloads)
store = SqliteStore(":memory:")
store.initialize()
store.load_sec_data()  # Fetches from SEC automatically

# Resolve by ticker
results = store.search_entities("AAPL")
entity, score = results[0]
print(f"{entity.primary_name} (CIK: {entity.source_id})")
# Apple Inc. (CIK: 0000320193)

# Resolve by CIK  
entities = store.get_entities_by_cik("0000320193")
print(entities[0].primary_name)
# Apple Inc.

🤔 Why EntitySpine?

The Problem

SEC EDGAR data uses multiple identifiers that don't naturally connect:

Identifier  Example               What It Identifies
CIK         0000320193            Legal filing entity
Ticker      AAPL                  Exchange listing
CUSIP       037833100             Security instrument
LEI         HWUPKR0MPOU8FGXBT394  Global legal entity

Questions that are hard to answer:

  • Are AAPL and 0000320193 the same company? ✅ Yes
  • Did FB become META? How do I track that? 🤔
  • Which company is GOOG vs GOOGL? Same entity, different securities
  • What's the LEI for CIK 0001018724? 🤷

The Solution

EntitySpine provides a canonical entity model with identifier claims:

┌─────────────────────────────────────────────────────────────────┐
│                         Entity                                   │
│                    "Apple Inc."                                  │
│                                                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐             │
│  │ CIK Claim   │  │ LEI Claim   │  │ EIN Claim   │             │
│  │ 0000320193  │  │ HWUPKR...   │  │ 94-2404110  │             │
│  │ source: SEC │  │ source:GLEIF│  │ source: IRS │             │
│  └─────────────┘  └─────────────┘  └─────────────┘             │
│                          │                                       │
│                          ▼                                       │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                      Security                            │   │
│  │               "Apple Common Stock"                       │   │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐        │   │
│  │  │CUSIP Claim │  │ISIN Claim  │  │FIGI Claim  │        │   │
│  │  │ 037833100  │  │US037833...  │  │BBG000B9XRY4│        │   │
│  │  └────────────┘  └────────────┘  └────────────┘        │   │
│  │                          │                              │   │
│  │                          ▼                              │   │
│  │  ┌────────────────────────────────────────────────┐   │   │
│  │  │              Listing (NASDAQ)                   │   │   │
│  │  │  Ticker: AAPL | MIC: XNAS | Status: ACTIVE     │   │   │
│  │  └────────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
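
The layering in the diagram can be sketched with plain dataclasses. This is a minimal illustration, not EntitySpine's actual domain model: the class shapes, the `build_index` helper, and the `"TICKER"` scheme label are all hypothetical. It shows why a CIK and a ticker can land on the same entity even though they are attached at different levels.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    scheme: str          # e.g. "CIK", "CUSIP"
    value: str
    source: str = ""


@dataclass
class Listing:
    ticker: str
    mic: str
    status: str = "ACTIVE"


@dataclass
class Security:
    name: str
    claims: list = field(default_factory=list)
    listings: list = field(default_factory=list)


@dataclass
class Entity:
    name: str
    claims: list = field(default_factory=list)
    securities: list = field(default_factory=list)


def build_index(entities):
    """Map every (scheme, value) pair, at any level, to its owning entity."""
    index = {}
    for entity in entities:
        for claim in entity.claims:
            index[(claim.scheme, claim.value)] = entity
        for security in entity.securities:
            for claim in security.claims:
                index[(claim.scheme, claim.value)] = entity
            for listing in security.listings:
                # "TICKER" is a label made up for this sketch.
                index[("TICKER", listing.ticker)] = entity
    return index


apple = Entity(
    name="Apple Inc.",
    claims=[Claim("CIK", "0000320193", source="SEC")],
    securities=[
        Security(
            name="Apple Common Stock",
            claims=[Claim("CUSIP", "037833100")],
            listings=[Listing(ticker="AAPL", mic="XNAS")],
        )
    ],
)

index = build_index([apple])
same = index[("CIK", "0000320193")] is index[("TICKER", "AAPL")]
print(same)  # True
```

Keeping claims on the level they describe (entity vs. security vs. listing) is what lets GOOG and GOOGL point at one entity through two different securities.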

📚 Examples

Example 1: Load SEC Company Tickers

"""Load SEC's company_tickers.json into EntitySpine."""
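import httpx  # third-party HTTP client: pip install httpx

from entityspine import SqliteStore

# Download SEC data (SEC requires a descriptive User-Agent with contact info)
url = "https://www.sec.gov/files/company_tickers.json"
headers = {"User-Agent": "MyApp/1.0 (contact@example.com)"}
response = httpx.get(url, headers=headers)
response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error page
sec_data = response.json()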

# Load into EntitySpine
store = SqliteStore("./sec_entities.db")
store.initialize()
loaded = store.load_sec_json(sec_data)

print(f"Loaded {loaded:,} entities")  # assumes load_sec_json returns an integer count
# Loaded 10,847 entities

# Query
results = store.search_entities("Microsoft")
for entity, score in results[:5]:
    print(f"  {score:.2f} | {entity.primary_name}")
# 1.00 | Microsoft Corporation
# 0.70 | Microsoft Payments Inc
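
For readers who want to see what such a load has to deal with, here is a stdlib-only sketch of the preprocessing. It is not `load_sec_json` itself. The inline sample mirrors the shape of `company_tickers.json` as I understand it (a dict keyed by row index, each row holding `cik_str`, `ticker`, and `title`); check the live file before relying on those field names. `normalize_cik` and `group_by_cik` are hypothetical helpers.

```python
from collections import defaultdict

# Inline sample in the assumed shape of company_tickers.json.
sample = {
    "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "1": {"cik_str": 1652044, "ticker": "GOOGL", "title": "Alphabet Inc."},
    "2": {"cik_str": 1652044, "ticker": "GOOG", "title": "Alphabet Inc."},
}


def normalize_cik(raw):
    """Zero-pad a CIK to the 10-digit form used throughout this README."""
    return str(int(raw)).zfill(10)


def group_by_cik(records):
    """Collapse rows that share a CIK into one company with several tickers."""
    grouped = defaultdict(lambda: {"name": None, "tickers": []})
    for row in records.values():
        cik = normalize_cik(row["cik_str"])
        grouped[cik]["name"] = row["title"]
        grouped[cik]["tickers"].append(row["ticker"])
    return dict(grouped)


companies = group_by_cik(sample)
print(len(companies))                       # 2
print(companies["0001652044"]["tickers"])   # ['GOOGL', 'GOOG']
```

Three rows become two companies because GOOG and GOOGL share a CIK, which is one reason a row count and an entity count can differ.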

Example 2: Build Knowledge Graph from Filing

"""Extract entities and relationships from a 10-K filing."""
from entityspine import (
    Entity, EntityType, EntityStatus,
    IdentifierClaim, IdentifierScheme, VendorNamespace, ClaimStatus,
    Relationship, NodeRef, NodeKind, RelationshipType,
    SqliteStore,
)

store = SqliteStore("./knowledge_graph.db")
store.initialize()

# Create NVIDIA entity
nvidia = Entity(
    primary_name="NVIDIA Corporation",
    entity_type=EntityType.ORGANIZATION,
    status=EntityStatus.ACTIVE,
    jurisdiction="DE",
    sic_code="3674",
    source_system="sec-edgar",
    source_id="0001045810",
)
store.save_entity(nvidia)

# Attach CIK with SEC filing evidence
cik_claim = IdentifierClaim(
    entity_id=nvidia.entity_id,
    scheme=IdentifierScheme.CIK,
    value="0001045810",
    namespace=VendorNamespace.SEC,
    source="sec-edgar",
    source_ref="0001045810-24-000029",  # Accession number
    confidence=1.0,
)
store.save_claim(cik_claim)

# Create supplier entity
tsmc = Entity(
    primary_name="Taiwan Semiconductor Manufacturing Company",
    entity_type=EntityType.ORGANIZATION,
    source_system="sec-edgar",
)
store.save_entity(tsmc)

# Create supplier relationship with evidence
relationship = Relationship(
    source_ref=NodeRef(NodeKind.ENTITY, nvidia.entity_id),
    target_ref=NodeRef(NodeKind.ENTITY, tsmc.entity_id),
    relationship_type=RelationshipType.SUPPLIER,
    confidence=0.95,
    evidence_filing_id="0001045810-24-000029",
    evidence_snippet="TSMC manufactures substantially all of our GPUs...",
    source_system="sec-edgar",
)
store.save_relationship(relationship)

print(f"Entities: {store.entity_count()}")
print(f"Relationships: {store.relationship_count()}")

Example 3: py-sec-edgar Integration

"""Ingest SEC filings using the integration module."""
from datetime import date
from entityspine.integration import (
    FilingFacts,
    FilingEvidence,
    ingest_filing_facts,
)
from entityspine.integration.contracts import (
    ExtractedEntity,
    ExtractedRelationship,
)
from entityspine import SqliteStore

store = SqliteStore("./filings.db")
store.initialize()

# Build facts from a 10-K (this would come from py-sec-edgar)
facts = FilingFacts(
    evidence=FilingEvidence(
        accession_number="0001045810-24-000029",
        form_type="10-K",
        filed_date=date(2024, 2, 21),
        cik="0001045810",
    ),
    registrant_name="NVIDIA Corporation",
    registrant_cik="0001045810",
    registrant_ticker="NVDA",
    registrant_exchange="NASDAQ",
    registrant_sic="3674",
    registrant_state="DE",
    entities=[
        ExtractedEntity(name="Jensen Huang", entity_type="person"),
        ExtractedEntity(name="TSMC", entity_type="organization"),
        ExtractedEntity(name="Microsoft", entity_type="organization"),
    ],
    relationships=[
        ExtractedRelationship(
            source_name="NVIDIA Corporation",
            target_name="TSMC",
            relationship_type="SUPPLIER",
            evidence_snippet="TSMC manufactures our GPUs",
        ),
        ExtractedRelationship(
            source_name="NVIDIA Corporation", 
            target_name="Microsoft",
            relationship_type="CUSTOMER",
            evidence_snippet="Microsoft is a major customer for datacenter",
        ),
    ],
)

# Ingest into knowledge graph
result = ingest_filing_facts(store, facts)

print(f"Created {result.entities_created} entities")
print(f"Created {result.relationships_created} relationships")
print(f"Created {result.claims_created} identifier claims")
# Created 4 entities
# Created 2 relationships  
# Created 2 identifier claims

Example 4: Multi-Identifier Resolution

"""Resolve entities across multiple identifier schemes."""
from entityspine import SqliteStore

# Assumes ./entities.db is already populated, e.g. via store.load_sec_data()
store = SqliteStore("./entities.db")
store.initialize()

# Search works across CIK, ticker, and name
queries = ["AAPL", "0000320193", "Apple Inc", "Apple"]

for query in queries:
    results = store.search_entities(query, limit=1)
    if results:
        entity, score = results[0]
        print(f"'{query}' → {entity.primary_name} (score: {score:.2f})")
        
# 'AAPL' → Apple Inc. (score: 1.00)
# '0000320193' → Apple Inc. (score: 1.00)
# 'Apple Inc' → Apple Inc. (score: 1.00)
# 'Apple' → Apple Inc. (score: 0.70)
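
The scores above suggest that exact matches rank above partial ones. One plausible scheme is sketched below. It is an assumption for illustration only: EntitySpine's real scoring is not documented in this README, and this sketch does not reproduce every score shown in the examples. The `normalize` and `score` functions and the 0.7 value for partial matches are hypothetical.

```python
import string

_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    return " ".join(text.lower().translate(_PUNCTUATION).split())


def score(query, candidate):
    """Return 1.0 for an exact normalized match, 0.7 for a partial one, else 0.0."""
    q, c = normalize(query), normalize(candidate)
    if not q:
        return 0.0
    if q == c:
        return 1.0
    if c.startswith(q) or q in c.split():
        return 0.7
    return 0.0


print(score("Apple Inc", "Apple Inc."))  # 1.0
print(score("Apple", "Apple Inc."))      # 0.7
print(score("Pear", "Apple Inc."))       # 0.0
```

Normalizing before comparing is what lets "Apple Inc" and "Apple Inc." count as the same string.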

Example 5: Track Corporate Actions

"""Track mergers, name changes, and ticker changes."""
from entityspine import Entity, EntityStatus, EntityType, SqliteStore
from entityspine.domain.timestamps import utc_now

store = SqliteStore("./corporate_actions.db")
store.initialize()

# Original entity
facebook = Entity(
    primary_name="Facebook, Inc.",
    entity_type=EntityType.ORGANIZATION,
    source_system="sec-edgar",
    source_id="0001326801",
)
store.save_entity(facebook)

# After rebranding - create redirect
meta = Entity(
    primary_name="Meta Platforms, Inc.",
    entity_type=EntityType.ORGANIZATION,
    source_system="sec-edgar",
    source_id="0001326801",  # Same CIK
)
store.save_entity(meta)

# Mark old entity as merged
facebook_merged = facebook.with_update(
    status=EntityStatus.MERGED,
    redirect_to=meta.entity_id,
    redirect_reason="Rebranded to Meta Platforms, Inc.",
    merged_at=utc_now(),
)
store.save_entity(facebook_merged)

# Lookups automatically follow the redirect
results = store.search_entities("Facebook")
entity, _ = results[0]
print(f"Facebook resolved to: {entity.primary_name}")
# Facebook resolved to: Meta Platforms, Inc.
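
Following a redirect amounts to walking `redirect_to` pointers until an entity without one is reached. The sketch below shows that walk with a guard against cycles. It is a stdlib-only illustration, not the store's implementation: `follow_redirects`, the plain-dict redirect table, and the `ent_*` identifiers are hypothetical.

```python
def follow_redirects(entity_id, redirects, max_hops=10):
    """Return the final entity id after following redirect_to pointers.

    `redirects` maps an entity id to the id it was merged into.
    """
    seen = {entity_id}
    current = entity_id
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            return current  # no further redirect: this is the live entity
        if target in seen:
            raise ValueError(f"redirect cycle at {target}")
        seen.add(target)
        current = target
    raise ValueError("too many redirect hops")


redirects = {"ent_facebook": "ent_meta"}
print(follow_redirects("ent_facebook", redirects))  # ent_meta
print(follow_redirects("ent_meta", redirects))      # ent_meta
```

The hop limit and the `seen` set keep a bad merge record from turning a lookup into an infinite loop.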

🏗️ Architecture

Domain is Canonical

┌─────────────────────────────────────────────────────────────────┐
│                    entityspine.domain                           │
│                  (stdlib dataclasses only)                      │
│  ┌─────────┐ ┌──────────┐ ┌─────────┐ ┌─────────────────┐      │
│  │ Entity  │ │ Security │ │ Listing │ │IdentifierClaim  │      │
│  └─────────┘ └──────────┘ └─────────┘ └─────────────────┘      │
│  ┌─────────┐ ┌──────────┐ ┌─────────┐ ┌─────────────────┐      │
│  │  Asset  │ │ Contract │ │ Product │ │  Relationship   │      │
│  └─────────┘ └──────────┘ └─────────┘ └─────────────────┘      │
└─────────────────────────────────────────────────────────────────┘
                          ↑ returns domain dataclasses
┌─────────────────────────────────────────────────────────────────┐
│                      entityspine.stores                         │
│  ┌───────────────┐ ┌───────────────┐                           │
│  │  JsonStore    │ │  SqliteStore  │  (Tier 0-1, stdlib)       │
│  └───────────────┘ └───────────────┘                           │
└─────────────────────────────────────────────────────────────────┘
                          ↑ optional extras
┌─────────────────────────────────────────────────────────────────┐
│                    entityspine.adapters                         │
│  ┌─────────────────────┐ ┌─────────────────────┐               │
│  │ pydantic/ wrappers  │ │  orm/ SqlModelStore │               │
│  │ to_domain/from_dom  │ │  returns domain     │               │
│  └─────────────────────┘ └─────────────────────┘               │
│  pip install .[pydantic]  pip install .[orm]                    │
└─────────────────────────────────────────────────────────────────┘

Storage Tiers

Tier  Backend     Dependencies  Use Case      Temporal?
0     JSON file   None          Scripts, CLI
1     SQLite      None          Local dev
2     DuckDB      [duckdb]      Analytics     ⏳ Planned
3     PostgreSQL  [postgres]    Production

Tier Honesty

Lower tiers warn when they can't fulfill advanced queries:

result = store.resolve("AAPL", as_of="2015-01-01")
if not result.as_of_honored:
    for warning in result.warnings:
        print(f"⚠️ {warning}")
# ⚠️ as_of parameter ignored: temporal resolution requires Tier 2+
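
The pattern behind this is a result object that records which parts of the request were actually honored. A minimal sketch follows. The `as_of_honored` and `warnings` names and the warning text come from the example above; `ResolveResult`, `Tier1Resolver`, and the ticker dict are hypothetical stand-ins, not the library's classes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResolveResult:
    entity_name: Optional[str]
    as_of_honored: bool = True
    warnings: list = field(default_factory=list)


class Tier1Resolver:
    """Toy resolver with no temporal data, standing in for a low tier."""

    def __init__(self, by_ticker):
        self._by_ticker = by_ticker

    def resolve(self, query, as_of=None):
        result = ResolveResult(entity_name=self._by_ticker.get(query))
        if as_of is not None:
            # Answer with current data, but say so instead of silently ignoring as_of.
            result.as_of_honored = False
            result.warnings.append(
                "as_of parameter ignored: temporal resolution requires Tier 2+"
            )
        return result


resolver = Tier1Resolver({"AAPL": "Apple Inc."})
plain = resolver.resolve("AAPL")
dated = resolver.resolve("AAPL", as_of="2015-01-01")
print(plain.as_of_honored, plain.warnings)  # True []
print(dated.as_of_honored, dated.warnings[0])
```

Returning a flag plus warnings, rather than raising, lets callers decide whether a non-temporal answer is acceptable.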

📖 Domain Models

Core Models

Model            Purpose                        Key Fields
Entity           Legal/organizational identity  primary_name, entity_type, jurisdiction
Security         Tradeable instrument           security_type, entity_id, description
Listing          Exchange-specific ticker       ticker, exchange, mic, security_id
IdentifierClaim  Identifier with provenance     scheme, value, entity_id, confidence

Knowledge Graph Nodes

Model     Purpose
Person    Natural persons (executives, directors)
Asset     Physical/tangible assets
Contract  Material agreements
Product   Products/services
Brand     Brand identities
Event     Discrete business events
Case      Legal proceedings
Geo       Geographic locations
Address   Physical addresses

Edge Models

Model               Purpose
RoleAssignment      Person→Org roles (CEO, CFO, Director)
Relationship        Generic node→node edges with evidence
EntityRelationship  Entity→Entity relationships

🔧 Configuration

Environment Variables

# Storage path (default: ./entityspine.db)
ENTITYSPINE_DB_PATH=./data/entities.db

# Log level
ENTITYSPINE_LOG_LEVEL=INFO
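
If you want the same variables to drive your own scripts, they can be read with the standard library. This is a sketch under stated assumptions: the variable names and the `./entityspine.db` default come from the listing above, while `load_settings` is a hypothetical helper and the `INFO` fallback for the log level is a guess, since no default is stated.

```python
import os

DEFAULT_DB_PATH = "./entityspine.db"   # default documented above
DEFAULT_LOG_LEVEL = "INFO"             # assumed fallback, not documented


def load_settings(environ=None):
    """Read EntitySpine-style settings from a mapping (os.environ by default)."""
    env = os.environ if environ is None else environ
    return {
        "db_path": env.get("ENTITYSPINE_DB_PATH", DEFAULT_DB_PATH),
        "log_level": env.get("ENTITYSPINE_LOG_LEVEL", DEFAULT_LOG_LEVEL).upper(),
    }


defaults = load_settings({})
custom = load_settings(
    {"ENTITYSPINE_DB_PATH": "./data/entities.db", "ENTITYSPINE_LOG_LEVEL": "debug"}
)
print(defaults)  # {'db_path': './entityspine.db', 'log_level': 'INFO'}
print(custom)    # {'db_path': './data/entities.db', 'log_level': 'DEBUG'}
```

Passing the mapping in explicitly keeps the function easy to test without touching the real environment.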

Programmatic Configuration

from entityspine import SqliteStore

# In-memory for testing
store = SqliteStore(":memory:")

# File-based
store = SqliteStore("./entities.db")

# With explicit WAL mode (better concurrency)
store = SqliteStore("./entities.db", wal_mode=True)
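
For background on what WAL mode means at the SQLite level, the sketch below enables it with the stdlib `sqlite3` module directly. How `SqliteStore` applies `wal_mode=True` internally is not shown in this README, so treat this as an illustration of the underlying pragma only; `enable_wal` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def enable_wal(path):
    """Open a database, request WAL journaling, and return the mode SQLite reports."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    mode = enable_wal(os.path.join(tmp, "entities.db"))

# In-memory databases cannot use WAL; SQLite keeps reporting "memory".
mem_mode = enable_wal(":memory:")

print(mode)      # wal
print(mem_mode)  # memory
```

WAL lets readers proceed while a writer is active, which is where the concurrency benefit comes from; it only applies to file-backed databases.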

🧪 Testing

# Run all tests
pytest

# With coverage
pytest --cov=entityspine --cov-report=html

# Only unit tests
pytest tests/unit -v

# Only integration tests  
pytest tests/integration -v

📁 Project Structure

entityspine/
├── src/entityspine/
│   ├── __init__.py              # Package exports
│   ├── domain/                  # Canonical stdlib dataclasses
│   │   ├── entity.py            # Entity model
│   │   ├── security.py          # Security model
│   │   ├── listing.py           # Listing model
│   │   ├── claim.py             # IdentifierClaim model
│   │   ├── graph.py             # KG nodes (Asset, Contract, etc.)
│   │   ├── enums.py             # All enumerations
│   │   └── validators.py        # Normalization + validation
│   ├── stores/                  # Storage backends
│   │   ├── sqlite_store.py      # Tier 1 (stdlib sqlite3)
│   │   ├── json_store.py        # Tier 0 (JSON file)
│   │   └── mappers.py           # Domain ↔ dict conversion
│   ├── adapters/                # Optional adapters
│   │   ├── pydantic/            # Pydantic validation wrappers
│   │   └── orm/                 # SQLModel/SQLAlchemy layer
│   ├── integration/             # py-sec-edgar integration
│   │   ├── contracts.py         # FilingFacts schema
│   │   ├── ingest.py            # Ingestion functions
│   │   └── normalize.py         # SEC identifier normalizers
│   └── core/                    # Utilities
│       ├── ulid.py              # ULID generation
│       └── timestamps.py        # UTC timestamp utilities
├── tests/                       # 303 tests
├── examples/                    # Usage examples
└── docs/                        # Documentation

🚀 Roadmap

  • v0.3.x — Core entity resolution, KG nodes, integration module
  • v0.4.x — DuckDB Tier 2, temporal queries
  • v0.5.x — PostgreSQL Tier 3, full temporal support
  • v0.6.x — FastAPI service, graph traversal API
  • v1.0.0 — Production-ready, comprehensive documentation

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Setup development environment
git clone https://github.com/ryansmccoy/entity-spine.git
cd entity-spine
pip install -e ".[dev]"
pre-commit install

# Run tests
pytest

# Run linting
ruff check src tests
mypy src/entityspine

📄 License

MIT License — see LICENSE for details.


Built with ❤️ for the SEC EDGAR community
