EntityIdentity

Entity resolution and identity matching for companies.

Fast, in-memory company name resolution using fuzzy matching and smart normalization. No server required.

Installation

pip install entityidentity

Quick Start

from entityidentity import resolve_company, match_company

# Simple matching - returns best match or None
match = match_company("Apple Inc", country="US")
if match:
    print(f"Matched: {match['name']}")
    print(f"Country: {match['country']}")
    print(f"LEI: {match.get('lei', 'N/A')}")

# Full resolution with details
result = resolve_company("BHP Group", country="AU")
print(result['final'])      # Best match
print(result['decision'])   # How it was matched
print(result['matches'])    # All top matches with scores

Features

  • Fast in-memory lookups: <100ms for most queries
  • Multiple data sources: GLEIF LEI, Wikidata, stock exchanges
  • Smart normalization: Handles legal suffixes, punctuation, unicode
  • Fuzzy matching: RapidFuzz scoring with intelligent blocking
  • No server required: Runs in-process and works out of the box
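The "fuzzy matching with blocking" bullet can be illustrated with a small sketch. The README does not say which blocking key the package uses, so the key here (country plus the first character of the normalized name) is an assumption, the records are hypothetical toy data, and stdlib `difflib.SequenceMatcher` stands in for RapidFuzz so the example runs without extra installs. Blocking means a query is scored only against candidates that share its block, not against the whole table.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical toy records with pre-normalized names (not the package's data).
COMPANIES = [
    {"name": "Apple Inc", "name_norm": "apple", "country": "US"},
    {"name": "Applied Materials Inc", "name_norm": "applied materials", "country": "US"},
    {"name": "BHP Group Ltd", "name_norm": "bhp group", "country": "AU"},
    {"name": "Microsoft Corporation", "name_norm": "microsoft", "country": "US"},
]


def block_key(country, name_norm):
    # Assumed blocking key: country plus first character of the normalized name.
    return (country, name_norm[:1])


def build_blocks(records):
    """Group records by block key so a query only scans its own block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec["country"], rec["name_norm"])].append(rec)
    return blocks


def similarity(a, b):
    """0-100 similarity; stdlib difflib stands in for RapidFuzz here."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def top_matches(query_norm, country, blocks, k=3):
    """Score only the candidates that share the query's block; best first."""
    candidates = blocks.get(block_key(country, query_norm), [])
    scored = [(similarity(query_norm, rec["name_norm"]), rec["name"]) for rec in candidates]
    return sorted(scored, reverse=True)[:k]


blocks = build_blocks(COMPANIES)
print(top_matches("apple", "US", blocks))  # "Apple Inc" first (100.0), then "Applied Materials Inc"
```

A first-character key keeps blocks small but will miss candidates whose first letter differs from the query (for example a typo in the first character); real systems often use several keys to trade recall against speed.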

Basic Usage

Normalize Company Names

from entityidentity import normalize_name

# Normalize for matching
normalized = normalize_name("Apple Inc.")
# Returns: "apple"

normalized = normalize_name("BHP Group Ltd")
# Returns: "bhp group"
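The normalization steps the README describes (lowercasing, removing punctuation, dropping legal suffixes, handling unicode) can be approximated as below. This is a minimal sketch, not the package's `normalize_name`: the function name `normalize_name_sketch` and the suffix list are hypothetical, and the real implementation may differ in which suffixes it strips and how it treats accents.

```python
import re
import unicodedata

# Hypothetical, illustrative suffix list; the package's actual list is not documented here.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation", "ltd", "limited",
    "plc", "llc", "co", "company", "ag", "sa", "nv", "gmbh",
}


def normalize_name_sketch(name: str) -> str:
    """Lowercase, fold accents, strip punctuation and trailing legal suffixes."""
    # Fold accented characters to their base letters (e.g. "é" -> "e").
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Drop periods first so abbreviations like "S.A." collapse to "sa".
    text = text.replace(".", "").lower()
    # Turn any remaining punctuation into spaces, then split into tokens.
    tokens = re.sub(r"[^\w\s]", " ", text).split()
    # Strip suffixes from the end only, and never reduce the name to nothing.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


print(normalize_name_sketch("Apple Inc."))     # apple
print(normalize_name_sketch("BHP Group Ltd"))  # bhp group
```

Stripping suffixes only from the end avoids mangling names where a suffix word appears earlier, such as a leading "Company".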

Match Company Names

from entityidentity import match_company

# Find best match
match = match_company("Microsoft Corporation", country="US")
if match:
    print(f"Matched to: {match['name']}")
    print(f"Confidence: {match['score']}")

Resolve with Details

from entityidentity import resolve_company

# Get full resolution details
result = resolve_company("Tesla", country="US")

# Access matched company (guard in case no match was found)
company = result['final']
if company:
    print(f"Name: {company['name']}")
    print(f"Country: {company['country']}")

# See decision type
print(f"Decision: {result['decision']}")
# Examples: 'auto_high_conf', 'llm_tiebreak', 'low_confidence'

# Review all matches
for match in result['matches']:
    print(f"  {match['name']} - Score: {match['score']}")
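One way to picture how ranked scores could become the decision labels listed above is a simple threshold rule. This is a hypothetical sketch: the labels come from this README, but the thresholds, the 0-100 scale, and the rule itself are assumptions, and the package's actual decision logic (including how its tie-break step works) is not documented here.

```python
# Hypothetical thresholds on a 0-100 score scale; the package's real cut-offs
# and decision rules are not documented here.
HIGH_CONF = 90.0   # minimum score to accept without review
MIN_GAP = 5.0      # required lead over the runner-up
FLOOR = 70.0       # below this, nothing is considered a plausible match


def decide(scores):
    """Map candidate scores (sorted best first) to a decision label."""
    if not scores or scores[0] < FLOOR:
        return "low_confidence"
    runner_up = scores[1] if len(scores) > 1 else 0.0
    if scores[0] >= HIGH_CONF and scores[0] - runner_up >= MIN_GAP:
        return "auto_high_conf"
    # Plausible but not decisive: too low, or too close to the runner-up,
    # so a tie-break step would be needed.
    return "llm_tiebreak"


print(decide([98.0, 60.0]))  # auto_high_conf
print(decide([92.0, 90.0]))  # llm_tiebreak
```

Requiring a gap over the runner-up, not just a high top score, is what separates a confident match from two near-identical candidates.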

Data Sources

The package includes pre-built company data from:

  • GLEIF LEI: Global Legal Entity Identifier database
  • Wikidata: Rich company metadata and aliases
  • Stock Exchanges: ASX, LSE, TSX listings

Sample data is included in the package for immediate use.

API Reference

match_company(name, country=None)

Simple interface to find the best matching company.

Parameters:

  • name (str): Company name to match
  • country (str, optional): ISO 3166-1 alpha-2 country code (e.g. "US", "GB")

Returns: Dictionary with matched company data, or None if no good match found.

resolve_company(name, country=None, **kwargs)

Full resolution with all details and match scores.

Parameters:

  • name (str): Company name to resolve
  • country (str, optional): ISO 3166-1 alpha-2 country code (e.g. "US", "GB")
  • **kwargs: Additional keyword arguments for advanced options

Returns: Dictionary with:

  • final: Best matched company
  • decision: Decision type ('auto_high_conf', 'llm_tiebreak', etc.)
  • matches: List of all potential matches with scores

normalize_name(name)

Normalize a company name for matching.

Parameters:

  • name (str): Company name to normalize

Returns: Normalized string (lowercase, no punctuation, legal suffixes removed)

list_companies(country=None, search=None, limit=None, data_path=None)

List companies with optional filtering.

Parameters:

  • country (str, optional): ISO 3166-1 alpha-2 country code filter (e.g. "US", "GB")
  • search (str, optional): Search term for company names
  • limit (int, optional): Maximum number of results
  • data_path (str, optional): Path to custom data file

Returns: pandas DataFrame with filtered company data

Examples:

from entityidentity import list_companies

# List all US companies
us = list_companies(country="US")

# Companies whose names contain "mining"
mining = list_companies(search="mining")

# At most 10 Australian companies
au_sample = list_companies(country="AU", limit=10)
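How `country`, `search`, and `limit` combine can be shown with a pure-Python sketch over plain dicts instead of a pandas DataFrame. The function `list_companies_sketch` and the sample records are hypothetical; the filter order and the case-insensitive substring search are assumptions about, not a description of, the package's implementation.

```python
from typing import Optional

# Hypothetical sample records for illustration only.
SAMPLE = [
    {"name": "Example Mining Ltd", "country": "AU"},
    {"name": "Example Tech Pty Ltd", "country": "AU"},
    {"name": "Example Tech Plc", "country": "GB"},
    {"name": "Another Mining Corp", "country": "US"},
]


def list_companies_sketch(records, country: Optional[str] = None,
                          search: Optional[str] = None, limit: Optional[int] = None):
    """Apply country, then name search, then limit (assumed order)."""
    rows = records
    if country is not None:
        rows = [r for r in rows if r["country"] == country]
    if search is not None:
        needle = search.lower()  # assumed case-insensitive substring match
        rows = [r for r in rows if needle in r["name"].lower()]
    if limit is not None:
        rows = rows[:limit]  # truncates in existing order; no ranking
    return rows


print(list_companies_sketch(SAMPLE, country="GB", search="tech"))
```

In this sketch `limit` simply keeps the first rows that survive the filters, so it caps the result size rather than selecting the "largest" or "best" companies.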

load_companies(data_path=None)

Load full company database into memory.

Parameters:

  • data_path (str, optional): Path to custom data file

Returns: pandas DataFrame with all company data

Performance

  • Query speed: <100ms for most lookups
  • Database size: ~10-50MB (compressed Parquet format)
  • Memory usage: ~200-500MB when loaded

Advanced Usage

Use Custom Data

from entityidentity import load_companies, match_company

# Load your own company data
df = load_companies("path/to/your/companies.parquet")

# Then use normally
match = match_company("Company Name")

List Companies

from entityidentity import list_companies

# List all companies
all_companies = list_companies()

# List companies by country
us_companies = list_companies(country="US")
au_companies = list_companies(country="AU")

# Search for companies
mining = list_companies(search="mining")
tech = list_companies(search="tech")

# Combine filters
uk_tech = list_companies(country="GB", search="tech", limit=10)

# Access data
for _, company in uk_tech.iterrows():
    print(f"{company['name']} - {company['country']}")

Access Raw Data

from entityidentity import load_companies

# Get full DataFrame for advanced filtering
companies = load_companies()

# Custom filtering (na=False treats missing names as non-matches)
filtered = companies[
    (companies['country'] == 'US') &
    (companies['name_norm'].str.contains('tech', na=False))
]

Support

  • Documentation: See MAINTENANCE.md for development details
  • Issues: Report bugs on GitHub
  • License: MIT

Author

Peter Cotton

Download files

Download the file for your platform.

Source Distribution

entityidentity-0.0.2.tar.gz (42.3 kB)

Uploaded Source

Built Distribution

entityidentity-0.0.2-py3-none-any.whl (49.3 kB)

Uploaded Python 3

File details

Details for the file entityidentity-0.0.2.tar.gz.

File metadata

  • Download URL: entityidentity-0.0.2.tar.gz
  • Upload date:
  • Size: 42.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for entityidentity-0.0.2.tar.gz
Algorithm Hash digest
SHA256 238d155f80162f5fa45e20d7aefa0d10516a4c5b01513c3346455925038e5c29
MD5 8bb4441319f83b5d0f63cf0087afe265
BLAKE2b-256 cddb8a9227fcf06028a8c832debaa853f32419e78c13c7c6018f1af22f8402ab

File details

Details for the file entityidentity-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: entityidentity-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 49.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for entityidentity-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 454ee53e963483fc500b40d0005de76cf61cbc83db3d3f965b0e87570715f029
MD5 0ccfc54ff4a6e3575edf0a0f55f1b109
BLAKE2b-256 03f423b1f71c6101388c44db98574c98c42c83e7e63eb6c75ba2f937bb48ffad
