
A Python client for the Entity Linker service - intelligent entity matching and linking


Entity Linker Client

Python 3.8+ | License: MIT

A Python client library for the Entity Linker service. It lets applications store entities (a canonical name, aliases, and metadata) and resolve new records against them using configurable matching rules.

Features

  • Intelligent Entity Matching: Supports exact, lexical-similarity, semantic-similarity, and dictionary-field matching strategies
  • Flexible Configuration: Customizable matching rules and thresholds
  • Batch Operations: Add many entities in a single call with add_entities_batch
  • Type Safety: Type annotations for editor completion and static type checking
  • Simple API: Clean and intuitive interface for easy integration

Installation

Install the package using pip:

pip install entity-linker-client

Quick Start

Here's a simple example to get you started:

from entity_linker_client import EntityLinker

# Initialize the client
linker = EntityLinker(
    base_url="http://localhost:6000",
    source_columns=["name", "address", "phone"],
    target_columns=["business_name", "business_address", "contact_number"]
)

# Add some entities
entities = [
    {
        "canonical_name": "Acme Corporation",
        "aliases": ["Acme Corp", "Acme Inc"],
        "metadata": {
            "address": "123 Business Ave",
            "phone": "+1-555-0123"
        }
    }
]

added_entities = linker.add_entities_batch(entities)
print(f"Added {len(added_entities)} entities")

# Link a new entity
entity_to_link = {
    "canonical_name": "Acme Corp",
    "aliases": [],
    "metadata": {
        "address": "123 Business Avenue",
        "phone": "+1-555-0123"
    }
}

result = linker.link_entity(entity_to_link)
if result.get("linked_entity_id"):
    print(f"Found match: {result['linked_entity_id']}")
else:
    print("No match found")
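Every entity in this README is a dict with canonical_name, aliases, and metadata. The sketch below captures that shape as a TypedDict and builds payloads with light cleanup. EntityPayload and make_entity are hypothetical helpers, not part of entity_linker_client, and the cleanup rules (trim whitespace, drop blank or repeated aliases, drop aliases equal to the canonical name) are assumptions about what you might want before sending data to the service.

```python
from typing import Dict, List, Optional, TypedDict


class EntityPayload(TypedDict):
    """Shape of the entity dicts used in this README (hypothetical type name)."""
    canonical_name: str
    aliases: List[str]
    metadata: Dict[str, str]


def make_entity(
    canonical_name: str,
    aliases: Optional[List[str]] = None,
    metadata: Optional[Dict[str, str]] = None,
) -> EntityPayload:
    """Build an entity dict, trimming whitespace and dropping duplicate aliases."""
    name = canonical_name.strip()
    if not name:
        raise ValueError("canonical_name must be non-empty")
    # Case-insensitive keys already used; the canonical name counts as used
    seen = {name.casefold()}
    clean_aliases: List[str] = []
    for alias in aliases or []:
        alias = alias.strip()
        key = alias.casefold()
        # Skip blanks, repeats, and aliases identical to the canonical name
        if alias and key not in seen:
            seen.add(key)
            clean_aliases.append(alias)
    return {
        "canonical_name": name,
        "aliases": clean_aliases,
        "metadata": dict(metadata or {}),
    }
```

The result can be passed wherever the examples use a literal dict, for instance inside the list given to add_entities_batch.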

Configuration

Basic Configuration

The simplest way to create a linker is by providing source and target columns:

from entity_linker_client import EntityLinker

linker = EntityLinker(
    base_url="http://localhost:6000",
    source_columns=["name", "industry", "location"],
    target_columns=["company_name", "sector", "address"]
)
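In both examples the column lists have the same length and appear to correspond position by position (name to company_name, industry to sector, location to address). The sketch below makes that reading explicit. Positional pairing is an assumption drawn from the examples, not documented behavior, and pair_columns and rename_record are hypothetical helpers you might use to check your lists before creating a linker.

```python
from typing import Dict, List


def pair_columns(source_columns: List[str], target_columns: List[str]) -> Dict[str, str]:
    """Pair columns by position (assumed semantics, not confirmed by the library)."""
    if len(source_columns) != len(target_columns):
        raise ValueError("source_columns and target_columns must have the same length")
    if len(set(source_columns)) != len(source_columns):
        raise ValueError("source_columns must be unique")
    return dict(zip(source_columns, target_columns))


def rename_record(record: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename a source record's keys to target column names, dropping unmapped keys."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}
```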

Advanced Configuration

For finer control over matching rules and thresholds, pass an EntityLinkingConfig instead of column lists:

from entity_linker_client import (
    EntityLinker, EntityLinkingConfig, OrCondition, 
    FieldCondition, MatchCondition, MatchType
)

# Create custom configuration
config = EntityLinkingConfig(
    quick_creation_config=OrCondition(
        conditions=[
            FieldCondition(
                field="canonical_name",
                condition=MatchCondition(
                    match_type=MatchType.LEXICAL_SIMILARITY,
                    threshold=80
                )
            )
        ]
    ),
    # ... other configurations
)

linker = EntityLinker(base_url="http://localhost:6000", config=config)
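The configuration above wraps one FieldCondition in an OrCondition, so a pair of entities qualifies if any listed field condition is met. The sketch below evaluates that logic locally to make the semantics concrete. SketchFieldCondition and or_condition_met are hypothetical stand-ins, not the library's classes; the 0 to 100 threshold scale is inferred from threshold=80; and difflib's character-level ratio replaces whatever lexical scorer the service actually uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List


@dataclass
class SketchFieldCondition:
    """Hypothetical stand-in for FieldCondition: one field, one 0-100 threshold."""
    field: str
    threshold: float


def similarity(a: str, b: str) -> float:
    """Character-level similarity scaled to 0-100 (difflib, a stand-in scorer)."""
    return 100 * SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def or_condition_met(
    conditions: List[SketchFieldCondition],
    left: Dict[str, str],
    right: Dict[str, str],
) -> bool:
    """True if ANY field condition reaches its threshold (OR semantics)."""
    return any(
        cond.field in left
        and cond.field in right
        and similarity(left[cond.field], right[cond.field]) >= cond.threshold
        for cond in conditions
    )
```

With this stand-in scorer, "Acme Corporation" and "Acme Corp" score about 72 and would fail a threshold of 80, which is a reminder that a sensible threshold depends on the scorer behind it.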

API Reference

EntityLinker Class

The main class for interacting with the Entity Linker service.

Methods

  • add_entity(entity_data): Add a single entity
  • add_entities_batch(entities_data): Add multiple entities in batch
  • get_entity(entity_id): Retrieve an entity by ID
  • modify_entity(entity_id, entity_data): Update an existing entity
  • delete_entity(entity_id): Delete an entity
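  • link_entity(entity_data, add_entity=False): Find an existing entity that matches entity_data; pass add_entity=True to also add it as a new entity when no match is found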
  • link_entity_with_id(entity_id): Link using an existing entity ID
  • get_info(): Get linker information
  • update_config(config): Update linker configuration
  • delete_linker(): Delete the linker instance

Static Methods

  • list_available_linkers(base_url): List all available linkers
  • get_linker_info(linker_id, base_url): Get information about a specific linker
  • generate_config(initial_config, source_columns, target_columns, base_url): Generate configuration

Configuration Classes

EntityLinkingConfig

Main configuration class containing:

  • quick_creation_config: Configuration for quick entity creation
  • quick_linking_config: Configuration for quick entity linking
  • llm_linking_config: Configuration for LLM-based linking
  • llm_top_k: Number of top results for LLM linking
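One plausible reading of llm_top_k is that only the k best-scoring candidates are handed to the LLM-based step. The sketch below shows that kind of cutoff on (entity_id, score) pairs. It is an illustration under that assumption; top_k_candidates is a hypothetical helper, and the service's actual candidate selection is not documented here.

```python
import heapq
from typing import List, Tuple


def top_k_candidates(scored: List[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (entity_id, score) pairs, best first."""
    if k <= 0:
        return []
    # nlargest avoids sorting the full list when k is small
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```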

MatchType Enum

Available matching strategies:

  • STRICT_MATCH: Exact string matching
  • LEXICAL_SIMILARITY: Token-based similarity
  • SEMANTIC_SIMILARITY: Embedding-based similarity
  • DICT_MATCH: Dictionary field matching
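To make the differences between strategies concrete, the sketch below gives simple local analogues of three of them. These are illustrations, not the service's implementations: token Jaccard overlap stands in for LEXICAL_SIMILARITY, and treating DICT_MATCH as "all shared keys have equal values" is an assumption. SEMANTIC_SIMILARITY is omitted because it needs an embedding model.

```python
import re
from typing import Dict, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def strict_match(a: str, b: str) -> bool:
    """STRICT_MATCH analogue: exact equality after trimming whitespace."""
    return a.strip() == b.strip()


def token_similarity(a: str, b: str) -> float:
    """LEXICAL_SIMILARITY analogue: Jaccard overlap of token sets, scaled to 0-100."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 100.0
    return 100.0 * len(ta & tb) / len(ta | tb)


def dict_match(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """DICT_MATCH analogue (assumed): at least one shared key, all shared values equal."""
    shared = a.keys() & b.keys()
    return bool(shared) and all(a[k] == b[k] for k in shared)
```

Token overlap ignores case and punctuation, so "Acme Corp" and "ACME corp." score 100, while "Acme Corp" and "Acme Inc" share one of three distinct tokens and score about 33.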

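Examples

The examples below assume linker is an EntityLinker instance created as in the Quick Start section.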

Working with Entities

# Add a single entity
entity = {
    "canonical_name": "OpenAI Inc",
    "aliases": ["OpenAI", "OpenAI LP"],
    "metadata": {
        "industry": "AI Research",
        "founded": "2015"
    }
}

added_entity = linker.add_entity(entity)
entity_id = added_entity["id"]

# Modify the entity
updated_data = {
    "canonical_name": "OpenAI Inc",
    "aliases": ["OpenAI", "OpenAI LP", "OpenAI L.P."],
    "metadata": {
        "industry": "Artificial Intelligence",
        "founded": "2015",
        "headquarters": "San Francisco"
    }
}

modified_entity = linker.modify_entity(entity_id, updated_data)
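The update above resends the whole record, including unchanged aliases and metadata. If modify_entity replaces the stored entity with the payload you send (an assumption based on this example), a small merge helper avoids dropping fields by accident. merge_entity is hypothetical and only builds the dict; you would still pass its result to linker.modify_entity.

```python
from typing import Any, Dict, Iterable, Optional


def merge_entity(
    existing: Dict[str, Any],
    new_aliases: Iterable[str] = (),
    metadata_updates: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a full payload: existing fields plus new aliases and metadata changes."""
    aliases = list(existing.get("aliases", []))  # copy so the input is not mutated
    for alias in new_aliases:
        if alias not in aliases:  # keep order, skip duplicates
            aliases.append(alias)
    # Later keys win, so metadata_updates overrides existing values
    metadata = {**existing.get("metadata", {}), **(metadata_updates or {})}
    return {
        "canonical_name": existing["canonical_name"],
        "aliases": aliases,
        "metadata": metadata,
    }
```

Applied to the entity dict from this example with new_aliases=["OpenAI L.P."] and the industry and headquarters changes, it produces the same payload as updated_data.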

Batch Operations

# Add multiple entities at once
companies = [
    {
        "canonical_name": "Google LLC",
        "aliases": ["Google", "Alphabet Inc"],
        "metadata": {"industry": "Technology"}
    },
    {
        "canonical_name": "Microsoft Corporation", 
        "aliases": ["Microsoft", "MSFT"],
        "metadata": {"industry": "Technology"}
    }
]

batch_result = linker.add_entities_batch(companies)
print(f"Added {len(batch_result)} companies")
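For large datasets you may prefer several moderate batches over one very large request. The sketch below splits any iterable into fixed-size lists. chunked is a hypothetical helper, and a suitable batch size for the service is not documented here, so the 500 in the usage note is only a placeholder.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Usage: for chunk in chunked(companies, 500): linker.add_entities_batch(chunk).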

Entity Linking

# Try to link a potentially matching entity
candidate = {
    "canonical_name": "Alphabet",
    "aliases": ["Google Inc"],
    "metadata": {"industry": "Tech"}
}

# Link without adding to database
link_result = linker.link_entity(candidate, add_entity=False)

if link_result.get("linked_entity_id"):
    print(f"Found existing entity: {link_result['linked_entity_id']}")
else:
    # Add as new entity if no match found
    link_result = linker.link_entity(candidate, add_entity=True)
    print(f"Created new entity: {link_result.get('linked_entity_id', 'Failed')}")
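The link-then-add pattern above can be wrapped in a reusable function. The sketch below accepts any object with a link_entity(entity_data, add_entity=...) method returning a dict with linked_entity_id, mirroring the calls in this example. link_or_add, SupportsLinking, and FakeLinker are hypothetical; FakeLinker is an in-memory stand-in that matches on exact canonical_name so the function can be exercised without a running service.

```python
from typing import Any, Dict, Protocol, Tuple


class SupportsLinking(Protocol):
    def link_entity(self, entity_data: Dict[str, Any], add_entity: bool = False) -> Dict[str, Any]: ...


def link_or_add(linker: SupportsLinking, entity: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (entity_id, created); created is True when a new entity was added."""
    result = linker.link_entity(entity, add_entity=False)
    if result.get("linked_entity_id"):
        return result["linked_entity_id"], False
    # No match: ask the service to add the entity as new
    result = linker.link_entity(entity, add_entity=True)
    entity_id = result.get("linked_entity_id")
    if not entity_id:
        raise RuntimeError("link_entity with add_entity=True returned no linked_entity_id")
    return entity_id, True


class FakeLinker:
    """In-memory stand-in for testing; matches only on exact canonical_name."""

    def __init__(self) -> None:
        self.ids: Dict[str, str] = {}

    def link_entity(self, entity_data: Dict[str, Any], add_entity: bool = False) -> Dict[str, Any]:
        name = entity_data["canonical_name"]
        if name in self.ids:
            return {"linked_entity_id": self.ids[name]}
        if add_entity:
            self.ids[name] = f"ent-{len(self.ids) + 1}"
            return {"linked_entity_id": self.ids[name]}
        return {}
```

Raising on a missing ID, instead of printing 'Failed', lets callers decide how to handle a service that did not create the entity.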

Environment Variables

You can configure the service URL with the ENTITY_LINKER_BASE_URL environment variable:

export ENTITY_LINKER_BASE_URL="http://your-entity-linker-service:6000"
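If your own code also needs the URL (for example, to pass it to static methods such as list_available_linkers), a lookup with an explicit fallback keeps behavior predictable. The sketch below assumes a precedence of explicit argument, then the environment variable, then http://localhost:6000; that order and the resolve_base_url name are assumptions, not the client's documented behavior.

```python
import os
from typing import Optional

DEFAULT_BASE_URL = "http://localhost:6000"  # assumed default, taken from the examples


def resolve_base_url(explicit: Optional[str] = None) -> str:
    """Explicit argument wins, then ENTITY_LINKER_BASE_URL, then the default."""
    url = explicit or os.environ.get("ENTITY_LINKER_BASE_URL") or DEFAULT_BASE_URL
    return url.rstrip("/")  # normalize away a trailing slash
```

If you keep the variable in a .env file, python-dotenv's load_dotenv() can load it into os.environ before this lookup runs.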

Error Handling

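The example below catches the errors you are most likely to see: httpx.HTTPStatusError when the service returns an error status (for example, for an unknown entity ID), httpx.RequestError for connection and transport failures, and ValueError for invalid configuration: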

from entity_linker_client import EntityLinker
import httpx

try:
    linker = EntityLinker(base_url="http://localhost:6000")
    entity = linker.get_entity("non-existent-id")
except httpx.HTTPStatusError as e:
    print(f"HTTP error: {e.response.status_code}")
except httpx.RequestError as e:
    print(f"Request error: {e}")
except ValueError as e:
    print(f"Configuration error: {e}")
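Transport failures such as a briefly unavailable service are often worth retrying, while error statuses like 404 usually are not. The sketch below retries a call with exponential backoff on the exception types you choose. with_retries is a hypothetical helper written with the standard library only; the attempt count and delays are placeholders.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Run `call`, retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise AssertionError("unreachable")
```

Usage: with_retries(lambda: linker.get_entity(entity_id), retry_on=(httpx.RequestError,)) retries connection problems but lets httpx.HTTPStatusError propagate immediately.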

Requirements

  • Python 3.8 or higher
  • httpx >= 0.24.0
  • python-dotenv >= 0.19.0

Development

To contribute to this project:

  1. Clone the repository
  2. Install development dependencies: pip install -e ".[dev]" (the quotes stop shells such as zsh from expanding the brackets)
  3. Run tests: pytest
  4. Format code: black .
  5. Check types: mypy .

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For support and questions:

Changelog

See CHANGELOG.md for a detailed history of changes.



Download files

Download the file for your platform.

Source Distribution

entity_linker_client-1.2.0.tar.gz (16.5 kB)

Uploaded Source

Built Distribution


entity_linker_client-1.2.0-py3-none-any.whl (13.4 kB)

Uploaded Python 3

File details

Details for the file entity_linker_client-1.2.0.tar.gz.

File metadata

  • Download URL: entity_linker_client-1.2.0.tar.gz
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for entity_linker_client-1.2.0.tar.gz
Algorithm Hash digest
SHA256 451effd75325a4774cf633097413677b5a7dbbcb585edc6313c50857693d7702
MD5 b93532bd6ca7f6d1705f4766d803b1cb
BLAKE2b-256 b95c6ac2307283d75ad7156f54ec56f834bf6cdf85c681908ed345a1d77a610e


File details

Details for the file entity_linker_client-1.2.0-py3-none-any.whl.


File hashes

Hashes for entity_linker_client-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c28c80e77b6449d2ca3ee891d818cb61955777dd79f1336c638d5554a29edeaf
MD5 660c400dfceff0b3ac5a8436f6c7bb16
BLAKE2b-256 c2f4101a69760628a078e7f678b424226f569d2866f3bd4d543c2c7b5a3aefe2

