A Python client for the Entity Linker service - intelligent entity matching and linking

Entity Linker Client

A Python client library for the Entity Linker service. It lets an application add entities to a linker, match incoming records against the entities already stored (entity resolution), and manage the linker's configuration.

Features

  • Intelligent Entity Matching: Supports multiple matching strategies including lexical similarity, semantic similarity, and exact matching
  • Flexible Configuration: Customizable matching rules and thresholds
  • Batch Operations: Efficient batch processing for large datasets
  • Type Safety: Full type annotations for a better development experience
  • Simple API: Clean and intuitive interface for easy integration

Installation

Install the package using pip:

pip install entity-linker-client

Quick Start

Here's a simple example to get you started:

from entity_linker_client import EntityLinker

# Initialize the client
linker = EntityLinker(
    base_url="http://localhost:6000",
    source_columns=["name", "address", "phone"],
    target_columns=["business_name", "business_address", "contact_number"]
)

# Add some entities
entities = [
    {
        "canonical_name": "Acme Corporation",
        "aliases": ["Acme Corp", "Acme Inc"],
        "metadata": {
            "address": "123 Business Ave",
            "phone": "+1-555-0123"
        }
    }
]

added_entities = linker.add_entities_batch(entities)
print(f"Added {len(added_entities)} entities")

# Link a new entity
entity_to_link = {
    "canonical_name": "Acme Corp",
    "aliases": [],
    "metadata": {
        "address": "123 Business Avenue",
        "phone": "+1-555-0123"
    }
}

result = linker.link_entity(entity_to_link)
if result.get("linked_entity_id"):
    print(f"Found match: {result['linked_entity_id']}")
else:
    print("No match found")

Configuration

Basic Configuration

The simplest way to create a linker is by providing source and target columns:

from entity_linker_client import EntityLinker

linker = EntityLinker(
    base_url="http://localhost:6000",
    source_columns=["name", "industry", "location"],
    target_columns=["company_name", "sector", "address"]
)

Advanced Configuration

For more control, you can provide a custom configuration:

from entity_linker_client import (
    EntityLinker, EntityLinkingConfig, OrCondition, 
    FieldCondition, MatchCondition, MatchType
)

# Create custom configuration
config = EntityLinkingConfig(
    quick_creation_config=OrCondition(
        conditions=[
            FieldCondition(
                field="canonical_name",
                condition=MatchCondition(
                    match_type=MatchType.LEXICAL_SIMILARITY,
                    threshold=80
                )
            )
        ]
    ),
    # ... other configurations
)

linker = EntityLinker(base_url="http://localhost:6000", config=config)

API Reference

EntityLinker Class

The main class for interacting with the Entity Linker service.

Methods

  • add_entity(entity_data): Add a single entity
  • add_entities_batch(entities_data): Add multiple entities in batch
  • get_entity(entity_id): Retrieve an entity by ID
  • modify_entity(entity_id, entity_data): Update an existing entity
  • delete_entity(entity_id): Delete an entity
  • link_entity(entity_data, add_entity=False): Find a matching entity; pass add_entity=True to add the entity when no match is found
  • link_entity_with_id(entity_id): Link using an existing entity ID
  • get_info(): Get linker information
  • update_config(config): Update linker configuration
  • delete_linker(): Delete the linker instance

Static Methods

  • list_available_linkers(base_url): List all available linkers
  • get_linker_info(linker_id, base_url): Get information about a specific linker
  • generate_config(initial_config, source_columns, target_columns, base_url): Generate configuration

Configuration Classes

EntityLinkingConfig

Main configuration class containing:

  • quick_creation_config: Configuration for quick entity creation
  • quick_linking_config: Configuration for quick entity linking
  • llm_linking_config: Configuration for LLM-based linking
  • llm_top_k: Number of top results for LLM linking

MatchType Enum

Available matching strategies:

  • STRICT_MATCH: Exact string matching
  • LEXICAL_SIMILARITY: Token-based similarity
  • SEMANTIC_SIMILARITY: Embedding-based similarity
  • DICT_MATCH: Dictionary field matching
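
To give a feel for the difference between STRICT_MATCH and LEXICAL_SIMILARITY, here is a minimal standalone sketch. It is not the service's algorithm: the function names strict_match and token_similarity are hypothetical, the token-set (Jaccard) overlap is just one simple token-based measure, and the 0-100 scale is an assumption inferred from threshold=80 in the configuration example.

```python
import re


def strict_match(a: str, b: str) -> bool:
    """Exact string comparison: case and whitespace must agree."""
    return a == b


def token_similarity(a: str, b: str) -> float:
    """Illustrative token-based score on an assumed 0-100 scale.

    Computes the Jaccard overlap of the lowercased word tokens.
    This is a stand-in for illustration, not the service's scoring.
    """
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 100.0 * len(tokens_a & tokens_b) / len(union)


# A case difference defeats a strict match but not the token score.
print(strict_match("Acme Corp", "acme corp"))
print(token_similarity("Acme Corp", "acme corp"))
print(token_similarity("Acme Corp", "Acme Corporation"))
```

With this particular measure, "Acme Corp" and "Acme Corporation" share one of three distinct tokens, so they would fall well below a threshold of 80. The real service may score such pairs quite differently.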

Examples

Working with Entities

# Add a single entity
entity = {
    "canonical_name": "OpenAI Inc",
    "aliases": ["OpenAI", "OpenAI LP"],
    "metadata": {
        "industry": "AI Research",
        "founded": "2015"
    }
}

added_entity = linker.add_entity(entity)
entity_id = added_entity["id"]

# Modify the entity
updated_data = {
    "canonical_name": "OpenAI Inc",
    "aliases": ["OpenAI", "OpenAI LP", "OpenAI L.P."],
    "metadata": {
        "industry": "Artificial Intelligence",
        "founded": "2015",
        "headquarters": "San Francisco"
    }
}

modified_entity = linker.modify_entity(entity_id, updated_data)

Batch Operations

# Add multiple entities at once
companies = [
    {
        "canonical_name": "Google LLC",
        "aliases": ["Google", "Alphabet Inc"],
        "metadata": {"industry": "Technology"}
    },
    {
        "canonical_name": "Microsoft Corporation", 
        "aliases": ["Microsoft", "MSFT"],
        "metadata": {"industry": "Technology"}
    }
]

batch_result = linker.add_entities_batch(companies)
print(f"Added {len(batch_result)} companies")
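
For very large datasets it may be preferable to send several smaller batches rather than one large request. The sketch below shows one way to do that. The chunked helper is hypothetical (it is not part of the client), and a suitable batch size depends on the service, which this document does not specify.

```python
from typing import Any, Dict, Iterator, List


def chunked(items: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage with an existing linker (the batch size of 500 is an arbitrary example):
#
#     added = []
#     for batch in chunked(companies, 500):
#         added.extend(linker.add_entities_batch(batch))
```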

Entity Linking

# Try to link a potentially matching entity
candidate = {
    "canonical_name": "Alphabet",
    "aliases": ["Google Inc"],
    "metadata": {"industry": "Tech"}
}

# Link without adding to database
link_result = linker.link_entity(candidate, add_entity=False)

if link_result.get("linked_entity_id"):
    print(f"Found existing entity: {link_result['linked_entity_id']}")
else:
    # Add as new entity if no match found
    link_result = linker.link_entity(candidate, add_entity=True)
    print(f"Created new entity: {link_result.get('linked_entity_id', 'Failed')}")
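
The link-then-create pattern above can be wrapped in a small helper so that callers get back both the entity ID and whether a new entity was created. This is a sketch under the assumption that link_entity returns a dict with a "linked_entity_id" key, as in the examples in this README. The name link_or_create is hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def link_or_create(linker: Any, entity: Dict[str, Any]) -> Tuple[Optional[str], bool]:
    """Return (entity_id, created).

    First tries to link without adding. If nothing matches, calls
    link_entity again with add_entity=True so the entity is stored.
    """
    result = linker.link_entity(entity, add_entity=False)
    entity_id = result.get("linked_entity_id")
    if entity_id:
        return entity_id, False

    result = linker.link_entity(entity, add_entity=True)
    entity_id = result.get("linked_entity_id")
    # `created` is True only if the second call actually produced an ID.
    return entity_id, entity_id is not None
```

Note that this makes two requests when no match exists. If that matters, calling link_entity(candidate, add_entity=True) directly is the single-request alternative, at the cost of not knowing whether the returned ID is new.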

Environment Variables

You can configure the client using environment variables:

export ENTITY_LINKER_BASE_URL="http://your-entity-linker-service:6000"
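
This README does not say whether the client reads ENTITY_LINKER_BASE_URL on its own when base_url is omitted. A minimal sketch that avoids relying on that behavior is to read the variable yourself and pass the value explicitly. The resolve_base_url helper and the fallback default are assumptions for illustration.

```python
import os

# Fallback used when the variable is unset; matches the URL used in the examples.
DEFAULT_BASE_URL = "http://localhost:6000"


def resolve_base_url(default: str = DEFAULT_BASE_URL) -> str:
    """Return ENTITY_LINKER_BASE_URL if set and non-empty, else `default`."""
    value = os.environ.get("ENTITY_LINKER_BASE_URL", "").strip()
    return value or default


# Usage:
#
#     linker = EntityLinker(
#         base_url=resolve_base_url(),
#         source_columns=["name"],
#         target_columns=["company_name"],
#     )
```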

Error Handling

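As the example below shows, failed requests surface as httpx exceptions and an invalid configuration raises ValueError, so callers can handle each case separately: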

from entity_linker_client import EntityLinker
import httpx

try:
    linker = EntityLinker(base_url="http://localhost:6000")
    entity = linker.get_entity("non-existent-id")
except httpx.HTTPStatusError as e:
    print(f"HTTP error: {e.response.status_code}")
except httpx.RequestError as e:
    print(f"Request error: {e}")
except ValueError as e:
    print(f"Configuration error: {e}")
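
Transient network failures are often worth retrying, whereas HTTP status errors such as a 404 usually are not. The helper below is a generic sketch and not part of the client. The name with_retries and its parameters are hypothetical, and the linearly growing delay is just one simple backoff choice.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Run `call`, retrying when it raises one of `retry_on`.

    Waits delay * attempt_number seconds between tries and re-raises
    the last exception once `attempts` tries have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)
    raise AssertionError("unreachable")


# Usage: retry connection-level failures only, letting HTTP status errors propagate.
#
#     entity = with_retries(
#         lambda: linker.get_entity(entity_id),
#         retry_on=(httpx.RequestError,),
#     )
```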

Requirements

  • Python 3.8 or higher
  • httpx >= 0.24.0
  • python-dotenv >= 0.19.0

Development

To contribute to this project:

  1. Clone the repository
  2. Install development dependencies: pip install -e ".[dev]" (the quotes keep shells such as zsh from expanding the brackets)
  3. Run tests: pytest
  4. Format code: black .
  5. Check types: mypy .

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a detailed history of changes.
