A Python client for the Entity Linker service - intelligent entity matching and linking

Entity Linker Client

A Python client library for the Entity Linker service, which matches incoming records against stored entities and links them. The client wraps the service's entity management, linking, and configuration calls in a single EntityLinker class.

Features

  • Intelligent Entity Matching: Supports multiple matching strategies including lexical similarity, semantic similarity, and exact matching
  • Flexible Configuration: Customizable matching rules and thresholds
  • Batch Operations: Efficient batch processing for large datasets
  • Type Safety: Full type annotations for better development experience
  • Simple API: Clean and intuitive interface for easy integration

Installation

Install the package using pip:

pip install entity-linker-client

Quick Start

Here's a simple example to get you started:

from entity_linker_client import EntityLinker

# Initialize the client
linker = EntityLinker(
    base_url="http://localhost:6000",
    source_columns=["name", "address", "phone"],
    target_columns=["business_name", "business_address", "contact_number"]
)

# Add some entities
entities = [
    {
        "canonical_name": "Acme Corporation",
        "aliases": ["Acme Corp", "Acme Inc"],
        "metadata": {
            "address": "123 Business Ave",
            "phone": "+1-555-0123"
        }
    }
]

added_entities = linker.add_entities_batch(entities)
print(f"Added {len(added_entities)} entities")

# Link a new entity
entity_to_link = {
    "canonical_name": "Acme Corp",
    "aliases": [],
    "metadata": {
        "address": "123 Business Avenue",
        "phone": "+1-555-0123"
    }
}

result = linker.link_entity(entity_to_link)
if result.get("linked_entity_id"):
    print(f"Found match: {result['linked_entity_id']}")
else:
    print("No match found")
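The entity dictionaries above all share the same three keys: canonical_name, aliases, and metadata. A small builder can keep that shape consistent across a code base. The sketch below is only an illustration: make_entity is a hypothetical helper, not part of entity_linker_client, and the de-duplication of aliases is a design choice of this sketch rather than something the service is known to require.

```python
from typing import Any, Dict, Iterable, List, Optional


def make_entity(
    canonical_name: str,
    aliases: Optional[Iterable[str]] = None,
    **metadata: Any,
) -> Dict[str, Any]:
    """Build an entity dict in the shape used by the Quick Start example.

    Hypothetical helper: not provided by entity_linker_client.
    """
    name = canonical_name.strip()
    if not name:
        raise ValueError("canonical_name must not be empty")

    # Drop blank aliases, duplicates, and the canonical name itself,
    # keeping the original order.
    seen = {name}
    unique_aliases: List[str] = []
    for alias in aliases or []:
        alias = alias.strip()
        if alias and alias not in seen:
            seen.add(alias)
            unique_aliases.append(alias)

    return {
        "canonical_name": name,
        "aliases": unique_aliases,
        "metadata": dict(metadata),
    }


acme = make_entity(
    "Acme Corporation",
    ["Acme Corp", "Acme Inc", "Acme Corp"],
    address="123 Business Ave",
    phone="+1-555-0123",
)
print(acme["aliases"])  # ['Acme Corp', 'Acme Inc']
```

The resulting dictionary can be passed to add_entity, add_entities_batch, or link_entity in the same way as the hand-written dictionaries above.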

Configuration

Basic Configuration

The simplest way to create a linker is by providing source and target columns:

from entity_linker_client import EntityLinker

linker = EntityLinker(
    base_url="http://localhost:6000",
    source_columns=["name", "industry", "location"],
    target_columns=["company_name", "sector", "address"]
)

Advanced Configuration

For more control, you can provide a custom configuration:

from entity_linker_client import (
    EntityLinker, EntityLinkingConfig, OrCondition, 
    FieldCondition, MatchCondition, MatchType
)

# Create custom configuration
config = EntityLinkingConfig(
    quick_creation_config=OrCondition(
        conditions=[
            FieldCondition(
                field="canonical_name",
                condition=MatchCondition(
                    match_type=MatchType.LEXICAL_SIMILARITY,
                    threshold=80
                )
            )
        ]
    ),
    # ... other configurations
)

linker = EntityLinker(base_url="http://localhost:6000", config=config)

API Reference

EntityLinker Class

The main class for interacting with the Entity Linker service.

Methods

  • add_entity(entity_data): Add a single entity
  • add_entities_batch(entities_data): Add multiple entities in batch
  • get_entity(entity_id): Retrieve an entity by ID
  • modify_entity(entity_id, entity_data): Update an existing entity
  • delete_entity(entity_id): Delete an entity
  • link_entity(entity_data, add_entity=False): Find matching entities
  • link_entity_with_id(entity_id): Link using an existing entity ID
  • get_info(): Get linker information
  • update_config(config): Update linker configuration
  • delete_linker(): Delete the linker instance

Static Methods

  • list_available_linkers(base_url): List all available linkers
  • get_linker_info(linker_id, base_url): Get information about a specific linker
  • generate_config(initial_config, source_columns, target_columns, base_url): Generate configuration

Configuration Classes

EntityLinkingConfig

Main configuration class containing:

  • quick_creation_config: Configuration for quick entity creation
  • quick_linking_config: Configuration for quick entity linking
  • llm_linking_config: Configuration for LLM-based linking
  • llm_top_k: Number of top results for LLM linking

MatchType Enum

Available matching strategies:

  • STRICT_MATCH: Exact string matching
  • LEXICAL_SIMILARITY: Token-based similarity
  • SEMANTIC_SIMILARITY: Embedding-based similarity
  • DICT_MATCH: Dictionary field matching
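To give a feel for how exact and token-based matching differ, here is a rough local sketch. It is not the service's implementation: the scoring below is a plain Jaccard overlap of lowercase word tokens, chosen only for illustration, and the 0 to 100 scale is an assumption based on the threshold=80 value used in the configuration example. The function names are hypothetical.

```python
import re


def strict_match(a: str, b: str) -> bool:
    """Exact string comparison, in the spirit of STRICT_MATCH."""
    return a == b


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of word tokens, scaled to 0..100.

    Illustrative stand-in for a token-based score; the service's actual
    LEXICAL_SIMILARITY algorithm is not documented here.
    """
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 100.0 * len(tokens_a & tokens_b) / len(union)


print(strict_match("Google LLC", "google llc"))      # False
print(token_similarity("Google LLC", "google llc"))  # 100.0
print(round(token_similarity("Acme Corp", "Acme Corporation"), 1))  # 33.3
```

With a score like this, "Acme Corp" and "Acme Corporation" would fall below a threshold of 80, which is one reason an alias list or a semantic strategy can be useful alongside lexical matching.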

Examples

Working with Entities

# Add a single entity
entity = {
    "canonical_name": "OpenAI Inc",
    "aliases": ["OpenAI", "OpenAI LP"],
    "metadata": {
        "industry": "AI Research",
        "founded": "2015"
    }
}

added_entity = linker.add_entity(entity)
entity_id = added_entity["id"]

# Modify the entity
updated_data = {
    "canonical_name": "OpenAI Inc",
    "aliases": ["OpenAI", "OpenAI LP", "OpenAI L.P."],
    "metadata": {
        "industry": "Artificial Intelligence",
        "founded": "2015",
        "headquarters": "San Francisco"
    }
}

modified_entity = linker.modify_entity(entity_id, updated_data)

Batch Operations

# Add multiple entities at once
companies = [
    {
        "canonical_name": "Google LLC",
        "aliases": ["Google", "Alphabet Inc"],
        "metadata": {"industry": "Technology"}
    },
    {
        "canonical_name": "Microsoft Corporation", 
        "aliases": ["Microsoft", "MSFT"],
        "metadata": {"industry": "Technology"}
    }
]

batch_result = linker.add_entities_batch(companies)
print(f"Added {len(batch_result)} companies")
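For very large datasets it can help to split the input into smaller batches so that no single request carries the whole list. The sketch below assumes only what the example above shows, namely that add_entities_batch accepts a list and returns a list. The helpers chunked and add_in_chunks, the default chunk size of 100, and the _StubLinker class are all hypothetical; the stub exists only so the sketch runs without a service.

```python
from typing import Any, Dict, Iterator, List, Sequence


def chunked(items: Sequence[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def add_in_chunks(linker: Any, entities: Sequence[Dict[str, Any]], size: int = 100) -> List[Any]:
    """Call linker.add_entities_batch once per chunk and collect the results."""
    added: List[Any] = []
    for chunk in chunked(entities, size):
        added.extend(linker.add_entities_batch(chunk))
    return added


class _StubLinker:
    """Stand-in for EntityLinker, used only to exercise the helper."""

    def __init__(self) -> None:
        self.calls: List[int] = []

    def add_entities_batch(self, entities_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        self.calls.append(len(entities_data))
        return [dict(entity, id=str(i)) for i, entity in enumerate(entities_data)]


stub = _StubLinker()
companies = [{"canonical_name": f"Company {i}"} for i in range(5)]
result = add_in_chunks(stub, companies, size=2)
print(len(result), stub.calls)  # 5 [2, 2, 1]
```

With a real client, pass the EntityLinker instance in place of the stub. A suitable chunk size depends on the service's limits, which are not documented here.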

Entity Linking

# Try to link a potentially matching entity
candidate = {
    "canonical_name": "Alphabet",
    "aliases": ["Google Inc"],
    "metadata": {"industry": "Tech"}
}

# Link without adding to database
link_result = linker.link_entity(candidate, add_entity=False)

if link_result.get("linked_entity_id"):
    print(f"Found existing entity: {link_result['linked_entity_id']}")
else:
    # Add as new entity if no match found
    link_result = linker.link_entity(candidate, add_entity=True)
    print(f"Created new entity: {link_result.get('linked_entity_id', 'Failed')}")

Environment Variables

You can configure the client using environment variables:

export ENTITY_LINKER_BASE_URL="http://your-entity-linker-service:6000"
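How the client itself consumes this variable is not shown here, so a conservative approach is to read it in your own code and pass the value as base_url. The sketch below uses only the standard library. resolve_base_url is a hypothetical helper, and the fallback of http://localhost:6000 is taken from the examples in this document rather than from a documented default.

```python
import os
from typing import Optional

# Fallback taken from the examples in this README; an assumption, not a documented default.
DEFAULT_BASE_URL = "http://localhost:6000"


def resolve_base_url(explicit: Optional[str] = None) -> str:
    """Pick the base URL: explicit argument, then environment, then fallback."""
    return explicit or os.environ.get("ENTITY_LINKER_BASE_URL") or DEFAULT_BASE_URL


print(resolve_base_url())
```

The result can then be passed along explicitly, for example EntityLinker(base_url=resolve_base_url(), source_columns=[...], target_columns=[...]).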

Error Handling

Calls can fail with an HTTP error response, a connection problem, or an invalid configuration. The example below catches each case separately:

from entity_linker_client import EntityLinker
import httpx

try:
    linker = EntityLinker(base_url="http://localhost:6000")
    entity = linker.get_entity("non-existent-id")
except httpx.HTTPStatusError as e:
    print(f"HTTP error: {e.response.status_code}")
except httpx.RequestError as e:
    print(f"Request error: {e}")
except ValueError as e:
    print(f"Configuration error: {e}")
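Transient connection failures are often worth retrying, while HTTP error responses such as a missing entity usually are not. The helper below is a generic sketch using only the standard library: call_with_retry is hypothetical and not part of entity_linker_client, and the attempt count and backoff values are arbitrary illustrations.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            # Re-raise once the last attempt has failed.
            if attempt == attempts:
                raise
            time.sleep(delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Demonstration with a function that fails twice before succeeding.
state = {"calls": 0}


def flaky() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retry(flaky, retry_on=(ConnectionError,), delay=0))  # ok
```

With the client, a call might look like call_with_retry(lambda: linker.get_entity(entity_id), retry_on=(httpx.RequestError,)). Because httpx.HTTPStatusError is not a subclass of httpx.RequestError, error responses would still propagate immediately.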

Requirements

  • Python 3.8 or higher
  • httpx >= 0.24.0
  • python-dotenv >= 0.19.0

Development

To contribute to this project:

  1. Clone the repository
  2. Install development dependencies: pip install -e ".[dev]"
  3. Run tests: pytest
  4. Format code: black .
  5. Check types: mypy .

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For support and questions:

Changelog

See CHANGELOG.md for a detailed history of changes.
