spaCy ANN Linker, a pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking.

spaCy ANN Linker, a pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking based on an Approximate Nearest Neighbors (ANN) index computed on the Character N-Gram TF-IDF representation of all aliases in your KnowledgeBase.
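The candidate-generation idea in the previous paragraph can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's code. It builds character 3-gram TF-IDF vectors for the aliases and ranks them by cosine similarity using an exact brute-force search. That search stands in for the Approximate Nearest Neighbors index, and a real deployment would use an ANN library instead. The helper names (`char_ngrams`, `build_index`, `nearest_aliases`) and the n-gram size of 3 are hypothetical choices.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def char_ngrams(text: str, n: int = 3) -> List[str]:
    # Pad with spaces so word boundaries and very short aliases still yield n-grams
    padded = " " + text.lower() + " "
    return [padded[i : i + n] for i in range(len(padded) - n + 1)]


def normalize(vec: Vector) -> Vector:
    # L2-normalize so that a dot product equals cosine similarity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else vec


def build_index(aliases: List[str], n: int = 3) -> Tuple[Vector, List[Vector]]:
    counts = [Counter(char_ngrams(a, n)) for a in aliases]
    df = Counter(g for c in counts for g in c)  # number of aliases containing each n-gram
    num = len(aliases)
    # Smoothed IDF, the same form as scikit-learn's default TfidfVectorizer
    idf = {g: math.log((1 + num) / (1 + d)) + 1 for g, d in df.items()}
    vectors = [normalize({g: tf * idf[g] for g, tf in c.items()}) for c in counts]
    return idf, vectors


def nearest_aliases(query: str, aliases: List[str], idf: Vector,
                    vectors: List[Vector], k: int = 3, n: int = 3) -> List[Tuple[str, float]]:
    counts = Counter(char_ngrams(query, n))
    # n-grams that never occur in the KB aliases carry no weight and are dropped
    q = normalize({g: tf * idf[g] for g, tf in counts.items() if g in idf})
    scores = [sum(w * vec.get(g, 0.0) for g, w in q.items()) for vec in vectors]
    # Exact search over every alias; an ANN index would approximate this ranking
    ranked = sorted(zip(aliases, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


aliases = ["ML", "Machine learning", "Meta Language", "NLP",
           "Natural language processing", "Neuro-linguistic programming"]
idf, vectors = build_index(aliases)
print(nearest_aliases("machine learn", aliases, idf, vectors))
```

Because matching happens on character n-grams, a partial or slightly misspelled mention such as "machine learn" still lands closest to the "Machine learning" alias.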


Documentation: https://microsoft.github.io/spacy-ann-linker

Source Code: https://github.com/microsoft/spacy-ann-linker


spaCy ANN Linker is a spaCy pipeline component for generating alias candidates for the entities in doc.ents. It also provides an optional interface for linking ambiguous aliases based on a description of each entity.
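When an alias such as "ML" maps to several entities, the descriptions can break the tie. This page does not describe how the component compares descriptions with the text, so the sketch below is only an assumption-labeled illustration: it scores each candidate's description against the surrounding context with bag-of-words cosine similarity and picks the best match. `disambiguate` and its helpers are hypothetical names; the two descriptions are taken from the example entities.jsonl data on this page.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    # Lowercased alphabetic tokens; a stand-in for whatever representation the real component uses
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context: str, candidate_ids: List[str], descriptions: Dict[str, str]) -> str:
    # Hypothetical heuristic: choose the candidate whose description best matches the context
    ctx = bag_of_words(context)
    return max(candidate_ids, key=lambda eid: cosine(ctx, bag_of_words(descriptions[eid])))


descriptions = {
    "a1": "Machine learning (ML) is the scientific study of algorithms and statistical models...",
    "a2": 'ML ("Meta Language") is a general-purpose functional programming language. '
          'It has roots in Lisp, and has been characterized as "Lisp with types".',
}
print(disambiguate("ML models learn statistical patterns from data using algorithms.",
                   ["a1", "a2"], descriptions))  # a1
print(disambiguate("Lisp with types is a functional programming language.",
                   ["a1", "a2"], descriptions))  # a2
```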

The key features are:

  • Easy spaCy Integration: spaCy ANN Linker provides completely serializable spaCy pipeline components that integrate directly into your existing spaCy model.

  • CLI for simple Index Creation: Run spacy_ann create_index on your data to build an Approximate Nearest Neighbors index, create an ann_linker pipeline component, and save the result as a spaCy model.

  • Built-in Web API: Deploy easily and run batch Entity Linking queries.

Requirements

Python 3.6+

spaCy ANN Linker is a convenient wrapper built on a few comprehensive, high-performing packages.

Installation

$ pip install spacy-ann-linker
Successfully installed spacy-ann-linker

Data Prerequisites

To use spaCy ANN Linker you need existing Knowledge Base data. spaCy ANN Linker expects this data as two JSONL files in the same directory:

kb_dir
│   aliases.jsonl
│   entities.jsonl
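A minimal sketch of reading such a directory with the Python standard library, assuming UTF-8 files with one JSON object per line (blank lines are skipped). `load_kb_dir` is a hypothetical helper for inspecting your data, not part of the spacy_ann API; the demo writes two tiny files to a temporary directory so it runs on its own.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def read_jsonl(path: Path) -> List[dict]:
    # One JSON object per line; ignore blank lines such as a trailing newline
    with path.open(encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_kb_dir(kb_dir) -> Dict[str, List[dict]]:
    # Hypothetical helper: read both files the linker expects from one directory
    kb_dir = Path(kb_dir)
    return {
        "entities": read_jsonl(kb_dir / "entities.jsonl"),
        "aliases": read_jsonl(kb_dir / "aliases.jsonl"),
    }


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "entities.jsonl").write_text(
        '{"id": "a1", "description": "Machine learning (ML) is ..."}\n'
        '{"id": "a2", "description": "ML (Meta Language) is ..."}\n',
        encoding="utf8",
    )
    (Path(tmp) / "aliases.jsonl").write_text(
        '{"alias": "ML", "entities": ["a1", "a2"], "probabilities": [0.5, 0.5]}\n',
        encoding="utf8",
    )
    kb = load_kb_dir(tmp)

print(len(kb["entities"]), kb["aliases"][0]["alias"])
```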

For testing the package, you can use the example data in examples/tutorial/data:

examples/tutorial/data
│   aliases.jsonl
│   entities.jsonl

entities.jsonl Record Format

{"id": "Canonical Entity Id", "description": "Entity Description used for Disambiguation"}

Example data

{"id": "a1", "description": "Machine learning (ML) is the scientific study of algorithms and statistical models..."}
{"id": "a2", "description": "ML (\"Meta Language\") is a general-purpose functional programming language. It has roots in Lisp, and has been characterized as \"Lisp with types\"."}
{"id": "a3", "description": "Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data."}
{"id": "a4", "description": "Neuro-linguistic programming (NLP) is a pseudoscientific approach to communication, personal development, and psychotherapy created by Richard Bandler and John Grinder in California, United States in the 1970s."}
...
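Because every alias record points at entity ids, it helps to load entities.jsonl into an id-to-description mapping and fail early on malformed records. The checks below (string id and description fields, no duplicate ids) are our own assumptions about what a clean file looks like; this page does not state how spacy_ann handles such records. `index_entities` is a hypothetical helper, and the shortened descriptions are demo data.

```python
from typing import Dict, Iterable


def index_entities(records: Iterable[dict]) -> Dict[str, str]:
    entities: Dict[str, str] = {}
    for line_no, rec in enumerate(records, start=1):
        # Both fields from the record format above must be present and be strings
        if not isinstance(rec.get("id"), str) or not isinstance(rec.get("description"), str):
            raise ValueError(f"record {line_no}: expected string 'id' and 'description'")
        # Assumed rule: each canonical entity id appears only once
        if rec["id"] in entities:
            raise ValueError(f"record {line_no}: duplicate entity id {rec['id']!r}")
        entities[rec["id"]] = rec["description"]
    return entities


records = [
    {"id": "a3", "description": "Natural language processing (NLP) is a subfield of linguistics..."},
    {"id": "a4", "description": "Neuro-linguistic programming (NLP) is a pseudoscientific approach..."},
]
entities = index_entities(records)
print(sorted(entities))  # ['a3', 'a4']
```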

aliases.jsonl Record Format

{"alias": "alias string", "entities": ["list", "of", "entity", "ids"], "probabilities": [0.5, 0.5]}

Example data

{"alias": "ML", "entities": ["a1", "a2"], "probabilities": [0.5, 0.5]}
{"alias": "Machine learning", "entities": ["a1"], "probabilities": [1.0]}
{"alias": "Meta Language", "entities": ["a2"], "probabilities": [1.0]}
{"alias": "NLP", "entities": ["a3", "a4"], "probabilities": [0.5, 0.5]}
{"alias": "Natural language processing", "entities": ["a3"], "probabilities": [1.0]}
{"alias": "Neuro-linguistic programming", "entities": ["a4"], "probabilities": [1.0]}
...
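Each alias record should be consistent with entities.jsonl: one probability per entity id, every id defined, and probabilities that are non-negative and sum to at most 1. To our knowledge, spaCy's own `KnowledgeBase.add_alias` rejects mismatched list lengths, undefined entities, and probability sums above 1, so checking the file up front gives clearer errors. `validate_alias` and `validate_aliases` are hypothetical helpers, and the 1e-6 tolerance for floating-point rounding is an arbitrary choice.

```python
from typing import Iterable, Set


def validate_alias(record: dict, entity_ids: Set[str], tolerance: float = 1e-6) -> None:
    alias = record["alias"]
    entities, probs = record["entities"], record["probabilities"]
    # One prior probability per candidate entity
    if len(entities) != len(probs):
        raise ValueError(f"{alias!r}: {len(entities)} entities but {len(probs)} probabilities")
    # Every candidate must be defined in entities.jsonl
    unknown = [e for e in entities if e not in entity_ids]
    if unknown:
        raise ValueError(f"{alias!r}: unknown entity ids {unknown}")
    # Priors are probabilities: non-negative, summing to at most 1 (with rounding slack)
    if any(p < 0 for p in probs) or sum(probs) > 1 + tolerance:
        raise ValueError(f"{alias!r}: probabilities must be >= 0 and sum to at most 1")


def validate_aliases(records: Iterable[dict], entity_ids: Set[str]) -> int:
    # Returns the number of records checked; raises on the first invalid one
    count = 0
    for record in records:
        validate_alias(record, entity_ids)
        count += 1
    return count


entity_ids = {"a1", "a2", "a3", "a4"}
aliases = [
    {"alias": "ML", "entities": ["a1", "a2"], "probabilities": [0.5, 0.5]},
    {"alias": "Machine learning", "entities": ["a1"], "probabilities": [1.0]},
    {"alias": "NLP", "entities": ["a3", "a4"], "probabilities": [0.5, 0.5]},
]
print(validate_aliases(aliases, entity_ids))  # 3

try:
    validate_alias({"alias": "ML", "entities": ["a1", "a9"], "probabilities": [0.5, 0.5]}, entity_ids)
except ValueError as err:
    print(err)
```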

Example Data

spacy-ann-linker comes with some example data to get you started.

Important: If this is your first time using spacy-ann-linker, start with the example data by running the spacy_ann example_data command. Just pass an output_dir to write the example data to.

$ spacy_ann example_data ./kb

=============== Example Data ================
Writing Example data to test/kb
✔ Done.

This should leave you with a folder called ./kb that has the following structure:

kb
│   aliases.jsonl
│   entities.jsonl

spaCy prerequisites

If you don't have a pretrained spaCy model, download one now. The model needs word vectors, so download a model larger than en_core_web_sm (which does not include word vectors), for example en_core_web_md:

$ spacy download en_core_web_md
Successfully installed en_core_web_md

Next Steps

Once you have completed the data and spaCy prerequisites, follow the Tutorial for a step-by-step guide to using the spacy_ann package.

Important: These are only the prerequisites. The full Tutorial covers the rest of the workflow for working with spacy-ann-linker.

License

This project is licensed under the terms of the MIT license.

Download files

Source Distribution

spacy-ann-linker-0.3.3.tar.gz (210.3 kB)

Uploaded Source

Built Distribution

spacy_ann_linker-0.3.3-py3-none-any.whl (27.9 kB)

Uploaded Python 3

File details

Details for the file spacy-ann-linker-0.3.3.tar.gz.

File metadata

  • Download URL: spacy-ann-linker-0.3.3.tar.gz
  • Size: 210.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.25.0

File hashes

Hashes for spacy-ann-linker-0.3.3.tar.gz
Algorithm Hash digest
SHA256 01675825fcd8fbea3f44ffc0b7d3fdf90f12902052fbd98b9ded846346d896cb
MD5 53972435a0031b025751bccd7a66571c
BLAKE2b-256 fe457ab1228dd81daf58d900d42a28e052762e4ffc4d42520f0c691ae499a4e9

File details

Details for the file spacy_ann_linker-0.3.3-py3-none-any.whl.

File hashes

Hashes for spacy_ann_linker-0.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 f4c29a2a988086e13ebebc69a1309b00f5f47522a45394a10d3abfd1670a9d17
MD5 d9966e808c31cd7a0ed04abcb8a09bb6
BLAKE2b-256 7340e94f279619c1b4b76ea9f4e4869367dc4fc90a01cd352e47532decfe3521
