A Unified Library for Entity Linking
entity-linkings
entity-linkings is a unified library for entity linking.
Installation
# from PyPI (TODO)
pip install entity-linkings
# from the source
git clone git@github.com:YuSawan/entity_linkings.git
cd entity_linkings
pip install .
# for uv users
git clone git@github.com:YuSawan/entity_linkings.git
cd entity_linkings
uv sync
Quick Start
entity-linkings provides two interfaces: a command-line interface (CLI) and a Python API.
CLI
The command-line interface can train, evaluate, and run entity linking systems from the command line.
To create an EL system, you must first build a candidate retriever with entitylinkings-train-retrieval.
In this example, the e5bm25 retriever is trained on a custom dataset.
entitylinkings-train-retrieval \
--retriever_id e5bm25 \
--train_file train.jsonl \
--validation_file validation.jsonl \
--dictionary_id_or_path dictionary.jsonl \
--output_dir save_model/ \
--num_hard_negatives 4 \
--num_train_epochs 10 \
--train_batch_size 8 \
--validation_batch_size 16 \
--config config.yaml \
--wandb
Next, Entity Disambiguation (ED) and End-to-End Entity Linking (EL) systems can be trained with entitylinkings-train.
This example trains FEVRY with the custom candidate retriever built above.
entitylinkings-train \
--model_type ed \
--model_id fevry \
--model_name_or_path google-bert/bert-base-uncased \
--retriever_id e5bm25 \
--retriever_model_name_or_path save_model/ \
--dictionary_id_or_path dictionary.jsonl \
--train_file train.jsonl \
--validation_file validation.jsonl \
--num_candidates 30 \
--num_train_epochs 2 \
--train_batch_size 8 \
--validation_batch_size 16 \
--output_dir save_fevry/ \
--config config.yaml \
--wandb
Finally, you can evaluate retrievers or EL systems with entitylinkings-eval-retrieval or entitylinkings-eval, respectively.
entitylinkings-eval-retrieval \
--retriever_id <retriever_id> \
--model_name_or_path save_model/ \
--dictionary_id_or_path dictionary.jsonl \
--test_file test.jsonl \
--config config.yaml \
--output_dir result/ \
--test_batch_size 256 \
--wandb
entitylinkings-eval \
--model_type ed \
--model_id fevry \
--model_name_or_path save_fevry/ \
--retriever_id e5bm25 \
--retriever_model_name_or_path save_model/ \
--dictionary_id_or_path dictionary.jsonl \
--test_file test.jsonl \
--config config.yaml \
--output_dir result/ \
--test_batch_size 256 \
--wandb
You can change the arguments (e.g., context length) using a configuration file.
The config.yaml with default values can be generated via entitylinkings-gen-config.
entitylinkings-gen-config
Python API
This is an example of running ChatEL with the ZELDA candidate list via the API.
Valid IDs for get_retrievers() and get_models() can be listed with get_retriever_ids() and get_model_ids(), respectively.
from entity_linkings import get_retrievers, get_models, load_dictionary

# Load the dictionary from a dictionary_id or a local path
dictionary = load_dictionary('zelda')

# Load the candidate retriever
retriever_cls = get_retrievers('zeldacl')
retriever = retriever_cls(
    dictionary,
    config=retriever_cls.Config()
)

# Set up the ED or EL model
model_cls = get_models('chatel')
model = model_cls(
    task='ed',
    retriever=retriever,
    config=model_cls.Config("gpt-4o")
)

# Prediction: each span is a (start, end) character offset into the sentence
sentence = "NAIST is in Ikoma."
spans = [(0, 5)]
predictions = model.predict(sentence, spans, top_k=1)
print("ID: ", predictions[0][0]["id"])
print("Title: ", predictions[0][0]["prediction"])
print("Score: ", predictions[0][0]["score"])
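The spans in the example above appear to be end-exclusive character offsets, since (0, 5) covers "NAIST". The following is a minimal sketch of how such spans could be computed from a known mention string. The helper find_mention_spans is hypothetical and not part of entity-linkings; it only uses the Python standard library.

```python
def find_mention_spans(text: str, mention: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of every occurrence of `mention`.

    Offsets are end-exclusive, so text[start:end] == mention.
    This is an illustrative helper, not an entity-linkings API.
    """
    spans: list[tuple[int, int]] = []
    if not mention:
        return spans
    start = text.find(mention)
    while start != -1:
        end = start + len(mention)
        spans.append((start, end))
        # Continue searching after the current match (non-overlapping matches).
        start = text.find(mention, end)
    return spans


sentence = "NAIST is in Ikoma."
spans = find_mention_spans(sentence, "NAIST")
print(spans)  # [(0, 5)]
```

The resulting list has the same shape as the spans argument passed to model.predict in the example above.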
Available Models
Candidate Retriever
- BM25
- ZELDA Candidate List (Milich and Akbik, 2023)
- Dual Encoder Model
- Text Embedding Model
- E5+BM25 (Nakatani et al., 2025)
Entity Disambiguation
- FEVRY (Févry et al., 2020)
- BLINK (Wu et al., 2020)
- ExtEnD (Barba et al., 2022)
- ReFinED (Ayoola et al., 2022)
- FusionED (Wang et al., 2024)
- ChatEL (Ding et al., 2024)
Entity Dictionary
Available Dictionaries
| dictionary_id | Dataset | Language | Domain |
|---|---|---|---|
| kilt_wiki | KILT (Petroni et al., 2021) | English | Wikipedia |
| zelda_wiki | ZELDA (Milich and Akbik, 2023) | English | Wikipedia |
| zeshel_wikia | ZeshEL (Logeswaran et al., 2021) | English | Wikia |
- Please obtain the source data for each entity dictionary from the following links:
  - kilt_wiki: kilt-knowledgesource.json
  - zelda_wiki: zelda_labels_verbalizations.json
  - zeshel_wikia: zeshel.tar.bz2
- If you place the data in entity_linkings/entity_dictionary/dictionary_id/, load_dictionary(<dictionary_id>) will automatically convert the data.
- We plan to support downloading these dictionaries directly via libraries such as HuggingFace Datasets.
Custom Entity Dictionary
If you want to use this package with your own ontology, you need to convert each entity to the following format:
{
"id": "000011",
"name": "NAIST",
"description": "NAIST is located in Ikoma."
}
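As a rough sketch, entries in this format could be written to a file such as the dictionary.jsonl used in the CLI examples. This assumes one JSON object per line (inferred from the .jsonl extension) and that ids should be unique; neither is confirmed by the documentation above. The function write_dictionary is hypothetical and not part of entity-linkings.

```python
import json
import tempfile
from pathlib import Path

# Keys taken from the format shown above.
REQUIRED_KEYS = ("id", "name", "description")


def write_dictionary(entries, path) -> int:
    """Write entity entries as JSON Lines and return the number written.

    Assumption: the library reads one JSON object per line.
    """
    seen_ids = set()
    with Path(path).open("w", encoding="utf-8") as f:
        for entry in entries:
            missing = [key for key in REQUIRED_KEYS if key not in entry]
            if missing:
                raise ValueError(f"entry {entry!r} is missing keys: {missing}")
            if entry["id"] in seen_ids:
                raise ValueError(f"duplicate entity id: {entry['id']}")
            seen_ids.add(entry["id"])
            # ensure_ascii=False keeps non-ASCII entity names readable.
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return len(seen_ids)


entries = [
    {"id": "000011", "name": "NAIST", "description": "NAIST is located in Ikoma."},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "dictionary.jsonl"
    print(write_dictionary(entries, out_path))  # 1
```

In practice the output path would be a persistent location, which could then be passed to --dictionary_id_or_path.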
Datasets
Public datasets
| dataset_id | Dataset | Domain | Language | Ontology | Train | Licence |
|---|---|---|---|---|---|---|
| msnbc | MSNBC (Cucerzan, 2007) | News | English | Wikipedia | | Unknown* |
| aquaint | AQUAINT (Milne and Witten, 2008) | News | English | Wikipedia | | Unknown* |
| ace2004 | ACE2004 (Ratinov et al., 2011) | News | English | Wikipedia | | Unknown* |
| kore50 | KORE50 (Hoffart et al., 2012) | News | English | Wikipedia | | CC BY-SA 3.0 |
| n3-r128 | N3-Reuters-128 (Röder et al., 2014) | News | English | Wikipedia | | GNU AGPL-3.0 |
| n3-r500 | N3-RSS-500 (Röder et al., 2014) | RSS | English | Wikipedia | | GNU AGPL-3.0 |
| derczynski | Derczynski (Derczynski et al., 2015) | | English | Wikipedia | | CC-BY 4.0 |
| oke-2015 | OKE-2015 (Nuzzolese et al., 2015) | News | English | Wikipedia | | Unknown* |
| oke-2016 | OKE-2016 (Nuzzolese et al., 2015) | News | English | Wikipedia | | Unknown* |
| wned-wiki | WNED-WIKI (Guo and Barbosa, 2018) | Wikipedia | English | Wikipedia | | Unknown |
| wned-cweb | WNED-CWEB (Guo and Barbosa, 2018) | Web | English | Wikipedia | | Apache License 2.0 |
| unseen | WikilinksNED Unseen-Mentions (Onoe and Durrett, 2020) | News | English | Wikipedia | ✅ | CC-BY 3.0* |
| tweeki | Tweeki EL (Harandizadeh and Singh, 2020) | | English | Wikipedia | | Apache License 2.0 |
| reddit-comments | Reddit EL (Botzer et al., 2021) | | English | Wikipedia | | CC-BY 4.0 |
| reddit-posts | Reddit EL (Botzer et al., 2021) | | English | Wikipedia | | CC-BY 4.0 |
| shadowlink-shadow | ShadowLink (Provatorova et al., 2021) | Wikipedia | English | Wikipedia | | Unknown* |
| shadowlink-top | ShadowLink (Provatorova et al., 2021) | Wikipedia | English | Wikipedia | | Unknown* |
| shadowlink-tail | ShadowLink (Provatorova et al., 2021) | Wikipedia | English | Wikipedia | | Unknown* |
| zeshel | Zeshel (Logeswaran et al., 2021) | Wikia | English | Wikia | ✅ | CC-BY-SA |
| docred | Linked-DocRED (Genest et al., 2023) | News | English | Wikipedia | ✅ | CC-BY 4.0 |
- The original MSNBC dataset (Cucerzan, 2007) is no longer available because the official link has expired. You can download the dataset from the official GERBIL code.
- It is unclear whether ShadowLink and OKE-{2015,2016} may be used publicly, but they are provided in their official repositories.
- WikilinksNED Unseen-Mentions is created by splitting WikilinksNED. WikilinksNED is derived from the Wikilinks corpus, which is made available under CC-BY 3.0.
- The following datasets are not publicly available, or their availability is uncertain. If you want to evaluate on these resources, please register with the LDC and convert the datasets to our format.
  - AIDA CoNLL-YAGO (Hoffart et al., 2011): You must sign the agreement to use the Reuters Corpus.
  - TAC-KBP 2010 (Ji et al., 2011): You must sign the Text Analysis Conference (TAC) Knowledge Base Population Evaluation License Agreement.
Custom Dataset
If you want to use this package with your private dataset, you must convert it to the following format:
{
"id": "doc-001-P1",
"text": "She graduated from NAIST.",
"entities": [{"start": 19, "end": 24, "label": ["000011"]}]
}
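Before training, it can help to confirm that every entity offset actually points at the intended mention. The sketch below assumes that start and end are end-exclusive character offsets into text (consistent with 19 and 24 selecting "NAIST" above) and that label is a non-empty list of dictionary ids. The function check_document is hypothetical and not part of entity-linkings.

```python
import json


def check_document(doc: dict) -> list[str]:
    """Return the mention strings of a document, raising on invalid entities.

    Assumptions: offsets are end-exclusive character positions into
    doc["text"], and each entity carries a non-empty list of labels.
    """
    text = doc["text"]
    mentions = []
    for entity in doc["entities"]:
        start, end = entity["start"], entity["end"]
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"{doc['id']}: span ({start}, {end}) is out of range")
        if not entity["label"]:
            raise ValueError(f"{doc['id']}: span ({start}, {end}) has no label")
        mentions.append(text[start:end])
    return mentions


doc = {
    "id": "doc-001-P1",
    "text": "She graduated from NAIST.",
    "entities": [{"start": 19, "end": 24, "label": ["000011"]}],
}
print(check_document(doc))  # ['NAIST']

# One document per line, as suggested by the .jsonl files in the CLI examples.
print(json.dumps(doc, ensure_ascii=False))
```

Printing the extracted mentions makes off-by-one errors in the offsets easy to spot before the data is passed to --train_file.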