
Generalist model for Relation Extraction (Extract any relation types from texts)


GLiREL: Generalist and Lightweight model for Zero-Shot Relation Extraction

GLiREL is a Relation Extraction model capable of classifying unseen relations given the entities within a text. It builds upon the excellent work done by Urchade Zaratiana, Nadi Tomeh, Pierre Holat, and Thierry Charnois on the GLiNER library, which enables efficient zero-shot Named Entity Recognition.


Installation

conda create -n glirel python=3.10 -y && conda activate glirel
cd GLiREL && pip install -e . && pip install -r requirements.txt

To run experiments

# few_rel
cd data
python process_few_rel.py
cd ..
# adjust config
python train.py --config config_few_rel.yaml --log_dir logs-few-rel --relation_extraction
# wiki_zsl
cd data
curl -L -o wiki_all.json 'https://drive.google.com/uc?export=download&id=1ELFGUIYDClmh9GrEHjFYoE_VI1t2a5nK'
python process_wiki_zsl.py
cd ..
# adjust config
python train.py --config config_wiki_zsl.yaml --log_dir logs-wiki-zsl --relation_extraction

Example training data

JSONL file (one JSON object per line; the records below are pretty-printed for readability):

{
  "ner": [
    [7, 8, "Q4914513", "Binsey"], 
    [11, 13, "Q19686", "River Thames"]
  ], 
  "relations": [
    {
      "head": {"mention": "Binsey", "position": [7, 8], "type": "Q4914513"}, 
      "tail": {"mention": "River Thames", "position": [11, 13], "type": "Q19686"}, 
      "relation_id": "P206", 
      "relation_text": "located in or next to body of water"
    }
  ], 
  "tokenized_text": ["The", "race", "took", "place", "between", "Godstow", "and", "Binsey", "along", "the", "Upper", "River", "Thames", "."]
}
{
  "ner": [
    [9, 11, "Q4386693", "Legislative Assembly"], 
    [1, 4, "Q1848835", "Parliament of Victoria"]
  ], 
  "relations": [
    {
      "head": {"mention": "Legislative Assembly", "position": [9, 11], "type": "Q4386693"}, 
      "tail": {"mention": "Parliament of Victoria", "position": [1, 4], "type": "Q1848835"}, 
      "relation_id": "P361", 
      "relation_text": "part of"
    }
  ], 
  "tokenized_text": ["The", "Parliament", "of", "Victoria", "consists", "of", "the", "lower", "house", "Legislative", "Assembly", ",", "the", "upper", "house", "Legislative", "Council", "and", "the", "Queen", "of", "Australia", "."]
}
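The records above suggest that entity and relation positions are token indices into tokenized_text with an exclusive end (for example, [11, 13] covers "River" and "Thames"). As a minimal sketch under that assumption, the following check reads one JSONL line and confirms that every span matches its mention and that every relation endpoint refers to a listed entity. The helper name check_record is hypothetical and is not part of the GLiREL library.

```python
import json


def check_record(record):
    """Return a list of problems found in one training record (empty if none)."""
    tokens = record["tokenized_text"]
    problems = []

    # Assumption: spans are [start, end) token indices, as in the examples above.
    for start, end, _entity_type, mention in record["ner"]:
        covered = " ".join(tokens[start:end])
        if covered != mention:
            problems.append(f"ner span [{start}, {end}] covers {covered!r}, not {mention!r}")

    # Every relation endpoint should point at one of the listed entity spans.
    ner_spans = {(start, end) for start, end, _, _ in record["ner"]}
    for rel in record["relations"]:
        for role in ("head", "tail"):
            position = tuple(rel[role]["position"])
            if position not in ner_spans:
                problems.append(f"{role} position {list(position)} is not in ner")

    return problems


# One JSONL line: the first example record, written as a single JSON object.
line = (
    '{"ner": [[7, 8, "Q4914513", "Binsey"], [11, 13, "Q19686", "River Thames"]], '
    '"relations": [{"head": {"mention": "Binsey", "position": [7, 8], "type": "Q4914513"}, '
    '"tail": {"mention": "River Thames", "position": [11, 13], "type": "Q19686"}, '
    '"relation_id": "P206", "relation_text": "located in or next to body of water"}], '
    '"tokenized_text": ["The", "race", "took", "place", "between", "Godstow", "and", '
    '"Binsey", "along", "the", "Upper", "River", "Thames", "."]}'
)

record = json.loads(line)
print(check_record(record))
```

Running a check like this over each line of a processed dataset before training can catch off-by-one span errors early.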

Usage

Once you've installed the GLiREL library, you can import the GLiREL class. You can then load the model using GLiREL.from_pretrained and predict relations with predict_relations.

from glirel import GLiREL
import spacy

# Load the pretrained GLiREL model
model = GLiREL.from_pretrained("jackboyla/glirel_base")

text = "Jack Dorsey's father, Tim Dorsey, is a licensed pilot. Jack met his wife Sarah Paulson in New York in 2003. They have one son, Edward."

# Use spaCy for tokenization and named entity recognition
nlp = spacy.load('en_core_web_sm')
doc = nlp(text)

# Candidate relation types; these need not have been seen during training
labels = ['country of origin', 'licensed to broadcast to', 'parent', 'followed by', 'located in or next to body of water', 'spouse', 'child']

tokens = [token.text for token in doc]

# Entities as [start_token, end_token, label, text], taken from spaCy's doc.ents
ner = [[ent.start, ent.end, ent.label_, ent.text] for ent in doc.ents]
print(f"Entities detected: {ner}")

# Score the candidate relation labels for the detected entities
relations = model.predict_relations(tokens, labels, threshold=0.01, ner=ner)

print('Number of relations:', len(relations))

# Print the predictions from highest to lowest score
sorted_data_desc = sorted(relations, key=lambda x: x['score'], reverse=True)
print("\nDescending Order by Score:")
for item in sorted_data_desc:
    print(f"{item['head_text']} --> {item['label']} --> {item['tail_text']} | score: {item['score']}")

Expected Output

Entities detected: [[0, 2, 'PERSON', 'Jack Dorsey'], [5, 7, 'PERSON', 'Tim Dorsey'], [13, 14, 'PERSON', 'Jack'], [17, 19, 'PERSON', 'Sarah Paulson'], [20, 22, 'GPE', 'New York'], [23, 24, 'DATE', '2003'], [27, 28, 'CARDINAL', 'one'], [30, 31, 'PERSON', 'Edward']]
Number of relations: 90

Descending Order by Score:
['Sarah', 'Paulson'] --> spouse --> ['New', 'York'] | score: 0.6608812212944031
['Sarah', 'Paulson'] --> spouse --> ['Jack', 'Dorsey'] | score: 0.6601175665855408
['Edward'] --> spouse --> ['New', 'York'] | score: 0.6493653655052185
['one'] --> spouse --> ['New', 'York'] | score: 0.6480509042739868
['Edward'] --> spouse --> ['Jack', 'Dorsey'] | score: 0.6474933624267578
...
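With threshold=0.01 the example keeps 90 candidate relations, including implausible ones such as Sarah Paulson --> spouse --> New York. One possible post-processing step, sketched below, is to apply a stricter score cutoff and keep only the highest-scoring label for each ordered (head, tail) pair. The sketch assumes each prediction is a dict with the head_text, tail_text, label, and score keys used in the print loop above. The helper name best_per_pair and the 0.65 cutoff are hypothetical choices for illustration, not part of the GLiREL API.

```python
def best_per_pair(relations, min_score=0.5):
    """Keep the top-scoring label per (head, tail) pair, dropping low scores."""
    best = {}
    for rel in relations:
        if rel["score"] < min_score:
            continue
        # head_text and tail_text are token lists, so convert them to hashable tuples.
        key = (tuple(rel["head_text"]), tuple(rel["tail_text"]))
        if key not in best or rel["score"] > best[key]["score"]:
            best[key] = rel
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


# The first five rows of the expected output, plus one made-up low-scoring row
# (the "child" entry) added only to illustrate de-duplication.
sample = [
    {"head_text": ["Sarah", "Paulson"], "tail_text": ["New", "York"], "label": "spouse", "score": 0.6608812212944031},
    {"head_text": ["Sarah", "Paulson"], "tail_text": ["Jack", "Dorsey"], "label": "spouse", "score": 0.6601175665855408},
    {"head_text": ["Edward"], "tail_text": ["New", "York"], "label": "spouse", "score": 0.6493653655052185},
    {"head_text": ["one"], "tail_text": ["New", "York"], "label": "spouse", "score": 0.6480509042739868},
    {"head_text": ["Edward"], "tail_text": ["Jack", "Dorsey"], "label": "spouse", "score": 0.6474933624267578},
    {"head_text": ["Sarah", "Paulson"], "tail_text": ["Jack", "Dorsey"], "label": "child", "score": 0.2},  # made up
]

filtered = best_per_pair(sample, min_score=0.65)
for rel in filtered:
    print(f"{rel['head_text']} --> {rel['label']} --> {rel['tail_text']} | score: {rel['score']}")
```

A score cutoff alone does not remove type-incompatible pairs such as a person and a city linked by spouse; filtering candidate pairs by entity label before prediction is another option worth considering.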

Usage with spaCy (TBD)

You can also load GLiREL into a regular spaCy NLP pipeline. Here's an example using a blank English pipeline, but you can use any spaCy model.


Expected Output

