A Python library for structured information extraction with LLMs.
Struct-IE: Structured Information Extraction with Large Language Models
struct-ie is a Python library for named entity extraction using a transformer-based model.
Installation
You can install the struct-ie library from PyPI:
pip install struct_ie
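To confirm that the install succeeded, one option is to query the package metadata from Python. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical and not part of struct-ie, and it assumes Python 3.8 or later.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "struct_ie") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper for illustration only;
    it is not provided by struct-ie.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```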
To-Do List
- Implement batch prediction
- Implement a Trainer for Instruction Tuning
- PrefixLM for Instruction Tuning
- Add RelationExtractor
- Add GraphExtractor
- Add JsonExtractor
Usage
You can try it on Google Colab.
Here's an example of how to use the EntityExtractor:
1. Basic Usage
from struct_ie import EntityExtractor
# Define the entity types with descriptions (optional)
entity_types_with_descriptions = {
    "Name": "Names of individuals like 'Jane Doe'",
    "Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
    "Date": None,
    "Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
    "Team": None
}
# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")
# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."
# Extract entities from the text
entities = extractor.extract_entities(text)
print(entities)
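The README does not show what `extract_entities` returns. Assuming the result is a list of `(text, type)` pairs, the same shape as the `output` field in the few-shot demonstrations, a small helper can group the spans by entity type. This is only a sketch: `group_by_type` is a hypothetical name, and the sample list is hand-written, not real model output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_type(entities: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (text, type) pairs by type, dropping duplicate spans.

    Hypothetical helper; assumes entities are (text, type) tuples.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for span, label in entities:
        if span not in grouped[label]:
            grouped[label].append(span)
    return dict(grouped)


# Hand-written sample in the assumed format (NOT actual model output).
sample = [
    ("Cristiano Ronaldo", "Name"),
    ("Ballon d'Or", "Award"),
    ("UEFA Champions League", "Competition"),
    ("2018", "Date"),
    ("Cristiano Ronaldo", "Name"),
]
print(group_by_type(sample))
```

If the actual return type differs (for example, a dict or a raw string), the loop would need to be adapted accordingly.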
2. Usage with a Custom Prompt
from struct_ie import EntityExtractor
# Define the entity types with descriptions (optional)
entity_types_with_descriptions = {
    "Name": "Names of individuals like 'Jean-Luc Picard' or 'Jane Doe'",
    "Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
    "Date": None,
    "Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
    "Team": "Names of sports teams or organizations like 'Manchester United' or 'FC Barcelona'"
}
# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")
# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."
# Custom prompt for entity extraction
prompt = "You are an expert on Named Entity Recognition. Extract entities from this text."
# Extract entities from the text using a custom prompt
entities = extractor.extract_entities(text, prompt=prompt)
print(entities)
3. Usage with Few-shot Examples
from struct_ie import EntityExtractor
# Define the entity types with descriptions (optional)
entity_types_with_descriptions = {
    "Name": "Names of individuals like 'Jean-Luc Picard' or 'Jane Doe'",
    "Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
    "Date": None,
    "Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
    "Team": "Names of sports teams or organizations like 'Manchester United' or 'FC Barcelona'"
}
# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")
# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."
# Few-shot examples for improved entity extraction
demonstrations = [
    {"input": "Lionel Messi won the Ballon d'Or 7 times.", "output": [("Lionel Messi", "Name"), ("Ballon d'Or", "Award")]}
]
# Extract entities from the text using few-shot examples
entities = extractor.extract_entities(text, few_shot_examples=demonstrations)
print(entities)
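Few-shot demonstrations are easy to get subtly wrong, for instance by using a label that is missing from the entity type dictionary or a span that does not occur in the input text. The sketch below checks both conditions before the demonstrations are passed to the extractor. `check_demonstrations` is a hypothetical helper, not part of struct-ie, and it assumes the `{"input": ..., "output": [(text, type), ...]}` format shown above.

```python
from typing import Any, Dict, List, Optional


def check_demonstrations(
    demos: List[Dict[str, Any]],
    entity_types: Dict[str, Optional[str]],
) -> List[str]:
    """Return a list of human-readable problems found in the demonstrations.

    Hypothetical helper; an empty list means no problems were detected.
    """
    problems: List[str] = []
    for i, demo in enumerate(demos):
        text = demo["input"]
        for span, label in demo["output"]:
            # Every label should be one of the declared entity types.
            if label not in entity_types:
                problems.append(f"demo {i}: unknown entity type {label!r}")
            # Every span should appear verbatim in the input text.
            if span not in text:
                problems.append(f"demo {i}: span {span!r} not found in input")
    return problems


entity_types = {"Name": None, "Award": None, "Date": None, "Competition": None, "Team": None}
demonstrations = [
    {"input": "Lionel Messi won the Ballon d'Or 7 times.",
     "output": [("Lionel Messi", "Name"), ("Ballon d'Or", "Award")]}
]
print(check_demonstrations(demonstrations, entity_types))
```

Whether the library itself performs any such validation is not stated in the README, so running a check like this first is a precaution rather than a requirement.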
License
This project is licensed under the Apache-2.0 license.