A Python library for structured information extraction with LLMs.
Struct-IE: Structured Information Extraction with Large Language Models
struct-ie is a Python library for named entity extraction using a transformer-based model.
Installation
You can install the struct-ie library from PyPI:
pip install struct_ie
To-Do List
- Implement batch prediction
- Implement a Trainer for Instruction Tuning
- PrefixLM for Instruction Tuning
- Add RelationExtractor
- Add GraphExtractor
- Add JsonExtractor
Usage
You can try it on Google Colab.
Here's an example of how to use the EntityExtractor:
1. Basic Usage
from struct_ie import EntityExtractor
# Define the entity types with descriptions (optional)
entity_types_with_descriptions = {
    "Name": "Names of individuals like 'Jane Doe'",
    "Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
    "Date": None,
    "Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
    "Team": None
}
# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")
# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."
# Extract entities from the text
entities = extractor.extract_entities(text)
print(entities)
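This README does not show what `extract_entities` returns. As a rough sketch only, assuming the result is a list of `(span, label)` pairs in the same shape as the `output` field of the few-shot demonstrations in section 3, a small helper could group the extracted spans by entity type. The name `group_by_type` is hypothetical and not part of struct-ie, and the sample data is hand-written rather than real model output.

```python
from collections import defaultdict


def group_by_type(pairs):
    """Group (span, label) pairs into {label: [span, ...]}.

    Duplicate spans under the same label are dropped; first-seen order is kept.
    """
    grouped = defaultdict(list)
    for span, label in pairs:
        if span not in grouped[label]:
            grouped[label].append(span)
    return dict(grouped)


# Hand-written sample in the ASSUMED output format, not real model output.
sample = [
    ("Cristiano Ronaldo", "Name"),
    ("Ballon d'Or", "Award"),
    ("UEFA Champions League", "Competition"),
    ("2018", "Date"),
    ("Cristiano Ronaldo", "Name"),
]

print(group_by_type(sample))
```

If the extractor turns out to return a different structure (for example a dict or a JSON string), the loop would need to be adapted accordingly.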
2. Usage with a Custom Prompt
from struct_ie import EntityExtractor
# Define the entity types with descriptions (optional)
entity_types_with_descriptions = {
    "Name": "Names of individuals like 'Jean-Luc Picard' or 'Jane Doe'",
    "Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
    "Date": None,
    "Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
    "Team": "Names of sports teams or organizations like 'Manchester United' or 'FC Barcelona'"
}
# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")
# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."
# Custom prompt for entity extraction
prompt = "You are an expert on Named Entity Recognition. Extract entities from this text."
# Extract entities from the text using a custom prompt
entities = extractor.extract_entities(text, prompt=prompt)
print(entities)
3. Usage with Few-shot Examples
from struct_ie import EntityExtractor
# Define the entity types with descriptions (optional)
entity_types_with_descriptions = {
    "Name": "Names of individuals like 'Jean-Luc Picard' or 'Jane Doe'",
    "Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
    "Date": None,
    "Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
    "Team": "Names of sports teams or organizations like 'Manchester United' or 'FC Barcelona'"
}
# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")
# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."
# Few-shot examples for improved entity extraction
demonstrations = [
    {
        "input": "Lionel Messi won the Ballon d'Or 7 times.",
        "output": [("Lionel Messi", "Name"), ("Ballon d'Or", "Award")],
    }
]
# Extract entities from the text using few-shot examples
entities = extractor.extract_entities(text, few_shot_examples=demonstrations)
print(entities)
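Few-shot demonstrations are easy to get subtly wrong, for instance by using a label that was never declared or a span that does not occur in the input. The sketch below checks a demonstration list against the declared entity types before it is passed to the extractor. `check_demonstrations` is a hypothetical helper, not part of struct-ie, and the rule that every span must appear verbatim in the input is an assumed convention, not a documented requirement of the library.

```python
def check_demonstrations(demonstrations, entity_types):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for i, demo in enumerate(demonstrations):
        text = demo.get("input")
        if not isinstance(text, str):
            problems.append(f"demo {i}: 'input' must be a string")
            continue
        for span, label in demo.get("output", []):
            # Every label should be one of the declared entity types.
            if label not in entity_types:
                problems.append(f"demo {i}: unknown entity type {label!r}")
            # ASSUMED convention: spans are copied verbatim from the input.
            if span not in text:
                problems.append(f"demo {i}: span {span!r} not found in input")
    return problems


entity_types = {"Name": None, "Award": None, "Date": None, "Competition": None, "Team": None}
demonstrations = [
    {
        "input": "Lionel Messi won the Ballon d'Or 7 times.",
        "output": [("Lionel Messi", "Name"), ("Ballon d'Or", "Award")],
    }
]

print(check_demonstrations(demonstrations, entity_types))  # prints []
```

Running the check once at start-up keeps malformed examples out of the prompt, where they would be much harder to notice.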
License
This project is licensed under the Apache-2.0 license.