A Python library for structured information extraction with LLMs.

Struct-IE: Structured Information Extraction with Large Language Models

struct-ie is a Python library for named entity extraction using a transformer-based model.

Installation

You can install the struct-ie library from PyPI:

pip install struct_ie
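
To confirm that the install succeeded, one option is to ask Python for the installed distribution's version. The following is a minimal sketch that uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical and is not part of struct-ie.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not provided by struct-ie.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("struct_ie"))
```

A None result suggests that pip installed into a different environment than the interpreter running the check.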

To-Do List

  • Implement batch prediction
  • Implement a Trainer for Instruction Tuning
  • PrefixLM for Instruction Tuning
  • Add RelationExtractor
  • Add GraphExtractor
  • Add JsonExtractor

Usage

You can try it on Google Colab.

Here's an example of how to use the EntityExtractor:

1. Basic Usage

from struct_ie import EntityExtractor

# Define the entity types; descriptions are optional (use None to omit one)
entity_types_with_descriptions = {
    "Name": "Names of individuals like 'Jane Doe'",
    "Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
    "Date": None,
    "Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
    "Team": None
}

# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")

# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."

# Extract entities from the text
entities = extractor.extract_entities(text)
print(entities)

2. Usage with a Custom Prompt

from struct_ie import EntityExtractor

# Define the entity types; descriptions are optional (use None to omit one)
entity_types_with_descriptions = {
    "Name": "Names of individuals like 'Jean-Luc Picard' or 'Jane Doe'",
    "Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
    "Date": None,
    "Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
    "Team": "Names of sports teams or organizations like 'Manchester United' or 'FC Barcelona'"
}

# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")

# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."

# Custom prompt for entity extraction
prompt = "You are an expert on Named Entity Recognition. Extract entities from this text."

# Extract entities from the text using a custom prompt
entities = extractor.extract_entities(text, prompt=prompt)
print(entities)

3. Usage with Few-shot Examples

from struct_ie import EntityExtractor

# Define the entity types; descriptions are optional (use None to omit one)
entity_types_with_descriptions = {
    "Name": "Names of individuals like 'Jean-Luc Picard' or 'Jane Doe'",
    "Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
    "Date": None,
    "Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
    "Team": "Names of sports teams or organizations like 'Manchester United' or 'FC Barcelona'"
}

# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")

# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."

# Few-shot examples for improved entity extraction
demonstrations = [
    {"input": "Lionel Messi won the Ballon d'Or 7 times.", "output": [("Lionel Messi", "Name"), ("Ballon d'Or", "Award")]}
]

# Extract entities from the text using few-shot examples
entities = extractor.extract_entities(text, few_shot_examples=demonstrations)
print(entities)
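
Few-shot examples are easy to get subtly wrong, for instance by using a label that is not among the declared entity types or a span that does not appear in the input text. The sketch below checks demonstrations in the format shown above before they are passed to the extractor. It is plain Python and does not call struct-ie; the function name check_demonstrations is hypothetical, and whether the library performs similar validation itself is not something this example assumes.

```python
from typing import Dict, List, Optional


def check_demonstrations(
    demonstrations: List[dict],
    entity_types: Dict[str, Optional[str]],
) -> List[str]:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper for illustration; not part of struct-ie.
    """
    problems = []
    for i, demo in enumerate(demonstrations):
        text = demo.get("input", "")
        for span, label in demo.get("output", []):
            # Every label should be one of the declared entity types.
            if label not in entity_types:
                problems.append(f"example {i}: unknown entity type {label!r}")
            # Every labeled span should occur verbatim in the input text.
            if span not in text:
                problems.append(f"example {i}: span {span!r} not found in input")
    return problems


entity_types = {"Name": None, "Award": None, "Date": None, "Competition": None, "Team": None}
demonstrations = [
    {
        "input": "Lionel Messi won the Ballon d'Or 7 times.",
        "output": [("Lionel Messi", "Name"), ("Ballon d'Or", "Award")],
    }
]

for problem in check_demonstrations(demonstrations, entity_types):
    print(problem)
```

Running the check once when the demonstrations are defined keeps malformed examples from silently steering the model toward labels it was never asked to produce.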

License

This project is licensed under the Apache-2.0 license.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

struct_ie-0.0.2.tar.gz (8.1 kB view details)

Uploaded Source

Built Distribution

struct_ie-0.0.2-py3-none-any.whl (8.4 kB view details)

Uploaded Python 3

File details

Details for the file struct_ie-0.0.2.tar.gz.

File metadata

  • Download URL: struct_ie-0.0.2.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.18

File hashes

Hashes for struct_ie-0.0.2.tar.gz
Algorithm Hash digest
SHA256 156c24c128b88b4c7c047bbfb668465ec3cd3ee25b9d78be9b29d66741e24633
MD5 7a3414ace856a44651ccb4cc476fdf25
BLAKE2b-256 4214f3afd3999a978fe7296dd8571d13ba18135f8e2a26d006f58aaf2944455c

File details

Details for the file struct_ie-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: struct_ie-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.18

File hashes

Hashes for struct_ie-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 80a7a43c37f19871fa22478f27fe73ba5f6b1e41abb6150e2ada66b73da4fab6
MD5 eacc83f843e4fea648e6a4a5cd2c5b62
BLAKE2b-256 d29df99456e35982224b6bb06939ab52c34ba20177913836deadd9154e10292a
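
The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using the standard library's hashlib; the helper names sha256_of and matches_sha256 are hypothetical, and the file path is an assumption about where the wheel was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the file was saved.
    wheel = Path("struct_ie-0.0.2-py3-none-any.whl")
    # SHA256 digest listed above for the wheel.
    expected = "80a7a43c37f19871fa22478f27fe73ba5f6b1e41abb6150e2ada66b73da4fab6"
    if wheel.exists():
        print("digest matches:", matches_sha256(wheel, expected))
    else:
        print(f"{wheel} not found; download it first")
```

pip can enforce the same check at install time through its hash-checking mode, which takes the digests from a requirements file.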
