A library to create synthetic data with OpenAI and train a GLiNER model on that data.
GLiNER-Finetune
gliner-finetune is a Python library designed to generate synthetic data using OpenAI's GPT models, process this data, and then use it to train a GLiNER model. GLiNER is a framework for learning and inference in Named Entity Recognition (NER) tasks.
Features
- Data Generation: Use OpenAI's language models to create synthetic training data.
- Data Processing: Convert raw synthetic data into a format suitable for NER training.
- Model Training: Fine-tune the GLiNER model on the processed synthetic data for improved NER performance.
Installation
To install the gliner-finetune library, use pip:
pip install gliner-finetune
Quick Start
The following example demonstrates how to generate synthetic data, process it, and train a GLiNER model using the gliner-finetune library.
Make sure you have a .env file with your OPENAI_API_KEY set as a variable.
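Before spending API calls, it can help to confirm that the key is actually visible to Python. The sketch below is a minimal stand-in, assuming a plain KEY=VALUE .env file in the working directory. The helper names load_env_file and require_api_key are hypothetical and are not part of gliner-finetune, and how the library itself reads the .env file is not covered here.

```python
import os


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines (hypothetical helper, not a library API)."""
    loaded = {}
    with open(path, "r", encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            # Skip blank lines, comments, and anything without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            loaded[key.strip()] = value.strip().strip('"').strip("'")
    # Do not override variables that are already set in the environment
    for key, value in loaded.items():
        os.environ.setdefault(key, value)
    return loaded


def require_api_key():
    """Fail early with a clear message if OPENAI_API_KEY is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling load_env_file() followed by require_api_key() at the top of a script surfaces a missing key before any data generation starts.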
Step 1: Generate Synthetic Data
from gliner_finetune.synthetic import generate_data, create_prompt
import json
# Define your example data
example_data = {
    "text": "The Alpine Swift primarily consumes flying insects such as wasps, bees, and flies. It captures its prey mid-air while swiftly flying through the alpine skies. It nests in high, rocky mountain crevices where it uses feathers and small sticks to construct a simple yet secure nesting environment.",
    "generic_plant_food": [],
    "generic_animal_food": ["flying insects"],
    "plant_food": [],
    "specific_animal_food": ["wasps", "bees", "flies"],
    "location_nest": ["rocky mountain crevices"],
    "item_nest": ["feathers", "small sticks"]
}
# Convert example data to JSON string
json_data = json.dumps(example_data)
# Generate prompt and synthetic data
prompt = create_prompt(json_data)
print(prompt)
# Generate synthetic data with specified number of API calls
num_calls = 3
results = generate_data(json_data, num_calls)
print(results)
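Because the example acts as the template for every generated sample, it is worth checking that each labeled span really occurs in the example text. The following is a small sketch of such a check, assuming the layout shown above (one "text" key plus one list of strings per label). The function find_missing_spans is hypothetical and not part of gliner-finetune.

```python
def find_missing_spans(example):
    """Return {label: [spans]} for spans that do not occur in example["text"]."""
    text = example["text"].lower()
    missing = {}
    for label, spans in example.items():
        if label == "text":
            continue
        # Case-insensitive substring check for every annotated span
        absent = [span for span in spans if span.lower() not in text]
        if absent:
            missing[label] = absent
    return missing
```

An empty dictionary means every annotation can be located in the text; any other result points at labels to correct before generating data.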
Step 2: Process and Split Data
import json
from gliner_finetune.convert import convert
# Assuming the data has been read from 'parsed_responses.json'
with open('synthetic_data/parsed_responses.json', 'r') as file:
    data = json.load(file)
# Flatten the list of per-call result lists into a single list of samples
final_data = [sample for item in data for sample in item]
# Convert and split the data into training, validation, and testing datasets
training_data = convert(final_data, project_path='', train_split=0.8, eval_split=0.2, test_split=0.0,
                        train_file='train.json', eval_file='eval.json', test_file='test.json', overwrite=True)
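To give a feel for what converting to an NER training format involves, here is a rough sketch that turns one sample into token-level spans. It assumes the target layout is a "tokenized_text" list plus "ner" entries of the form [start, end, label] with inclusive token indices, which is the layout commonly used for GLiNER training data. Whether convert produces exactly this, and how it tokenizes, are assumptions; to_token_spans and the regular expression are illustrative only.

```python
import re

# Simple word/punctuation splitter (an assumption, not necessarily what convert uses)
TOKEN_RE = re.compile(r"\w+(?:[-_]\w+)*|\S")


def tokenize(text):
    return TOKEN_RE.findall(text)


def to_token_spans(sample):
    """Map each labeled string to inclusive [start, end, label] token spans."""
    tokens = tokenize(sample["text"])
    lowered = [t.lower() for t in tokens]
    ner = []
    for label, spans in sample.items():
        if label == "text":
            continue
        for span in spans:
            span_tokens = [t.lower() for t in tokenize(span)]
            n = len(span_tokens)
            if n == 0:
                continue  # nothing to match for an empty span
            # Record every position where the span's tokens occur in the text
            for start in range(len(tokens) - n + 1):
                if lowered[start:start + n] == span_tokens:
                    ner.append([start, start + n - 1, label])
    return {"tokenized_text": tokens, "ner": ner}
```

Spans that never match are silently dropped in this sketch, so a sample with an empty "ner" list is a hint that the generated annotations do not line up with the text.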
Step 3: Train the GLiNER Model
from gliner_finetune.train import train_model
# Train the model
train_model(model="urchade/gliner_small-v2.1", train_data="assets/train.json",
            eval_data="assets/eval.json", project="")
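A quick look at the training file before starting a run can catch empty or lopsided datasets. The sketch below assumes the file is a JSON list of objects that each carry an "ner" list of [start, end, label] entries. That layout is an assumption about the converted files, and summarize_dataset is a hypothetical helper rather than part of gliner-finetune.

```python
import json
from collections import Counter


def summarize_dataset(path):
    """Return (number of samples, Counter of label frequencies) for a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        samples = json.load(f)
    # Count how often each label appears across all samples
    label_counts = Counter(
        label for sample in samples for _, _, label in sample.get("ner", [])
    )
    return len(samples), label_counts
```

For example, summarize_dataset("assets/train.json") shows at a glance whether any label from the example data is missing from the training split.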
Documentation
For more details about the GLiNER model and its capabilities, visit the official repository: