Structural Data Extractor using LLMs
sdeul
Installation
$ pip install -U sdeul
Usage
Command Line Interface
- Create a JSON Schema file for the output.
- Prepare a local model GGUF file or model API key.
- Extract structural data from given text using sdeul extract. Example:
# Use OpenAI API
$ sdeul extract --openai-model='gpt-4.1' \
    test/data/medication_history.schema.json \
    test/data/patient_medication_record.txt

# Use Amazon Bedrock API
$ sdeul extract --bedrock-model='us.anthropic.claude-sonnet-4-20250514-v1:0' \
    test/data/medication_history.schema.json \
    test/data/patient_medication_record.txt

# Use Ollama API
$ sdeul extract --ollama-model='gemma3:27b' \
    test/data/medication_history.schema.json \
    test/data/patient_medication_record.txt

# Use a Llama.cpp GGUF model file
$ sdeul extract --llamacpp-model-file='local_llm.gguf' \
    test/data/medication_history.schema.json \
    test/data/patient_medication_record.txt
Expected output:
{
  "MedicationHistory": [
    {
      "MedicationName": "Lisinopril",
      "Dosage": "10mg daily",
      "Frequency": "daily",
      "Purpose": "hypertension"
    },
    {
      "MedicationName": "Metformin",
      "Dosage": "500mg twice daily",
      "Frequency": "twice daily",
      "Purpose": "type 2 diabetes"
    },
    {
      "MedicationName": "Atorvastatin",
      "Dosage": "20mg at bedtime",
      "Frequency": "at bedtime",
      "Purpose": "high cholesterol"
    }
  ]
}
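The schema file used in the commands above (test/data/medication_history.schema.json) is not reproduced on this page. As a rough sketch only, a schema consistent with the expected output could be generated like this. The string types, the "required" lists, and the helper names build_medication_schema and write_schema are assumptions inferred from the sample output, not the project's actual schema or API.

```python
import json
from pathlib import Path

# Hypothetical field list inferred from the expected output above;
# the project's real medication_history.schema.json may differ.
MEDICATION_FIELDS = ("MedicationName", "Dosage", "Frequency", "Purpose")


def build_medication_schema() -> dict:
    """Return a JSON Schema describing a list of medication records."""
    item_schema = {
        "type": "object",
        "properties": {name: {"type": "string"} for name in MEDICATION_FIELDS},
        # Assumption: every field is mandatory for each record.
        "required": list(MEDICATION_FIELDS),
    }
    return {
        "type": "object",
        "properties": {
            "MedicationHistory": {"type": "array", "items": item_schema},
        },
        "required": ["MedicationHistory"],
    }


def write_schema(path: str) -> Path:
    """Write the schema to `path` as indented JSON and return the path."""
    target = Path(path)
    target.write_text(
        json.dumps(build_medication_schema(), indent=2), encoding="utf-8"
    )
    return target
```

Calling write_schema("example.schema.json") (a placeholder file name) would produce a file that can be passed to sdeul extract in the schema position shown in the commands above.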
REST API
SDEUL also provides a REST API for extracting structured data and validating JSON.
- Start the API server:
$ sdeul serve
- The API will be available at http://localhost:8000 with the following endpoints:
  - POST /extract: Extract structured data from text
  - POST /validate: Validate JSON data against a schema
  - GET /health: Health check endpoint
  - GET /docs: Interactive API documentation
- Example API usage:
# Extract data using OpenAI
$ curl -X POST "http://localhost:8000/extract" \
    -H "Content-Type: application/json" \
    -d '{
      "text": "Patient is taking Lisinopril 10mg daily for hypertension.",
      "json_schema": {
        "type": "object",
        "properties": {
          "medications": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "dosage": {"type": "string"},
                "condition": {"type": "string"}
              }
            }
          }
        }
      },
      "openai_model": "gpt-4o-mini",
      "openai_api_key": "your-api-key"
    }'

# Validate JSON data
$ curl -X POST "http://localhost:8000/validate" \
    -H "Content-Type: application/json" \
    -d '{
      "data": {"medications": [{"name": "Lisinopril", "dosage": "10mg", "condition": "hypertension"}]},
      "json_schema": {
        "type": "object",
        "properties": {
          "medications": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "dosage": {"type": "string"},
                "condition": {"type": "string"}
              }
            }
          }
        }
      }
    }'
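The same validation request can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names build_json_post and send are hypothetical, and because the response layout is not documented here, the sketch assumes the server replies with a JSON body and returns it undecoded beyond that.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address given in the section above


def build_json_post(
    path: str, payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an API endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0):
    """Send the request and parse the body as JSON.

    Assumption: the server answers with JSON; its exact fields are not
    documented here, so the parsed object is returned unchanged.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same schema and data as the curl /validate example above.
SCHEMA = {
    "type": "object",
    "properties": {
        "medications": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "dosage": {"type": "string"},
                    "condition": {"type": "string"},
                },
            },
        }
    },
}

validate_request = build_json_post(
    "/validate",
    {
        "data": {
            "medications": [
                {"name": "Lisinopril", "dosage": "10mg", "condition": "hypertension"}
            ]
        },
        "json_schema": SCHEMA,
    },
)

# With a server started via `sdeul serve`, the request could be sent with:
#     result = send(validate_request)
```

Building the request separately from sending it keeps the payload inspectable without a running server. A request for /extract would follow the same pattern, with the text, json_schema, and model fields from the curl example.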