Structural Data Extractor using LLMs

sdeul

Installation

$ pip install -U sdeul

Usage

Command Line Interface

  1. Create a JSON Schema file that describes the desired output.

  2. Prepare a local GGUF model file or a model API key.

  3. Extract structured data from the given text using sdeul extract.

    Example:

    # Use OpenAI API
    $ sdeul extract --openai-model='gpt-4.1' \
        test/data/medication_history.schema.json \
        test/data/patient_medication_record.txt
    
    # Use Amazon Bedrock API
    $ sdeul extract --bedrock-model='us.anthropic.claude-sonnet-4-20250514-v1:0' \
        test/data/medication_history.schema.json \
        test/data/patient_medication_record.txt
    
    # Use Ollama API
    $ sdeul extract --ollama-model='gemma3:27b' \
        test/data/medication_history.schema.json \
        test/data/patient_medication_record.txt
    
    # Use a Llama.cpp GGUF model file
    $ sdeul extract --llamacpp-model-file='local_llm.gguf' \
        test/data/medication_history.schema.json \
        test/data/patient_medication_record.txt
    

    Expected output:

    {
      "MedicationHistory": [
        {
          "MedicationName": "Lisinopril",
          "Dosage": "10mg daily",
          "Frequency": "daily",
          "Purpose": "hypertension"
        },
        {
          "MedicationName": "Metformin",
          "Dosage": "500mg twice daily",
          "Frequency": "twice daily",
          "Purpose": "type 2 diabetes"
        },
        {
          "MedicationName": "Atorvastatin",
          "Dosage": "20mg at bedtime",
          "Frequency": "at bedtime",
          "Purpose": "high cholesterol"
        }
      ]
    }
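
A schema file for step 1 could be produced along the following lines. This is only a sketch: the contents of test/data/medication_history.schema.json are not shown here, so the schema below is a hypothetical one inferred from the keys in the expected output, and the "required" entries and the write_schema helper are assumptions made for illustration.

```python
import json
from pathlib import Path

# Hypothetical schema mirroring the keys of the expected output above.
# The real test/data/medication_history.schema.json may differ.
MEDICATION_ITEM = {
    "type": "object",
    "properties": {
        "MedicationName": {"type": "string"},
        "Dosage": {"type": "string"},
        "Frequency": {"type": "string"},
        "Purpose": {"type": "string"},
    },
    # Assumption: only the medication name is mandatory.
    "required": ["MedicationName"],
}

SCHEMA = {
    "type": "object",
    "properties": {
        "MedicationHistory": {"type": "array", "items": MEDICATION_ITEM},
    },
    "required": ["MedicationHistory"],
}


def write_schema(path):
    """Write SCHEMA to `path` as pretty-printed JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(SCHEMA, indent=2) + "\n", encoding="utf-8")
    return path
```

Calling write_schema("medication_history.schema.json") writes a file that can then be passed as the first positional argument to sdeul extract, in place of the schema path used in the commands above.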
    

REST API

SDEUL also provides a REST API for extracting structured data and validating JSON.

  1. Start the API server:

    $ sdeul serve
    
  2. The API will be available at http://localhost:8000 with the following endpoints:

    • POST /extract - Extract structured data from text
    • POST /validate - Validate JSON data against a schema
    • GET /health - Health check endpoint
    • GET /docs - Interactive API documentation

  3. Example API usage:

    # Extract data using OpenAI
    $ curl -X POST "http://localhost:8000/extract" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "Patient is taking Lisinopril 10mg daily for hypertension.",
        "json_schema": {
          "type": "object",
          "properties": {
            "medications": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": {"type": "string"},
                  "dosage": {"type": "string"},
                  "condition": {"type": "string"}
                }
              }
            }
          }
        },
        "openai_model": "gpt-4o-mini",
        "openai_api_key": "your-api-key"
      }'
    
    # Validate JSON data
    $ curl -X POST "http://localhost:8000/validate" \
      -H "Content-Type: application/json" \
      -d '{
        "data": {"medications": [{"name": "Lisinopril", "dosage": "10mg", "condition": "hypertension"}]},
        "json_schema": {
          "type": "object",
          "properties": {
            "medications": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": {"type": "string"},
                  "dosage": {"type": "string"},
                  "condition": {"type": "string"}
                }
              }
            }
          }
        }
      }'
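
The same requests can be issued from Python. The sketch below uses only the standard library and reuses the endpoint paths and payload fields from the curl examples. The helper names (build_request, post_json) are hypothetical, and because the response layout is not documented here, the body is returned as plain text rather than parsed.

```python
import json
import urllib.request

# Default address from the "sdeul serve" step above.
BASE_URL = "http://localhost:8000"

# Same schema as in the curl examples.
SCHEMA = {
    "type": "object",
    "properties": {
        "medications": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "dosage": {"type": "string"},
                    "condition": {"type": "string"},
                },
            },
        }
    },
}

# Request body for POST /validate, copied from the curl example.
VALIDATE_PAYLOAD = {
    "data": {
        "medications": [
            {"name": "Lisinopril", "dosage": "10mg", "condition": "hypertension"}
        ]
    },
    "json_schema": SCHEMA,
}


def build_request(endpoint, payload, base_url=BASE_URL):
    """Build a JSON POST request for an endpoint such as /extract or /validate."""
    return urllib.request.Request(
        url=base_url + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint, payload, base_url=BASE_URL, timeout=60):
    """Send the request and return the raw response body as text."""
    request = build_request(endpoint, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, print(post_json("/validate", VALIDATE_PAYLOAD)) sends the validation request. An extraction request works the same way with the "/extract" endpoint and a payload containing "text", "json_schema", "openai_model", and "openai_api_key", as in the first curl example.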
    
