Use LLMs to label any textual dataset

LLMTag

A simple interface to label clinical data using local large language models (LLMs)

Features

  1. Starter code for getting started with LLMs
  2. Try LLMs locally, without PHI/PII concerns and without GPUs
  3. Simple interface for several tasks
    • Label Documents
    • Fine-tune (upcoming)
    • RAG (upcoming)
  4. Be in control
    1. Try any of the latest models
    2. Control Context Length
    3. Tailor to your specific needs

Getting Started (Contributors)

  1. Clone the llmtag repo
  2. Download weights - any llama2 compatible model should work
  3. Add the model path to the environment variable MODEL
    • Create a .env file in the root directory of the repository (e.g. touch .env)
    • Copy and paste the line below, or define your own path to the model binary (the actual model weights)
    • .env
      MODEL=./models/7B/llama-2-7b-chat.Q4_K_M.gguf
      
  4. Initialize the environment with poetry install (if you are new to poetry, see the poetry documentation)
  5. To use GPUs, see the llama-cpp-python installation instructions
    • Follow the installation instructions for your machine's specifications
    • CPU-only use works without this step, but inference will be far slower, potentially by orders of magnitude
    • Skip this step for now unless you know you need GPU support
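
Before running anything, it can help to confirm that the MODEL variable from step 3 resolves to a real file. The following is a minimal sketch using only the standard library; the helper names load_env_file and resolve_model_path are hypothetical and are not part of llmtag, which may load its configuration differently.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_model_path(env_file=".env"):
    """Return the model path from the process environment, falling back to the .env file."""
    model = os.environ.get("MODEL") or load_env_file(env_file).get("MODEL")
    if not model:
        raise RuntimeError("MODEL is not set; add it to .env or export it")
    return Path(model)
```

Calling resolve_model_path().is_file() from the repository root then tells you whether the weights are where the .env file says they are.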

Post installation

  1. Run all tests using poetry run python -m pytest tests/
  2. Run the default example: poetry run python -m llmtag

Results

Raw Clinical notes

index | patient_id | notes | label
0 | 1 | Patient complains of leg pain and swelling. Ultrasound confirms DVT. | 1
1 | 2 | Patient experiences chest pain and shortness of breath. CT scan confirms PE. | 1
2 | 3 | Patient has a history of DVT. No current symptoms noted. | 0
3 | 4 | No complaints or symptoms related to VTE or PE. | 0

LLM labeled notes

index | patient_id | notes | label | llm_label | llm_reasons
0 | 1 | Patient complains of leg pain and swelling. Ultrasound confirms DVT. | 1 | 1 | Ultrasound confirms DVT
1 | 2 | Patient experiences chest pain and shortness of breath. CT scan confirms PE. | 1 | 1 | CT scan confirms PE
2 | 3 | Patient has a history of DVT. No current symptoms noted. | 0 | 0 | No Symptoms found
3 | 4 | No complaints or symptoms related to VTE or PE. | 0 | 0 | No evidence of VTE or PE
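
The llm_label and llm_reasons columns above could be produced by a loop like the one sketched below. This is an illustration under assumptions, not llmtag's actual implementation: the prompt wording, the JSON reply format, and the names PROMPT_TEMPLATE, parse_response, label_notes and keyword_stub are all hypothetical. The model call is abstracted as a generate callable so the sketch runs without weights; in practice it would wrap a call to the local model through llama-cpp-python.

```python
import json

# Hypothetical prompt; the real prompt used by llmtag is not shown in this README.
PROMPT_TEMPLATE = (
    "You are reviewing a clinical note for a current VTE event (DVT or PE).\n"
    'Reply with JSON only: {{"label": 0 or 1, "reason": "<short reason>"}}\n'
    "Note: {note}\n"
)


def parse_response(text):
    """Extract (label, reason) from a reply; return (None, raw text) if unparseable."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None, text.strip()
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None, text.strip()
    label = payload.get("label")
    if label not in (0, 1):
        return None, text.strip()
    return int(label), str(payload.get("reason", ""))


def label_notes(rows, generate):
    """Add llm_label and llm_reasons to each row using the supplied generate callable."""
    labeled = []
    for row in rows:
        reply = generate(PROMPT_TEMPLATE.format(note=row["notes"]))
        llm_label, llm_reasons = parse_response(reply)
        labeled.append({**row, "llm_label": llm_label, "llm_reasons": llm_reasons})
    return labeled


def keyword_stub(prompt):
    """Stand-in for a real model call: flags notes that say a finding was confirmed."""
    note = prompt.split("Note:", 1)[1]
    positive = "confirms" in note
    reason = "Confirmed by imaging" if positive else "No current evidence of VTE or PE"
    return json.dumps({"label": int(positive), "reason": reason})


rows = [
    {"patient_id": 1, "notes": "Patient complains of leg pain and swelling. Ultrasound confirms DVT.", "label": 1},
    {"patient_id": 3, "notes": "Patient has a history of DVT. No current symptoms noted.", "label": 0},
]
for row in label_notes(rows, keyword_stub):
    print(row["patient_id"], row["label"], row["llm_label"], row["llm_reasons"])
```

Returning None for an unparseable reply, rather than guessing a label, keeps malformed model output visible for manual review.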

Metrics on simulated data

Confusion Matrix

n = 50

Rows: actual, columns: predicted

 | Predicted 0 | Predicted 1
Actual 0 | 27 | 1
Actual 1 | 0 | 20

Metric | Value
F1 Score | 0.98
Precision | 0.95
Recall | 1.0
Accuracy | 0.98
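
The reported metrics follow from the confusion matrix cells using the standard definitions. The sketch below recomputes them; the function name binary_metrics is hypothetical. Note that the four cells shown sum to 48, so the calculation uses only those counts.

```python
def binary_metrics(tn, fp, fn, tp):
    """Compute precision, recall, F1 and accuracy from binary confusion matrix counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"f1": f1, "precision": precision, "recall": recall, "accuracy": accuracy}


# Counts read from the matrix: actual 0 row is (27, 1), actual 1 row is (0, 20).
metrics = binary_metrics(tn=27, fp=1, fn=0, tp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these match the values in the metric table.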

Libraries Used

  1. llama.cpp
  2. llama-cpp-python
  3. Model Weights
  4. poetry
