Use LLMs to label any textual dataset
LLMTag
A simple interface to label clinical data using local large language models (LLMs)
Features
- Starter code for getting started with LLMs
- Try LLMs locally, without PHI/PII exposure and without needing GPUs
- Simple interface for several tasks
  - Label documents
  - Fine-tune (upcoming)
  - RAG (upcoming)
- Be in control
  - Try any of the latest models
  - Control context length
  - Tailor to your specific needs
Getting Started (Contributors)
- Clone the llmtag repo
- Download weights (any llama2 compatible model should work)
  - Get llama-7B-chat weights from Hugging Face (HF)
  - Save them under `./models/7B`
- Add the model path to the environment variable `MODEL`
  - Create a `.env` file in the root directory of the repository (e.g. `touch .env`)
  - Copy and paste the line below into `.env`, or define your own path to the model binary (the actual model weights)
    - `MODEL=./models/7B/llama-2-7b-chat.Q4_K_M.gguf`
- Initialize the environment with `poetry install` (if you are new to Poetry, please check its documentation first)
- For leveraging GPUs, please check llama-cpp-python
  - Follow the installation instructions that match your machine specifications
  - CPU-only use works without a GPU build, but inference is very time intensive, orders of magnitude slower
  - Ignore this step for now unless you know better
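The `.env` step above can be sanity-checked before loading any weights. The snippet below is a minimal, standard-library-only sketch of reading a `MODEL=` line from a `.env` file. The helper names `read_env_file` and `resolve_model_path` are hypothetical and are not part of llmtag; how the package itself reads `.env` is not stated here, so treat this as an illustration only.

```python
import os
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_model_path(env_path=".env"):
    """Return the MODEL path, preferring the process environment over the .env file."""
    if "MODEL" in os.environ:
        return os.environ["MODEL"]
    if Path(env_path).exists():
        return read_env_file(env_path).get("MODEL")
    return None


# Demonstrate with a throwaway .env file rather than touching a real one.
with tempfile.TemporaryDirectory() as tmp:
    demo_env = Path(tmp) / ".env"
    demo_env.write_text("MODEL=./models/7B/llama-2-7b-chat.Q4_K_M.gguf\n")
    demo_values = read_env_file(demo_env)

print(demo_values["MODEL"])
```

Calling `resolve_model_path()` from the repository root and checking that the returned path exists is a quick way to catch a misplaced weights file before a long model load.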
Post installation
- Run all tests using `poetry run python -m pytest tests/`
- Run the default example: `poetry run python -m llmtag`
Results
Raw Clinical notes
| | patient_id | notes | label |
|---|---|---|---|
| 0 | 1 | Patient complains of leg pain and swelling. Ultrasound confirms DVT. | 1 |
| 1 | 2 | Patient experiences chest pain and shortness of breath. CT scan confirms PE. | 1 |
| 2 | 3 | Patient has a history of DVT. No current symptoms noted. | 0 |
| 3 | 4 | No complaints or symptoms related to VTE or PE. | 0 |
LLM labeled notes
| | patient_id | notes | label | llm_label | llm_reasons |
|---|---|---|---|---|---|
| 0 | 1 | Patient complains of leg pain and swelling. Ultrasound confirms DVT. | 1 | 1 | Ultrasound confirms DVT |
| 1 | 2 | Patient experiences chest pain and shortness of breath. CT scan confirms PE. | 1 | 1 | CT scan confirms PE |
| 2 | 3 | Patient has a history of DVT. No current symptoms noted. | 0 | 0 | No Symptoms found |
| 3 | 4 | No complaints or symptoms related to VTE or PE. | 0 | 0 | No evidence of VTE or PE |
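One way a note could be turned into the `llm_label` and `llm_reasons` columns above is to ask the model for a small JSON object and parse it defensively. The sketch below is an assumption about the general approach, not llmtag's actual prompt or parser: `PROMPT_TEMPLATE`, `build_prompt`, and `parse_response` are hypothetical names, the labeling rule in the prompt is inferred from the example rows, and the model call is replaced by a hard-coded reply.

```python
import json

# Hypothetical prompt; the wording of llmtag's real prompt is not known here.
PROMPT_TEMPLATE = (
    "You are reviewing a clinical note.\n"
    'Answer with JSON of the form {{"label": 0 or 1, "reason": "..."}}.\n'
    "Use label 1 if the note documents a current DVT or PE, otherwise 0.\n"
    "Note: {note}\n"
)


def build_prompt(note):
    """Fill the template with a single clinical note."""
    return PROMPT_TEMPLATE.format(note=note)


def parse_response(text):
    """Extract (label, reason) from a model reply; return (None, raw text) on failure."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None, text.strip()
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None, text.strip()
    label = payload.get("label")
    if label not in (0, 1):
        return None, text.strip()
    return label, str(payload.get("reason", ""))


# Stand-in for a real model reply, which often wraps the JSON in extra text.
fake_reply = 'Sure. {"label": 1, "reason": "Ultrasound confirms DVT"}'
label, reason = parse_response(fake_reply)
print(label, reason)
```

Returning `None` for unparseable replies keeps bad generations visible in the output table instead of silently counting them as a negative label.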
Metrics on simulated data
Confusion Matrix
n = 50
| | Predicted 0 | Predicted 1 |
|---|---|---|
| Actual 0 | 27 | 1 |
| Actual 1 | 0 | 20 |

| Metric | Value |
|---|---|
| F1 Score | 0.98 |
| Precision | 0.95 |
| Recall | 1.0 |
| Accuracy | 0.98 |
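The reported metrics follow from the four cell counts of the confusion matrix as printed above. The short sketch below recomputes them with plain arithmetic; the variable names are illustrative, and the rounding to two decimals is an assumption about how the table was produced.

```python
# Cell counts taken from the confusion matrix (rows = actual, columns = predicted).
tn, fp = 27, 1   # actual 0: predicted 0, predicted 1
fn, tp = 0, 20   # actual 1: predicted 0, predicted 1

precision = tp / (tp + fp)                           # share of predicted positives that are correct
recall = tp / (tp + fn)                              # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
accuracy = (tp + tn) / (tp + tn + fp + fn)           # share of all listed cases labeled correctly

metrics = {
    "F1 Score": round(f1, 2),
    "Precision": round(precision, 2),
    "Recall": round(recall, 2),
    "Accuracy": round(accuracy, 2),
}
print(metrics)
```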
Libraries Used