Tabular-Infused Parameter Efficient Finetuning (tipeft)
tipeft
Tabular-infused Parameter Efficient Finetuning (tipeft) is a parameter-efficient finetuning (PEFT) method that infuses tabular features into the initialization of re-parameterization PEFT methods.
It is designed for postoperative prediction in clinical care, where predictive pre-operative tabular features are often under-utilized when finetuning language models. It currently supports LoRA and IA3.
Requirements
Dependencies
The following Python packages are required for tipeft:
torch
transformers
peft
accelerate
numpy
pandas
scikit-learn
tqdm
Install dependencies with:
pip install torch transformers peft accelerate numpy pandas scikit-learn tqdm
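After installing, it can help to confirm that every listed package is importable before opening a notebook. The following is a minimal sketch, not part of tipeft: the helper name `find_missing` is hypothetical, and the only non-obvious detail is that scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed above. scikit-learn is imported as
# "sklearn"; the other import names match the pip names.
REQUIRED_MODULES = [
    "torch", "transformers", "peft", "accelerate",
    "numpy", "pandas", "sklearn", "tqdm",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = find_missing()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```

This only checks that the modules can be located; it does not verify versions or that PyTorch was built with CUDA support.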
Note on PyTorch installation
Because PyTorch wheels vary by CUDA version and hardware, it is recommended to install PyTorch manually following the instructions at:
System Requirements
tipeft has been tested and verified on the following configuration:
+===========+================+
Important Notes
Environment: Must be run in a Jupyter notebook. Running as a standalone Python script may cause multiprocessing issues.
CPU cores: At least 10 CPU cores recommended (uses Pool(processes=10) internally).
GPU: CUDA-compatible GPU required.
OS: Tested on Windows. Linux/Mac compatibility not yet verified.
Known Compatibility Limitations
Jupyter only - Uses tqdm.notebook which may not display correctly outside Jupyter.
Multiprocessing - May behave differently on Linux/Mac due to different multiprocessing backends.
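The notes and limitations above can be turned into a quick preflight check. This is a sketch using only the standard library; `preflight_report` is a hypothetical helper, not a tipeft function, and the Jupyter detection (looking for a loaded `ipykernel` module) is a common heuristic rather than a guarantee.

```python
import multiprocessing
import os
import platform
import sys


def preflight_report(min_cores=10):
    """Summarize the environment against the notes above (heuristic only)."""
    cores = os.cpu_count() or 0  # cpu_count() may return None
    return {
        "os": platform.system(),  # e.g. "Windows", "Linux", "Darwin"
        "cpu_cores": cores,
        "enough_cores": cores >= min_cores,
        # Heuristic: ipykernel is loaded when running inside a Jupyter kernel.
        "in_jupyter_kernel": "ipykernel" in sys.modules,
        # The first entry is the platform's default start method.
        "default_start_method": multiprocessing.get_all_start_methods()[0],
    }


for key, value in preflight_report().items():
    print(f"{key}: {value}")
```

The default start method is reported because it differs between platforms (Windows uses `spawn`), which is one plausible source of the Linux/Mac differences mentioned above.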
GPU Requirements
tipeft is designed for GPU acceleration.
At least 1 GPU is recommended
Suggested minimum: 16 GB VRAM
Memory usage depends on:
sequence length
model size
batch size
PEFT configuration
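To give a rough feel for how model size and PEFT configuration enter the picture, the sketch below uses the generic parameter counts for LoRA and IA3, not tipeft's internals. The function names are hypothetical, the layer sizes (768 and 3072) are BERT-base dimensions used for illustration, and the 110 million parameter figure is an assumed round number. Only weight storage is estimated; activations, which grow with batch size and sequence length, and optimizer state are not included.

```python
def lora_trainable_params(d_in, d_out, rank):
    """LoRA adds two low-rank matrices per adapted linear layer:
    A with shape (rank, d_in) and B with shape (d_out, rank)."""
    return rank * (d_in + d_out)


def ia3_trainable_params(vector_lengths):
    """IA3 adds one learned scaling vector per adapted module;
    pass the length of each vector."""
    return sum(vector_lengths)


def weight_memory_mb(n_params, bytes_per_param=4):
    """Memory for the weights alone (4 bytes per parameter for fp32)."""
    return n_params * bytes_per_param / 1e6


print(lora_trainable_params(768, 768, rank=8))   # 12288
print(ia3_trainable_params([768, 768, 3072]))    # 4608
print(weight_memory_mb(110_000_000))             # 440.0
```

The adapter counts are per layer, so a full model multiplies them by the number of adapted layers. Either way they are small next to the frozen base weights, which is why base model size, batch size, and sequence length tend to dominate VRAM use.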
Installation
Install tipeft from PyPI with pip:
pip install tipeft
Usage
train_tabular_infused_IA3
Trains a tabular-infused IA3 model for binary classification.
from tipeft import train_tabular_infused_IA3

model, tokenizer = train_tabular_infused_IA3(
    train=train_df,
    val=val_df,
    pretrained_model_name="emilyalsentzer/Bio_ClinicalBERT",
    label_col="in_hospital_mortality",
    text_col="clinical_notes",
    columns_unique_labels_of_tabular_features={
        "gender": 2,
        "insurance": 3,
        "marital_status": 4,
        "anchor_age": 1,
        "anchor_year": 1
    },
    lr=0.001,
    num_epochs=5,
    lr_of_tabular_infused_features=0.0001
)
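The `columns_unique_labels_of_tabular_features` dictionary in the example above can be derived from the data instead of being written by hand. The sketch below assumes that each value is the number of unique labels for a categorical column and 1 for a continuous column. `build_label_counts` is a hypothetical helper, not part of tipeft, and the toy values are made up for illustration.

```python
def build_label_counts(columns, continuous=()):
    """Map each column name to its number of unique labels.

    columns:    mapping of column name -> iterable of values
    continuous: names of continuous columns, which are mapped to 1
    """
    continuous = set(continuous)
    counts = {}
    for name, values in columns.items():
        counts[name] = 1 if name in continuous else len(set(values))
    return counts


# Toy data standing in for columns of a training DataFrame.
toy = {
    "gender": ["F", "M", "F"],
    "insurance": ["Medicare", "Private", "Other"],
    "anchor_age": [54, 67, 71],
}
print(build_label_counts(toy, continuous=["anchor_age"]))
# {'gender': 2, 'insurance': 3, 'anchor_age': 1}
```

With a pandas DataFrame, the same helper can be called as `build_label_counts({c: train_df[c] for c in feature_cols}, continuous=[...])`, where `feature_cols` is your own list of tabular column names. Missing values, if present, would be counted as labels, so it is worth cleaning them first.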
Parameters
+===========================================+===================+==========================================================================================+
Returns
+============+==============+===========================+
Notes
The label_col must contain boolean values (True/False)
For categorical features, set the value in the mapping dictionary to the number of unique labels (greater than 1)
For continuous features, set the value in the mapping dictionary to 1
Ensure all categorical values appear in both train and val sets
The trained model is saved to trained_models/IA3_{pretrained_model_name}_{label_col}
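Two of the notes above, boolean labels and matching categories across splits, are easy to check before training. The following is a minimal sketch with hypothetical helper names that are not part of tipeft; it works on plain Python lists.

```python
def category_mismatches(train_values, val_values):
    """Report category values that appear in only one of the two splits."""
    train_set, val_set = set(train_values), set(val_values)
    return {
        "only_in_train": sorted(train_set - val_set, key=str),
        "only_in_val": sorted(val_set - train_set, key=str),
    }


def all_boolean(values):
    """True if every value is a Python bool (True/False)."""
    return all(isinstance(v, bool) for v in values)


print(category_mismatches(["a", "b", "c"], ["a", "b"]))
# {'only_in_train': ['c'], 'only_in_val': []}
print(all_boolean([True, False, True]))  # True
print(all_boolean([1, 0, 1]))            # False
```

With pandas, pass `train_df[col].tolist()` and `val_df[col].tolist()` so that the values are plain Python scalars before checking.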
Questions?
Contact me at alba@wustl.edu