Tabular-Infused Parameter Efficient Finetuning (tipeft)

tipeft


Tabular-infused Parameter Efficient Finetuning (tipeft) is a novel parameter-efficient finetuning (PEFT) method that infuses tabular features into the initialization of re-parameterization-based PEFT methods.

https://raw.githubusercontent.com/cja5553/peft_postoperative_risk_prediction/main/Figure_1.jpg

It is designed for postoperative prediction in clinical care, where informative pre-operative tabular features are often under-utilized during language model finetuning. It currently supports LoRA and IA3.

Requirements


Dependencies


The following Python packages are required for tipeft:

  • torch

  • transformers

  • peft

  • accelerate

  • numpy

  • pandas

  • scikit-learn

  • tqdm

Install dependencies with:

pip install torch transformers peft accelerate numpy pandas scikit-learn tqdm
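
After installing, it can help to confirm that every dependency is importable before starting a long training run. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical and not part of tipeft. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names for the packages listed above. scikit-learn installs
# under the import name "sklearn"; the others match their pip names.
REQUIRED_IMPORTS = [
    "torch", "transformers", "peft", "accelerate",
    "numpy", "pandas", "sklearn", "tqdm",
]


def missing_packages(import_names=REQUIRED_IMPORTS):
    """Return the import names that cannot be found in this environment."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the packages can be located; it does not check versions or CUDA support.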

Note on PyTorch installation


Because PyTorch wheels vary by CUDA version and hardware, install PyTorch separately by following the instructions at:

https://pytorch.org/

System Requirements


tipeft has been tested and verified on the following configuration:

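+-----------+----------------+
| Component | Tested Version |
+===========+================+
| OS        | Windows 10     |
+-----------+----------------+
| Python    | 3.9.19         |
+-----------+----------------+
| CUDA      | 12.6           |
+-----------+----------------+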

Important Notes


  • Environment: Must be run in a Jupyter notebook. Running as a standalone Python script may cause multiprocessing issues.

  • CPU cores: At least 10 CPU cores recommended (uses Pool(processes=10) internally).

  • GPU: CUDA-compatible GPU required.

  • OS: Tested on Windows. Linux/Mac compatibility not yet verified.

Known Compatibility Limitations


  1. Jupyter only - Uses tqdm.notebook which may not display correctly outside Jupyter.

  2. Multiprocessing - May behave differently on Linux/Mac due to different multiprocessing backends.
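
The notes and limitations above can be checked on a given machine before training. This is a rough sketch using only the standard library; `environment_report` is a hypothetical helper, not a tipeft function, and the default of 10 cores simply mirrors the Pool(processes=10) note.

```python
import multiprocessing
import os


def environment_report(min_cores=10):
    """Summarize the host properties that the notes above depend on."""
    cores = os.cpu_count() or 0  # os.cpu_count() may return None
    try:
        # get_ipython only exists inside IPython/Jupyter sessions.
        shell = get_ipython().__class__.__name__
    except NameError:
        shell = None
    return {
        "cpu_cores": cores,
        "enough_cores": cores >= min_cores,
        # Windows and macOS default to "spawn"; the Linux default
        # depends on the Python version.
        "start_method": multiprocessing.get_start_method(),
        # Jupyter kernels use a shell class named ZMQInteractiveShell.
        "in_jupyter": shell == "ZMQInteractiveShell",
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

A start method other than "spawn" indicates a platform that differs from the tested Windows setup, which is where multiprocessing behavior is most likely to diverge.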

GPU Requirements


tipeft is designed for GPU acceleration.

  • At least one CUDA-compatible GPU is required

  • Suggested minimum: 16GB VRAM

  • Memory usage depends on:

    • sequence length

    • model size

    • batch size

    • peft configuration
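
To see whether the available hardware meets these suggestions, the visible CUDA devices and their memory can be listed with PyTorch. The sketch below is illustrative only; `gpu_report` is a hypothetical helper, and it returns an empty list when PyTorch is missing or no CUDA device is visible.

```python
def gpu_report(min_vram_gb=16):
    """List visible CUDA devices with their total memory in GiB."""
    try:
        import torch
    except ImportError:
        return []  # PyTorch is not installed
    if not torch.cuda.is_available():
        return []  # no CUDA device is visible to PyTorch
    report = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        vram_gb = props.total_memory / 1024**3
        report.append({
            "index": index,
            "name": props.name,
            "vram_gb": round(vram_gb, 1),
            "meets_minimum": vram_gb >= min_vram_gb,
        })
    return report


if __name__ == "__main__":
    devices = gpu_report()
    if not devices:
        print("No CUDA device found")
    for device in devices:
        print(device)
```

Cards marketed as 16GB may report slightly less usable memory than that, so `meets_minimum` is best read as a rough guide rather than a hard gate.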

Installation


Install tipeft from PyPI with pip:

pip install tipeft

Usage


train_tabular_infused_IA3


Trains a tabular-infused IA3 model for binary classification.

from tipeft import train_tabular_infused_IA3

model, tokenizer = train_tabular_infused_IA3(
    train=train_df,
    val=val_df,
    pretrained_model_name="emilyalsentzer/Bio_ClinicalBERT",
    label_col="in_hospital_mortality",
    text_col="clinical_notes",
    columns_unique_labels_of_tabular_features={
        "gender": 2,
        "insurance": 3,
        "marital_status": 4,
        "anchor_age": 1,
        "anchor_year": 1
    },
    lr=0.001,
    num_epochs=5,
    lr_of_tabular_infused_features=0.0001
)

Parameters


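+-------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| Parameter                                 | Type             | Description                                                                         |
+===========================================+==================+=====================================================================================+
| train                                     | pandas.DataFrame | Training dataframe containing text, label, and tabular feature columns              |
+-------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| val                                       | pandas.DataFrame | Validation dataframe with the same structure as train                               |
+-------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| pretrained_model_name                     | str              | Base model to fine-tune. Supports Bio_ClinicalBERT or BioGPT                        |
+-------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| label_col                                 | str              | Column name of the binary outcome label (must contain True/False values)            |
+-------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| text_col                                  | str              | Column name containing the clinical text                                            |
+-------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| columns_unique_labels_of_tabular_features | dict             | Maps each feature to its number of unique values (1 = continuous, >1 = categorical) |
+-------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| lr                                        | float            | Learning rate (default: 0.001)                                                      |
+-------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| num_epochs                                | int              | Number of training epochs (default: 5)                                              |
+-------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| lr_of_tabular_infused_features            | float            | Learning rate for tabular pre-training (default: 0.0001)                            |
+-------------------------------------------+------------------+-------------------------------------------------------------------------------------+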
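
The columns_unique_labels_of_tabular_features dictionary can be derived from the data instead of being typed by hand. The sketch below follows the convention stated in the table (1 for continuous features, the number of unique values for categorical ones). `build_feature_map` is a hypothetical helper, not part of tipeft, and the choice to ignore missing values is an assumption, since the documentation does not say how tipeft treats them.

```python
def build_feature_map(columns, continuous):
    """Map each feature name to 1 (continuous) or its number of unique values.

    columns: mapping of feature name -> sequence of values.
    continuous: names of the features to treat as continuous.
    """
    feature_map = {}
    for name, values in columns.items():
        if name in continuous:
            feature_map[name] = 1
            continue
        # Assumption: missing values (None) do not count as a category.
        n_unique = len({v for v in values if v is not None})
        if n_unique < 2:
            raise ValueError(
                f"categorical feature {name!r} needs more than 1 unique value"
            )
        feature_map[name] = n_unique
    return feature_map


# Toy data for illustration only.
toy_columns = {
    "gender": ["F", "M", "F"],
    "insurance": ["Medicare", "Medicaid", "Other"],
    "anchor_age": [54, 71, 63],
}
print(build_feature_map(toy_columns, continuous={"anchor_age"}))
# → {'gender': 2, 'insurance': 3, 'anchor_age': 1}
```

With pandas, `train_df[feature_cols].to_dict("list")` yields a suitable `columns` argument, where `feature_cols` is a hypothetical list of the tabular column names. NaN values should be dropped or filled first, because this sketch only skips `None`.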

Returns


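+-----------+---------------+-----------------------------+
| Return    | Type          | Description                 |
+===========+===============+=============================+
| model     | PeftModel     | The trained IA3 model       |
+-----------+---------------+-----------------------------+
| tokenizer | AutoTokenizer | The tokenizer for the model |
+-----------+---------------+-----------------------------+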

Notes


  • The label_col must contain boolean values (True/False)

  • Categorical features should map to their number of unique labels, which must be greater than 1

  • Continuous features should map to 1 in the dictionary

  • Ensure all categorical values appear in both train and val sets

  • The trained model is saved to trained_models/IA3_{pretrained_model_name}_{label_col}
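
The first and fourth notes above can be checked before calling train_tabular_infused_IA3. The following is a minimal sketch under the assumption that each split is available as a mapping of column name to a list of values; `check_inputs` is a hypothetical helper and not part of tipeft.

```python
def check_inputs(train, val, label_col, categorical):
    """Return a list of problems found; an empty list means all checks passed.

    train, val: mappings of column name -> sequence of values.
    categorical: names of the categorical feature columns.
    """
    problems = []
    for split_name, split in (("train", train), ("val", val)):
        # isinstance(v, bool) rejects 0/1 integers and "True"/"False" strings.
        if not all(isinstance(v, bool) for v in split[label_col]):
            problems.append(f"{split_name}: {label_col} has non-boolean values")
    for col in categorical:
        train_only = set(train[col]) - set(val[col])
        val_only = set(val[col]) - set(train[col])
        if train_only:
            problems.append(f"{col}: only in train: {sorted(map(str, train_only))}")
        if val_only:
            problems.append(f"{col}: only in val: {sorted(map(str, val_only))}")
    return problems


# Toy data for illustration only.
toy_train = {"in_hospital_mortality": [True, False], "gender": ["F", "M"]}
toy_val = {"in_hospital_mortality": [True, 1], "gender": ["F"]}
for problem in check_inputs(toy_train, toy_val, "in_hospital_mortality", ["gender"]):
    print(problem)
```

Values taken directly from a NumPy array are `numpy.bool_` rather than `bool` and would be flagged by this check, so convert columns with `.tolist()` first.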

Questions?


Contact me at alba@wustl.edu
