OptiNet - A Versatile Library for ML and NLP Model Training
OptiNet is a Python library designed to simplify and optimize traditional Machine Learning (ML) and Natural Language Processing (NLP) workflows. With an easy-to-use interface, OptiNet allows you to prepare datasets, train models, and evaluate performance for both ML and large language models (LLMs). This library supports scikit-learn models as well as transformer-based models from Hugging Face, with support for LoRA and QLoRA for parameter-efficient fine-tuning.
Features
- Unified Interface: Train and evaluate both traditional ML models and transformer-based NLP models.
- Data Preparation: Quickly load, split, and prepare data for training.
- Tokenizer Integration: Easily tokenize text datasets using Hugging Face's transformers for NLP tasks.
- Model Training: Train both ML models (e.g., scikit-learn) and large language models using Trainer from Hugging Face.
- LoRA & QLoRA Support: Fine-tune large language models with Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) for efficient training.
- Scalable Evaluations: Evaluate trained models and get performance metrics like accuracy.
Installation
You can install OptiNet using pip:
```
pip install OptiNet
```
Usage
1. Import and Initialize OptiNet
OptiNet can be used for both ML models (e.g., scikit-learn classifiers) and NLP models (e.g., transformers). Here is how you can get started:
```python
from optinet import OptiNet
from sklearn.ensemble import RandomForestClassifier
from transformers import AutoModelForSequenceClassification

# Example ML model
ml_model = RandomForestClassifier()
optinet_ml = OptiNet(model=ml_model, model_type='ml')

# Example NLP model
llm_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
optinet_nlp = OptiNet(model=llm_model, model_type='llm', model_name='distilbert-base-uncased')
```
2. Prepare Data
For ML models, OptiNet can load and split datasets like digits from scikit-learn:
```python
# Prepare data for ML model
X_train, X_test, y_train, y_test = optinet_ml.prepare_data(dataset='digits')
```
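For reference, the digits split can also be reproduced directly with scikit-learn. This is a sketch of what `prepare_data` presumably wraps; the 80/20 split ratio and random seed are assumptions for illustration, not confirmed defaults of the library:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the 1,797-sample handwritten digits dataset (8x8 grayscale images
# flattened to 64 features per sample).
X, y = load_digits(return_X_y=True)

# Hold out 20% for testing; this ratio is an assumed default.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (1437, 64) (360, 64)
```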
For NLP models, you can load datasets from Hugging Face's datasets library:
```python
# Prepare data for NLP model
nlp_dataset = optinet_nlp.prepare_data(dataset='imdb')  # e.g., IMDB movie reviews dataset
```
Custom Dataset Support
If you have a custom dataset (e.g., loaded from a file or a database), you can pass the dataset directly using the dataset_obj parameter:
```python
from datasets import load_dataset

# Prepare data from a custom dataset
my_dataset = load_dataset("csv", data_files="my_custom_data.csv")
nlp_dataset = optinet_nlp.prepare_data(dataset_obj=my_dataset)
```
This approach allows flexibility to use any custom dataset, without being restricted to the built-in ones.
3. Tokenize Data (For NLP Models)
If you're working with NLP models, you need to tokenize the data before training:
```python
# Tokenize NLP dataset
tokenized_dataset = optinet_nlp.tokenize_data(nlp_dataset)
```
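`tokenize_data` presumably applies the model's Hugging Face tokenizer with padding and truncation. The mechanics can be illustrated with a deliberately simplified word-level tokenizer; this is a toy stand-in for the real subword algorithm, not OptiNet's implementation:

```python
def toy_tokenize(texts, vocab, max_length=8, pad_id=0, unk_id=1):
    """Map whitespace-separated words to ids, then truncate/pad to max_length.

    Mimics the shape of Hugging Face tokenizer output (input_ids plus
    attention_mask) using a toy word-level vocabulary.
    """
    batch = {"input_ids": [], "attention_mask": []}
    for text in texts:
        ids = [vocab.get(tok, unk_id) for tok in text.lower().split()][:max_length]
        mask = [1] * len(ids)
        # Pad on the right so every sequence in the batch has the same length;
        # the mask marks which positions are real tokens.
        ids += [pad_id] * (max_length - len(ids))
        mask += [0] * (max_length - len(mask))
        batch["input_ids"].append(ids)
        batch["attention_mask"].append(mask)
    return batch

vocab = {"this": 2, "movie": 3, "was": 4, "great": 5}
out = toy_tokenize(["This movie was great", "great"], vocab)
print(out["input_ids"][0])       # [2, 3, 4, 5, 0, 0, 0, 0]
print(out["attention_mask"][1])  # [1, 0, 0, 0, 0, 0, 0, 0]
```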
4. Train the Model
You can train both ML and NLP models using the train_model() method. This is where you can choose to fine-tune your model with LoRA and QLoRA by passing the relevant parameters.
Train ML model:
```python
# Train ML model
optinet_ml.train_model(X_train, y_train)
```
Train NLP model with LoRA:
```python
# Train NLP model with LoRA fine-tuning
optinet_nlp.train_model(
    tokenized_dataset['train'],
    output_dir="./output_lora",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    lora_r=4,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    task_type="SEQ_CLS"
)
```
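To see why LoRA is parameter-efficient, consider what `lora_r` and `lora_alpha` control: each targeted weight matrix W is frozen, and a trainable low-rank update scaled by `alpha / r` is added on top. A back-of-the-envelope count for one 768x768 attention projection (DistilBERT-sized; the figures are illustrative, not measured from OptiNet):

```python
# Dimensions of one attention projection matrix in a DistilBERT-sized model.
d_in, d_out = 768, 768
r = 4        # lora_r: rank of the low-rank update
alpha = 32   # lora_alpha: scaling; the effective update is (alpha / r) * B @ A

# Full fine-tuning trains every entry of the matrix.
full_params = d_in * d_out

# LoRA trains only A (r x d_in) and B (d_out x r).
lora_params = r * d_in + d_out * r

print(full_params)                 # 589824
print(lora_params)                 # 6144
print(full_params // lora_params)  # 96 -> ~96x fewer trainable parameters
```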
Train NLP model with QLoRA (4-bit quantization):
```python
import torch

# Train NLP model with QLoRA (using 4-bit quantization)
optinet_nlp.train_model(
    tokenized_dataset['train'],
    output_dir="./output_qlora",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    quantization_config={"load_in_4bit": True, "bnb_4bit_compute_dtype": torch.float16},
    lora_r=4,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    task_type="SEQ_CLS"
)
```
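The point of 4-bit quantization in QLoRA is memory: base-model weights are stored at 4 bits each, while compute still happens in the `bnb_4bit_compute_dtype` (here `float16`). A rough estimate of the savings for a DistilBERT-scale model, using an illustrative round parameter count rather than the exact figure:

```python
params = 66_000_000  # roughly DistilBERT-sized; illustrative, not exact

fp16_bytes = params * 2   # 16 bits = 2 bytes per weight
int4_bytes = params // 2  # 4 bits = half a byte per weight

print(fp16_bytes / 1e6)  # 132.0 -> ~132 MB of weights in fp16
print(int4_bytes / 1e6)  # 33.0  -> ~33 MB when quantized to 4 bits
```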
5. Evaluate the Model
Evaluate the performance of your trained model:
```python
# Evaluate ML model
accuracy = optinet_ml.evaluate_model(X_test, y_test)
print(f"ML Model Accuracy: {accuracy:.2f}")

# Evaluate NLP model
results = optinet_nlp.evaluate_model(tokenized_dataset['test'])
print("NLP Model Evaluation:", results)
```
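For the ML path, the accuracy metric presumably reduces to the fraction of test predictions that exactly match the labels. In plain Python:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))  # 0.75
```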
Requirements
OptiNet depends on several popular Python packages for ML and NLP tasks:
- scikit-learn
- transformers
- datasets
- torch
- peft (for LoRA and QLoRA support)
To install these requirements, you can use the following command:
```
pip install scikit-learn transformers datasets torch peft
```
License
This project is licensed under the MIT License - see the LICENSE file for details.
Authors
- Vishwanath Akuthota
- Ganesh Thota
- Krishna Avula
Contributing
We welcome contributions to improve OptiNet. Please feel free to submit issues and pull requests on the GitHub repository.
File details
Details for the file optinet-0.1.7.tar.gz.
File metadata
- Download URL: optinet-0.1.7.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `00a55cb54b0a515dc3f7901b0122d40369c7377e6bdb34b4f5f8ab00d8b7a0d7` |
| MD5 | `5ac97dc22523c47d4d72b538d5908652` |
| BLAKE2b-256 | `cffc1adfcfcaff428fd49e6889c4d43a64d1f796a2f25d28213e9b9585e7b07b` |
File details
Details for the file optinet-0.1.7-py3-none-any.whl.
File metadata
- Download URL: optinet-0.1.7-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `7a8086949a3f7292905267b8ba743ad34fb78c3a18e2897b50acf8dc503c5acc` |
| MD5 | `514a9e16c351882f299198a1b79cdf1b` |
| BLAKE2b-256 | `6766f65de69522ea675b654adbe46994d27bba0115de813cff6bb651e0a2345f` |