Framework to help you dive into AI
🤿 AI Dive
AI Data Ingestor, Verifier and Encoder.
This library is meant to provide practical examples of how to run real-world AI models on your data.
Installation
pip install ai-dive
Why build AI-Dive
In the age of the AI Engineer, it is more likely that you will start by grabbing an off-the-shelf model than by training your own from scratch. That is not to say you will never train a model. It is just to say: let's verify the state of the art before we go building.
A wide range of AI tasks that took 5 years and a research team to accomplish in 2013 now require just API docs and a spare afternoon in 2023.
🤿 AI-Dive lets you easily dive into the results of a model to decide whether it is worth building upon. It also gives you a simple and consistent interface to run models in your app or to implement new ones.
Model
TODO: Break up below into each part
- Model
- Dataset
- Diver
- Saver
Dataset
TODO
Dive
TODO
Save
TODO
All Together Now
TODO
Model & Dataset
There are only two interfaces to implement to get up and running on any model or dataset.
Dataset
- How to iterate over data
Model
- How to predict given each data point
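As a rough illustration of these two interfaces, here is a minimal sketch that does not import AI-Dive at all. `ListDataset` and `LengthModel` are hypothetical stand-ins invented for this example; only the method names `predict`, `__len__`, and `__getitem__` come from this README, and the real base classes may require more.

```python
class ListDataset:
    """Hypothetical dataset: wraps a list of dicts so it can be iterated."""

    def __init__(self, rows):
        self.rows = rows

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        return self.rows[idx]


class LengthModel:
    """Hypothetical model: 'predicts' the length of a text field."""

    def predict(self, data):
        return {"text": data["text"], "prediction": len(data["text"])}


dataset = ListDataset([{"text": "shiba"}, {"text": "inu"}])
model = LengthModel()

# Run each data point through the model, one at a time.
outputs = [model.predict(dataset[i]) for i in range(len(dataset))]
print(outputs)
```

Anything that can report its length and return a data point by index can play the dataset role here, and anything with a `predict` method can play the model role.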
Dive & Save
There are two helper classes to run your model given a dataset.
Diver
- How to run each datapoint from your dataset through the model.
Saver
- How to save off the results of the run. Running the model and not saving the results can cost time and money.
Models
AI-Dive provides a wrapper around existing models to make them easy to run on your own data. We are not writing models from scratch. We are simply wrapping them with a consistent interface so they can be evaluated in a consistent way.
from ai.dive.models.vit import ViT
model = ViT()
data = {"full_path": "images/shiba_inu_1.jpg"}
output = model.predict(data)
print(output)
There are a few models implemented already. We are looking to extend this list with new models as they come out, or you can implement this interface in your own package to save time evaluating.
HELP US BUILD OUT OUR MODEL LIBRARY OR IMPLEMENT YOUR OWN TODO: Show how to do either
- Vision Transformer (ViT)
- Llama-2
- Mistral-7b
- Dalle-3
- Stable Diffusion
- Magic Animate
Datasets
Models are worthless without the data to run and evaluate them on. Sure, you can poke your model with a stick by running it on a single example, but the real insights come from running it on a full dataset.
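from ai.dive.models.vit import ViT
from ai.dive.data.directory_classification import DirectoryClassification
# Saver and Diver must be imported too. These two module paths are
# assumptions, not confirmed by this README; check the installed package.
from ai.dive.saver import Saver
from ai.dive.diver import Diver

# Instantiate the model and dataset
model = ViT()
dataset = DirectoryClassification(data_dir="/path/to/images")

# Use a Saver to write the results to a csv
saver = Saver(
    "output.csv",
    output_keys=['filename', 'class_name', 'prediction', 'probability'],
    save_every=10
)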
# Run the model on the dataset, and save the results as we go
diver = Diver(model, dataset, saver=saver)
results = diver.run()
# The output will be a list of all the predictions
print(results)
The Diver object saves you the work of processing each row in the dataframe and the Saver takes care of writing all the results to disk so you can compare them across runs.
With plug and play models and datasets, the hope is anyone can evaluate a model against any dataset and share the results quickly and effectively.
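To make the division of labor concrete, here is a minimal sketch of the kind of loop a Diver and Saver pair could perform, written with only the standard library. This is not AI-Dive's implementation: `run_and_save` and `EchoModel` are hypothetical names, and the batching behavior is an assumption based on the `save_every` parameter shown above.

```python
import csv
import os
import tempfile


def run_and_save(model, dataset, path, output_keys, save_every=10):
    """Run every data point through the model, writing rows to a CSV in batches."""
    results, pending = [], []
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=output_keys, extrasaction="ignore")
        writer.writeheader()
        for i in range(len(dataset)):
            output = model.predict(dataset[i])
            results.append(output)
            pending.append(output)
            if len(pending) >= save_every:
                # Persist the batch so an interrupted run keeps its results.
                writer.writerows(pending)
                f.flush()
                pending = []
        writer.writerows(pending)  # write whatever is left over
    return results


class EchoModel:
    """Hypothetical model that upper-cases the filename as its 'prediction'."""

    def predict(self, data):
        return {"filename": data["filename"], "prediction": data["filename"].upper()}


# A plain list already supports len() and indexing, so it can act as a dataset.
rows = [{"filename": f"img_{i}.jpg"} for i in range(5)]
out_path = os.path.join(tempfile.mkdtemp(), "output.csv")
results = run_and_save(EchoModel(), rows, out_path, ["filename", "prediction"], save_every=2)

with open(out_path, newline="") as f:
    saved = list(csv.DictReader(f))
print(len(results), len(saved))
```

Flushing every few rows trades a little I/O for safety: if a long or expensive run fails partway through, the rows already written survive.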
Model Interface
TODO
Dataset Interface
A dataset has to implement two methods, __len__ and __getitem__, so that we can iterate over it. If it implements _build, you can load everything into memory to make the other calls faster.
Here is an example dataset that iterates over a directory of images with the folder names as class names.
Example directory structure:
images/
    cat/
        1.jpg
        2.jpg
    dog/
        1.jpg
        2.jpg
        3.jpg
Example data loader:
from ai.dive.data.dataset import Dataset
import os

class DirImageClassification(Dataset):
    def __init__(self, data_dir):
        super().__init__()
        self.data_dir = data_dir

    # For iterating over the dataset
    def __len__(self):
        return len(self.filepaths)

    # For iterating over the dataset
    def __getitem__(self, idx):
        return {
            "filepath": self.filepaths[idx],
            "class_name": self.labels[idx]
        }

    # Override this function to load the dataset into memory for fast access
    def _build(self):
        # iterate over files in directory, taking the directory name as the label
        labels = []
        filepaths = []
        for root, dirs, files in os.walk(self.data_dir):
            for file in files:
                if file.endswith(".jpg") or file.endswith(".png"):
                    labels.append(os.path.basename(root))
                    filepaths.append(os.path.join(root, file))
        self.labels = labels
        self.filepaths = filepaths
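To check the directory-walking logic on its own, the sketch below reproduces the labeling step as a standalone function and runs it against a throwaway copy of the example directory structure. `collect_labeled_images` is a hypothetical helper written for this illustration, not part of AI-Dive, and it does not depend on the `Dataset` base class.

```python
import os
import tempfile


def collect_labeled_images(data_dir):
    """Return (filepaths, labels), using each file's parent folder as its label."""
    filepaths, labels = [], []
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            if name.endswith((".jpg", ".png")):
                filepaths.append(os.path.join(root, name))
                labels.append(os.path.basename(root))
    return filepaths, labels


# Build a temporary images/ tree with 2 cat files and 3 dog files.
data_dir = os.path.join(tempfile.mkdtemp(), "images")
for label, count in [("cat", 2), ("dog", 3)]:
    os.makedirs(os.path.join(data_dir, label))
    for i in range(1, count + 1):
        # Empty files are enough here, since only the names are inspected.
        open(os.path.join(data_dir, label, f"{i}.jpg"), "w").close()

filepaths, labels = collect_labeled_images(data_dir)
print(sorted(set(labels)), len(filepaths))
```

Note that `os.walk` does not guarantee the order in which sibling folders are visited, so anything that depends on row order should sort the results first.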