Skip to main content

Easiest way of fine-tuning HuggingFace video classification models.

Project description

Easiest way of fine-tuning HuggingFace video classification models.

pypi version total downloads fcakyon twitter

🚀 Features

video-transformers uses:

and supports:

🏁 Installation

  • Install Pytorch:
conda install pytorch=1.11.0 torchvision=0.12.0 cudatoolkit=11.3 -c pytorch
  • Install pytorchvideo from main branch:
pip install git+https://github.com/facebookresearch/pytorchvideo.git
  • Install video-transformers:
pip install video-transformers

🔥 Usage

  • Prepare video classification dataset in such folder structure (.avi and .mp4 extensions are supported):
train_root
    label_1
        video_1
        video_2
        ...
    label_2
        video_1
        video_2
        ...
    ...
val_root
    label_1
        video_1
        video_2
        ...
    label_2
        video_1
        video_2
        ...
    ...
  • Fine-tune CVT (from HuggingFace) + Transformer based video classifier:
from torch.optim import AdamW
from video_transformers import TimeDistributed, VideoModel
from video_transformers.backbones.transformers import TransformersBackbone
from video_transformers.data import VideoDataModule
from video_transformers.heads import LinearHead
from video_transformers.necks import TransformerNeck
from video_transformers.trainer import trainer_factory
from video_transformers.utils.file import download_ucf6

backbone = TimeDistributed(TransformersBackbone("microsoft/cvt-13", num_unfrozen_stages=0))
neck = TransformerNeck(
    num_features=backbone.num_features,
    num_timesteps=8,
    transformer_enc_num_heads=4,
    transformer_enc_num_layers=2,
    dropout_p=0.1,
)

download_ucf6("./")
datamodule = VideoDataModule(
    train_root="ucf6/train",
    val_root="ucf6/val",
    batch_size=4,
    num_workers=4,
    num_timesteps=8,
    preprocess_input_size=224,
    preprocess_clip_duration=1,
    preprocess_means=backbone.mean,
    preprocess_stds=backbone.std,
    preprocess_min_short_side=256,
    preprocess_max_short_side=320,
    preprocess_horizontal_flip_p=0.5,
)

head = LinearHead(hidden_size=neck.num_features, num_classes=datamodule.num_classes)
model = VideoModel(backbone, head, neck)

optimizer = AdamW(model.parameters(), lr=1e-4)

Trainer = trainer_factory("single_label_classification")
trainer = Trainer(
    datamodule,
    model,
    optimizer=optimizer,
    max_epochs=8
)

trainer.fit()
  • Fine-tune MobileViT (from Timm) + GRU based video classifier:
from video_transformers import TimeDistributed, VideoModel
from video_transformers.backbones.timm import TimmBackbone
from video_transformers.data import VideoDataModule
from video_transformers.heads import LinearHead
from video_transformers.necks import GRUNeck
from video_transformers.trainer import trainer_factory
from video_transformers.utils.file import download_ucf6

backbone = TimeDistributed(TimmBackbone("mobilevitv2_100", num_unfrozen_stages=0))
neck = GRUNeck(num_features=backbone.num_features, hidden_size=128, num_layers=2, return_last=True)

download_ucf6("./")
datamodule = VideoDataModule(
    train_root="ucf6/train",
    val_root="ucf6/val",
    batch_size=4,
    num_workers=4,
    num_timesteps=8,
    preprocess_input_size=224,
    preprocess_clip_duration=1,
    preprocess_means=backbone.mean,
    preprocess_stds=backbone.std,
    preprocess_min_short_side=256,
    preprocess_max_short_side=320,
    preprocess_horizontal_flip_p=0.5,
)

head = LinearHead(hidden_size=neck.hidden_size, num_classes=datamodule.num_classes)
model = VideoModel(backbone, head, neck)

Trainer = trainer_factory("single_label_classification")
trainer = Trainer(
    datamodule,
    model,
    max_epochs=8
)

trainer.fit()
  • Perform prediction for a single file or folder of videos:
from video_transformers import VideoModel

model = VideoModel.from_pretrained(model_name_or_path)

model.predict(video_path="video.mp4")
>> [{'filename': "video.mp4", 'predictions': {'class1': 0.98, 'class2': 0.02}}]

🤗 Full HuggingFace Integration

  • Push your fine-tuned model to the hub:
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")

model.push_to_hub('model_name')
  • Load any pretrained video-transformer model from the hub:
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")

model.from_pretrained('account_name/model_name')
  • Push your model to HuggingFace hub with auto-generated model-cards:
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.push_to_hub('account_name/app_name')
  • (Incoming feature) Push your model as a Gradio app to HuggingFace Space:
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.push_to_space('account_name/app_name')

📈 Multiple tracker support

  • Tensorboard tracker is enabled by default.

  • To add Neptune/Layer ... tracking:

from video_transformers.tracking import NeptuneTracker
from accelerate.tracking import WandBTracker

trackers = [
    NeptuneTracker(EXPERIMENT_NAME, api_token=NEPTUNE_API_TOKEN, project=NEPTUNE_PROJECT),
    WandBTracker(project_name=WANDB_PROJECT)
]

trainer = Trainer(
    datamodule,
    model,
    trackers=trackers
)

🕸️ ONNX support

  • Convert your trained models into ONNX format for deployment:
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.to_onnx(quantize=False, opset_version=12, export_dir="runs/exports/", export_filename="model.onnx")

🤗 Gradio support

  • Convert your trained models into Gradio App for deployment:
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.to_gradio(examples=['video.mp4'], export_dir="runs/exports/", export_filename="app.py")

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

video-transformers-0.0.7.tar.gz (36.5 kB view details)

Uploaded Source

Built Distribution

video_transformers-0.0.7-py3-none-any.whl (45.3 kB view details)

Uploaded Python 3

File details

Details for the file video-transformers-0.0.7.tar.gz.

File metadata

  • Download URL: video-transformers-0.0.7.tar.gz
  • Upload date:
  • Size: 36.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.0

File hashes

Hashes for video-transformers-0.0.7.tar.gz
Algorithm Hash digest
SHA256 4d23983fc53a46f80d5cd981738444ae846de4f982caa21e051fa6a377fc6370
MD5 2be6a65d6f7a2218ec14f2a1b2b5836a
BLAKE2b-256 7f349f9807b49ccd5329e9304538fdf1f306e3470f675acf7126de704f789c71

See more details on using hashes here.

File details

Details for the file video_transformers-0.0.7-py3-none-any.whl.

File metadata

File hashes

Hashes for video_transformers-0.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 16d4caee7c6834c31e6b4739e401535e9112e1e22f18329bdd15d30a4c802c48
MD5 269bcf94691f31f32ca65b99cca99bea
BLAKE2b-256 dcb0572bc2dd3e0f640fa080682ba1cecd2c70674d4e1bf864329e27196efb0b

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page