Pandas Video Methods

Video methods for pandas dataframes using TorchCodec.

Features:

  • Use torchcodec.decoders.VideoDecoder objects in pandas dataframes
  • Call torchcodec.decoders.VideoDecoder methods on a column, for example:
    • TODO
  • Save dataframes with torchcodec.decoders.VideoDecoder objects to Parquet
  • Process videos in parallel with Dask
  • Manipulate video datasets from Hugging Face

Installation

pip install pandas-video-methods

Usage

You can open videos as torchcodec.decoders.VideoDecoder objects using the .open() method.

Once the videos are opened, you can call any VideoDecoder method:

TODO

Here is how to enable video methods for VideoDecoders created manually:

TODO

Save

You can save a dataframe of torchcodec.decoders.VideoDecoder objects to Parquet:

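import pandas as pd
from pandas_video_methods import TorchCodecVideoDecoderMethods

# Register the "video_decoder" accessor on pandas Series
# (assumed to mirror the Dask registration shown under "Run in parallel")
pd.api.extensions.register_series_accessor("video_decoder")(TorchCodecVideoDecoderMethods)

# Save: open each file as a VideoDecoder, then write the column to Parquet
df = pd.DataFrame({"file_path": ["path/to/video.mp4"]})
df["video"] = df["file_path"].video_decoder.open()
df.to_parquet("data.parquet")

# Later: reload the data and re-enable the video methods on the column
df = pd.read_parquet("data.parquet")
df["video"] = df["video"].video_decoder.enable()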

This saves the actual videos themselves, not just the paths to the video files!

Under the hood, each video is saved as a dictionary of the form {"bytes": <bytes of the video file>, "path": <path or name of the video file>}. By default, the bytes are those of the video file, kept in the video's own encoding. Because the data doesn't rely on extension types, anyone can load the Parquet file, even without pandas-video-methods.
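
To illustrate that layout, here is a minimal sketch using only the Python standard library. The helper names to_storage_dict and write_video_file are hypothetical and are not part of pandas-video-methods; the sketch only assumes the {"bytes": ..., "path": ...} dictionary shape described above, and the demo uses placeholder bytes rather than a real video.

```python
import tempfile
from pathlib import Path


def to_storage_dict(file_path):
    """Pack a file into the {"bytes", "path"} shape (hypothetical helper)."""
    file_path = Path(file_path)
    return {"bytes": file_path.read_bytes(), "path": file_path.name}


def write_video_file(record, out_dir):
    """Write a stored record back to disk and return the new path (hypothetical helper)."""
    out_path = Path(out_dir) / Path(record["path"]).name
    out_path.write_bytes(record["bytes"])
    return out_path


def roundtrip_demo():
    """Round-trip a stand-in file; the bytes are placeholders, not a real video."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "video.mp4"
        src.write_bytes(b"\x00\x00\x00\x18ftypmp42")  # placeholder content
        record = to_storage_dict(src)
        out_dir = Path(tmp) / "out"
        out_dir.mkdir()
        restored = write_video_file(record, out_dir)
        # Return the stored name and whether the bytes survived unchanged
        return record["path"], restored.read_bytes() == record["bytes"]
```

Since each record is a plain dictionary, a consumer without pandas-video-methods could apply something like write_video_file to each value of the "video" column to recover the original files.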

Note: if you created the torchcodec.decoders.VideoDecoder objects manually, remember to enable the video methods on them first. This is required for saving to Parquet.

Run in parallel

Dask DataFrame parallelizes pandas to handle large datasets. It enables faster local processing with multiprocessing as well as distributed, large-scale processing. Dask mimics the pandas API, so the same accessor can be registered on Dask series:

import dask.dataframe as dd
from distributed import Client
from pandas_video_methods import TorchCodecVideoDecoderMethods

dd.extensions.register_series_accessor("video_decoder")(TorchCodecVideoDecoderMethods)

if __name__ == "__main__":
    client = Client()
    df = dd.read_csv("path/to/large/dataset.csv")
    df = df.repartition(npartitions=1000)  # divide the processing in 1000 jobs
    df["video"] = df["file_path"].video_decoder.open()
    # TODO
    df.to_parquet("data_folder")

Hugging Face support

Most video datasets in Parquet format on Hugging Face are compatible with pandas-video-methods. For example, you can load the TODO:

df = pd.read_parquet(TODO)
df["video"] = df["video"].video_decoder.enable()

Datasets created with pandas-video-methods and saved to Parquet are also compatible with the Dataset Viewer on Hugging Face and the datasets library:

# TODO
df.to_parquet("hf://datasets/username/dataset_name/train.parquet")

Display in Notebooks

You can display a pandas dataframe of videos as HTML in a Jupyter Notebook or on Google Colab:

from IPython.display import HTML
HTML(df.head().to_html(escape=False, formatters={"video": df.video.video_decoder.html_formatter}))
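
For intuition, a formatter of this kind can be thought of as a function that maps one cell to an HTML string. The function below is a hypothetical stand-in and not the library's html_formatter: it assumes the cell is a {"bytes": ..., "path": ...} dictionary and embeds the bytes as a base64 data URI inside a <video> tag. The name video_cell_to_html and the width parameter are illustrative only.

```python
import base64
import html
import mimetypes


def video_cell_to_html(record, width=240):
    """Render a {"bytes", "path"} record as an inline <video> tag (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(record["path"])
    mime = mime or "video/mp4"  # assumption: fall back to MP4 when the type is unknown
    payload = base64.b64encode(record["bytes"]).decode("ascii")
    title = html.escape(record["path"], quote=True)
    return (
        f'<video controls width="{int(width)}" title="{title}">'
        f'<source src="data:{mime};base64,{payload}" type="{mime}">'
        "</video>"
    )
```

A function like this could be passed to DataFrame.to_html through formatters={"video": video_cell_to_html} together with escape=False. Because the bytes are inlined in the page, this approach is only practical for short clips.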

TODO
