Text to Video synthesis

Gen1

My implementation of "Structure and Content-Guided Video Synthesis with Diffusion Models" by RunwayML. The paper summarizes the method as follows:

"Input videos x are encoded to z0 with a fixed encoder E and diffused to zt. We extract a structure representation s by encoding depth maps obtained with MiDaS, and a content representation c by encoding one of the frames with CLIP. The model then learns to reverse the diffusion process in the latent space, with the help of s, which gets concatenated to zt, as well as c, which is provided via cross-attention blocks. During inference (right), the structure s of an input video is provided in the same manner. To specify content via text, we convert CLIP text embeddings to image embeddings via a prior."
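The two steps "diffused to zt" and "s gets concatenated to zt" can be illustrated with a small standard-library sketch. It is not the code of this package or of the paper: the linear noise schedule, the function names, and the toy list-of-channels latent are all assumptions chosen for readability, and the cross-attention path for c is left out.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """Running product of (1 - beta_i), i.e. the fraction of signal left at each step."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def diffuse(z0, t, alpha_bars, rng):
    """Sample z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1).

    z0 is a toy latent: a list of channels, each a flat list of floats.
    """
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [[signal * v + noise * rng.gauss(0.0, 1.0) for v in channel] for channel in z0]


def concat_structure(z_t, s):
    """Concatenate the structure channels s onto the noisy latent z_t (channel axis)."""
    size = len(z_t[0])
    if any(len(channel) != size for channel in z_t + s):
        raise ValueError("all channels must have the same number of positions")
    return z_t + s


rng = random.Random(0)
alpha_bars = cumulative_alpha_bar(linear_beta_schedule(1000))

z0 = [[0.5, -0.2, 0.1, 0.9] for _ in range(4)]  # 4 latent channels, 4 positions each
s = [[0.3, 0.3, 0.7, 0.7]]                      # 1 structure (depth) channel

z_t = diffuse(z0, t=500, alpha_bars=alpha_bars, rng=rng)
denoiser_input = concat_structure(z_t, s)
print(len(denoiser_input))  # 5 channels: 4 latent + 1 structure
```

In a tensor library the concatenation would be a single call along the channel dimension; plain lists are used here only to keep the sketch dependency-free.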

Install

pip3 install gen1

Usage

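import torch
from gen1.model import Gen1

# Build the model with its default settings
model = Gen1()

# Batch of images, shaped (batch, channels, height, width)
images = torch.randn(1, 3, 128, 128)
# Batch of videos, shaped (batch, channels, frames, height, width)
video = torch.randn(1, 3, 16, 128, 128)

# Run a forward pass on the image and video inputs
run_out = model.forward(images, video)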
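Before calling the model, it can help to confirm that the two inputs use the layouts shown above. The helper below is a hypothetical sketch and not part of the gen1 package. It assumes images are (batch, channels, height, width), videos are (batch, channels, frames, height, width), and that batch, channel, and spatial sizes should agree, as they do in the usage example. The package itself may not enforce any of this.

```python
def check_input_shapes(image_shape, video_shape):
    """Validate assumed image/video layouts and return the named dimensions.

    Hypothetical helper: the layout rules are inferred from the usage example,
    not taken from the gen1 source.
    """
    if len(image_shape) != 4:
        raise ValueError(f"expected 4 image dims (B, C, H, W), got {len(image_shape)}")
    if len(video_shape) != 5:
        raise ValueError(f"expected 5 video dims (B, C, T, H, W), got {len(video_shape)}")

    batch, channels, height, width = image_shape
    v_batch, v_channels, frames, v_height, v_width = video_shape

    if (batch, channels, height, width) != (v_batch, v_channels, v_height, v_width):
        raise ValueError(
            f"image shape {tuple(image_shape)} does not match video shape "
            f"{tuple(video_shape)} outside the frame axis"
        )

    return {
        "batch": batch,
        "channels": channels,
        "frames": frames,
        "height": height,
        "width": width,
    }


dims = check_input_shapes((1, 3, 128, 128), (1, 3, 16, 128, 128))
print(dims["frames"])  # 16
```

With real tensors the shapes can be passed as `tuple(images.shape)` and `tuple(video.shape)`.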

Datasets

Here is a summary table of the datasets used in the Structure and Content-Guided Video Synthesis with Diffusion Models paper:

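| Dataset              | Type   | Size       | Domain  | Description                   | Source  |
|----------------------|--------|------------|---------|-------------------------------|---------|
| Internal dataset     | Images | 240M       | General | Uncaptioned images            | Private |
| Custom video dataset | Videos | 6.4M clips | General | Uncaptioned short video clips | Private |
| DAVIS                | Videos | -          | General | Video object segmentation     | Link    |
| Stock footage        | Videos | -          | General | Diverse video clips           | -       |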

Citation

@misc{2302.03011,
  Author = {Patrick Esser and Johnathan Chiu and Parmida Atighehchian and Jonathan Granskog and Anastasis Germanidis},
  Title = {Structure and Content-Guided Video Synthesis with Diffusion Models},
  Year = {2023},
  Eprint = {arXiv:2302.03011},
}

Todo

  • Add training script
  • Add a conditional text parameter so that text can be passed in, not just images and/or videos
