
GEN5: A Binary Container Format For Improving Reproducibility In Workflows Concerning AI-Generated Images

Project description

Gen5 is a binary container format aimed at improving reproducibility for AI-generated images. It stores several key pieces of information, such as:

1. Environment & Provenance Tracking

The GEN5 file format has a dedicated chunk that captures Runtime Environment Details at the time of generation. This includes:

  • Operating System and machine identifiers
  • CPU/GPU models, core counts, and memory
  • Deep learning framework (e.g., PyTorch) and compute backend (e.g., CUDA)
  • Driver and library versions (e.g., CUDA version, NVIDIA driver)

This information is further utilised to warn the user of environment mismatches.
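The kind of runtime snapshot described above can be sketched with the standard library plus an optional PyTorch probe. This is an illustrative helper, not the library's internal code; the key names mirror the `hardware_info` fields shown later in this page.

```python
import platform

# Minimal sketch (not GEN5's internal capture logic) of gathering the
# runtime details recorded at generation time.
def collect_runtime_env():
    env = {
        "os": platform.system().lower(),
        "machine_name": platform.node(),
        "cpu": platform.processor() or platform.machine(),
        "python": platform.python_version(),
    }
    try:
        import torch  # only present when the generation used PyTorch
        env["framework"] = f"torch {torch.__version__}"
        if torch.cuda.is_available():
            env["compute_lib"] = f"cuda {torch.version.cuda}"
            env["gpu"] = torch.cuda.get_device_name(0)
    except ImportError:
        pass
    return env
```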

Environment Chunk (ENVC)

GEN5 includes an optional but strongly recommended environment chunk (ENVC) that captures a verifiable snapshot of the software stack used during generation. Unlike generic metadata, this chunk:

  • Is structured as a list of canonicalized components (e.g., torch, python, cuda, gpu)
  • Records each component's name, version, and a SHA-256 digest of its canonical string
  • Is compressed and stored as a standalone binary chunk with its own offset and hash in the chunk table
  • Enables integrity verification and drift detection across environments

This chunk is not embedded in the metadata JSON, but referenced via the chunks array in the decoded output (type "ENVC"), ensuring it remains tamper-evident and tooling-friendly.
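The name/version/digest scheme can be illustrated as follows. The canonical string format here is a hypothetical stand-in; GEN5's exact canonicalization may differ, but the principle (a deterministic string hashed with SHA-256) is the same.

```python
import hashlib

# Hypothetical illustration of canonicalizing one ENVC component and
# digesting it with SHA-256. GEN5's actual canonical form may differ.
def envc_entry(name: str, version: str) -> dict:
    canonical = f"name={name};version={version}"
    return {
        "name": name,
        "version": version,
        "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }

entry = envc_entry("torch", "2.3.1")
```

Because the digest is computed over a deterministic string, the same component in the same environment always hashes to the same value, which is what makes drift detection possible.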

HOW TO USE? The environment chunk is populated and stored automatically. When an environment mismatch is detected, a warning is issued. Example warning for a GPU mismatch:

UserWarning: Environment component 'gpu' differs:
File: name=gpu;model=NVIDIA A100;driver=535.129.01
Current: name=gpu;model=NVIDIA GeForce RTX 4090;driver=550.40.07
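The drift check that produces a warning like the one above can be sketched as a per-component comparison. This is an illustrative helper, not the library's internal API; the component strings reuse the format from the example warning.

```python
import warnings

# Illustrative sketch: compare each stored ENVC component string against
# the current environment and warn on any difference.
def check_env(stored: dict, current: dict) -> None:
    for key, stored_val in stored.items():
        current_val = current.get(key)
        if current_val != stored_val:
            warnings.warn(
                f"Environment component '{key}' differs:\n"
                f"File: {stored_val}\nCurrent: {current_val}"
            )

check_env(
    {"gpu": "name=gpu;model=NVIDIA A100;driver=535.129.01"},
    {"gpu": "name=gpu;model=NVIDIA GeForce RTX 4090;driver=550.40.07"},
)
```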

2. Latent Tensor Storage

GEN5 natively supports storing latent representations (i.e., diffusion model latents, VAE encodings) alongside the generated images. These are serialized as one or more 'LATN' chunks and kept in their native memory layout, as provided by the user. For PyTorch-generated latents this is typically NCHW; for TensorFlow it is NHWC. The format does not enforce or convert layout; instead it preserves the exact shape and byte representation provided by the user.
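The layout-preservation guarantee can be demonstrated with a byte-level round trip. The shapes below are typical Stable Diffusion latent dimensions chosen for illustration, not values mandated by the format.

```python
import numpy as np

# GEN5 preserves latents byte-for-byte in whatever layout the caller provides.
nchw = np.random.randn(1, 4, 64, 64).astype(np.float32)   # PyTorch-style NCHW
nhwc = np.random.randn(1, 64, 64, 4).astype(np.float32)   # TensorFlow-style NHWC

# A round trip through raw bytes recovers the exact array, because only the
# shape and original byte representation are recorded, never a conversion:
restored = np.frombuffer(nchw.tobytes(), dtype=np.float32).reshape(nchw.shape)
assert np.array_equal(restored, nchw)
```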

HOW TO STORE? Example:

from gen5.main import Gen5FileHandler

gen5 = Gen5FileHandler()

gen5.file_encoder(
  filename="my_ai_art.gen5",
  latent={
      "initial_noise": initial_noise.numpy(),
      "final_latent": final_latent.numpy()
  },
  chunk_records=[],
  model_name="Stable Diffusion 3",
  model_version="3.0",
  prompt="A cyberpunk cat wearing neon goggles, cinematic lighting",
  tags=["cat", "cyberpunk", "neon"],
  img_binary=img_bytes,
  convert_float16=True,       # store latents in float16 to save space (optional)
  should_compress=True,       # see the CRITICAL WARNING below
  generation_settings={
      "seed": 1337,
      "steps": 30,
      "sampler": "euler_ancestral",
      "cfg_scale": 7.0,
      "scheduler": "karras",
      "precision": "fp16"
  },
  hardware_info={
      "machine_name": "desktop-alpha",
      "os": "linux",
      "cpu": "AMD Ryzen 9 7950X",
      "cpu_cores": 16,
      "gpu": [
          {
              "name": "NVIDIA RTX 4090",
              "memory_gb": 24,
              "driver": "550.54.14",
              "cuda_version": "12.4"
          }
      ],
      "ram_gb": 128.0,
      "framework": "torch",
      "compute_lib": "cuda"
  }
)

!!! danger CRITICAL WARNING: The should_compress flag controls compression of the latent chunks. Set it to False for high-entropy tensors (e.g., raw Gaussian noise), where compression yields little or no size reduction, and True for low-entropy tensors.
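The reason for this rule is easy to demonstrate: a general-purpose compressor (zlib here, for illustration; GEN5's internal codec may differ) barely shrinks random noise, while highly structured data compresses dramatically.

```python
import zlib
import numpy as np

# High-entropy data (random noise) barely compresses; low-entropy data
# (here, all zeros) shrinks dramatically.
noise = np.random.randn(4, 64, 64).astype(np.float32).tobytes()
zeros = np.zeros((4, 64, 64), dtype=np.float32).tobytes()

noise_ratio = len(zlib.compress(noise)) / len(noise)   # close to 1.0
zeros_ratio = len(zlib.compress(zeros)) / len(zeros)   # near 0.0
```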

Extra Metadata:

  • Model name and version
  • Prompt
  • Tags
  • Hardware information
  • Generation settings (may include sampler-specific parameters)

The initial noise tensor can be fed back into a locally run model to obtain similar results.

This has proven capable of producing extremely similar images. Although a random integer seed is also recorded, reusing the actual noise tensor provides stronger reproducibility than a seed alone.
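A hedged sketch of this workflow: the decode call and the pipeline call are shown as comments (the `latents=` argument is the diffusers convention; other pipelines may name it differently), and a typical SD latent shape stands in for the decoded tensor.

```python
import numpy as np

# Stand-in for the decoded "initial_noise" latent; in practice this would
# come from the 'LATN' chunk of a decoded file:
# decoded = gen5.file_decoder("my_ai_art.gen5")
stored_noise = np.random.randn(1, 4, 64, 64).astype(np.float32)

# Pipelines that accept an explicit latent tensor (e.g., diffusers' `latents`
# argument) can consume the exact array instead of regenerating from a seed:
# latents = torch.from_numpy(stored_noise).to(device, dtype=torch.float16)
# image = pipe(prompt, latents=latents).images[0]
```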

Installation

Just pip install the package!

pip install gen5

Usage

Import the handler class:

from gen5.main import Gen5FileHandler

First you need to instantiate the Gen5FileHandler class.

gen5 = Gen5FileHandler()

Encoding

!!! danger DISCLAIMER: The encoder expects NumPy arrays.
If you use PyTorch tensors, convert them with .detach().cpu().numpy().

import torch
from gen5.main import Gen5FileHandler

gen5 = Gen5FileHandler()
batch_size, channels, height, width = 1, 4, 64, 64  # example latent dimensions
initial_noise_tensor = torch.randn(batch_size, channels, height, width)
latent = {
    # The encoder expects a NumPy array, not a torch tensor object
    "initial_noise": initial_noise_tensor.detach().cpu().numpy()
}
# Use the helper function to convert the image to bytes
binary_img_data = gen5.png_to_bytes(r'path/to/image.png')

gen5.file_encoder(
    filename="encoded_img.gen5", # The .gen5 extension is required!
    latent=latent,# initial latent noise
    chunk_records=[],
    model_name="Stable Diffusion 3",
    model_version="3", # Model Version
    prompt="A puppy smiling, cinematic",
    tags=["puppy","dog","smile"],
    img_binary=binary_img_data,
    convert_float16=False, # whether to convert input tensors to float16
    generation_settings={
        "seed": 42,
        "steps": 20,
        "sampler": "ddim",
        "cfg_scale": 7.5,
        "scheduler": "pndm",
        "eta": 0.0,
        "guidance": "classifier-free",
        "precision": "fp16",
        "deterministic": True
    },
    hardware_info={
        "machine_name": "test_machine",
        "os": "linux",
        "cpu": "Intel",
        "cpu_cores": 8, # minimum 1
        "gpu": [{"name": "RTX 3090", "memory_gb": 24, "driver": "nvidia", "cuda_version": "12.1"}],
        "ram_gb": 64.0,
        "framework": "torch",
        "compute_lib": "cuda"
    }
)

Decoding

import io
import json
from PIL import Image

decoded = gen5.file_decoder("encoded_img.gen5")

# Access the full metadata block
metadata = decoded["metadata"]["gen5_metadata"]

# Or just a specific metadata block
model_info = decoded["metadata"]["gen5_metadata"]["model_info"]

# Save the decoded metadata to a JSON file
with open("decoded_metadata.json", "w") as f:
    json.dump(decoded["metadata"], f, indent=2)

# Save just the image binary as a PNG
image_bytes = decoded["chunks"].get("image")
if image_bytes is not None:
    img = Image.open(io.BytesIO(image_bytes))
    img.save("decoded_image.png")
