
Core model and inference for FluxFlow text-to-image generation


FluxFlow Core

Smaller, Faster, More Expressive: Text-to-Image Generation with Bezier Activation Functions

🚧 Project Status

Training In Progress: FluxFlow models are currently in Weeks 1-4 of systematic validation.

Status:

  • ✅ Architecture implemented and tested (including v0.8.0 pillar-attention)
  • 🔄 VAE training in progress (Bezier + ReLU baselines)
  • ⏳ Flow training pending VAE completion
  • ⏳ Empirical benchmarks pending training completion
  • 📅 Expected completion: Late February 2026

All performance claims below are theoretical targets; empirical validation is underway.


FluxFlow is a novel approach to text-to-image generation that targets 2-3× smaller models with equivalent or superior quality compared to standard architectures. The key innovation is the use of Cubic Bezier activation functions, which provide 3rd-degree polynomial expressiveness, enabling each neuron to learn complex, smooth non-linear transformations.

Core Philosophy

Inspired by Kolmogorov-Arnold Networks (KAN), FluxFlow extends the concept of learnable activation functions to large-scale generative models. While KAN uses B-splines, FluxFlow employs Cubic Bezier curves with three distinct control point generation strategies.

Bezier Activations: Three Approaches

FluxFlow employs three Bezier activation strategies, each suited for different architectural needs:

1. Input-Based (BezierActivation) - Most Common

Control points derived directly from input channels via 5× channel expansion pattern.

  • Implementation: Previous layer outputs 5× channels, BezierActivation reduces to 1×
  • Parameters: 0 learnable parameters in activation (cost shifted to previous layer)
  • Usage: VAE encoder/decoder, convolutional layers
  • Pattern: Conv2d(C, 5C) → BezierActivation() → C outputs

2. Trainable (TrainableBezier) - Specialized Layers

Learnable control points for per-channel transformations.

  • Implementation: 4 learnable parameters per output dimension
  • Parameters: 4×D learnable parameters (minimal overhead)
  • Usage: VAE latent bottleneck (mu/logvar), RGB output layer
  • Pattern: Linear(C, C) → TrainableBezier(C) → C outputs

3. Pillar-Based - Transformer MLPs

Control points generated by 4 independent depth-3 MLP networks for maximum expressiveness.

Unified Formula (All Approaches):

B(t) = (1-t)³·p₀ + 3(1-t)²·t·p₁ + 3(1-t)·t²·p₂ + t³·p₃

What differs: how (t, p₀, p₁, p₂, p₃) are obtained (from input, learned, or computed by MLPs).

Smoothness: C² continuous (continuous up to second derivative), providing smooth gradients unlike ReLU's discontinuous derivative.
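As a sanity check, the formula can be evaluated directly in a few lines of Python (a standalone sketch, independent of the fluxflow package):

```python
def cubic_bezier(t, p0, p1, p2, p3):
    """Evaluate B(t) exactly as written above."""
    u = 1.0 - t
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3

# The curve interpolates its outer control points: B(0) = p0 and B(1) = p3
print(cubic_bezier(0.0, -1.0, 0.5, 2.0, 3.0))  # -1.0
print(cubic_bezier(1.0, -1.0, 0.5, 2.0, 3.0))  # 3.0
```

The inner control points p₁ and p₂ shape the curve between those endpoints, which is what makes the learned transformations expressive.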

Expected Benefits (empirical validation in progress):

  • Smaller models: 2-2.5× fewer parameters target for equivalent quality
  • Faster inference: 38% speedup target through layer reduction
  • Better gradients: Smooth C² continuous gradients reduce vanishing gradient issues
  • Adaptive: Each approach provides different expressiveness-cost trade-offs

Installation

Production Install

pip install fluxflow

What gets installed:

  • fluxflow - Core model architectures and inference pipeline
  • Flow matching models, VAE, and text encoders
  • Note: Does NOT include training tools (use fluxflow-training for that)
  • Note: Does NOT include UI (use fluxflow-ui or fluxflow-comfyui for that)

Package available on PyPI: fluxflow v0.8.0

Development Install

git clone https://github.com/danny-mio/fluxflow-core.git
cd fluxflow-core
pip install -e ".[dev]"

System Requirements

Minimum Requirements

  • Python: 3.10 or later
  • CPU: Modern x86_64 processor
  • RAM: 16 GB minimum, 32 GB recommended
  • Storage: 10 GB for package and dependencies

GPU Requirements (Optional but Recommended)

For Training

  • GPU: NVIDIA GPU with CUDA support
  • VRAM: 24 GB minimum (NVIDIA RTX 3090, A5000, or better)
  • CUDA: 11.8 or later
  • cuDNN: 8.6 or later
  • Recommended: NVIDIA A6000 (48GB) or A100 (40GB/80GB)

For Inference

  • GPU: NVIDIA GPU with CUDA support
  • VRAM: 8 GB minimum, 12 GB recommended
  • CUDA: 11.8 or later
  • Recommended: NVIDIA RTX 3060 (12GB) or better

CPU-Only Mode

  • Supported for inference (slower)
  • Requires 32 GB RAM
  • Not recommended for training (very slow)

Apple Silicon (MPS)

  • Supported on M1/M2/M3 with macOS 12.3+
  • Good performance for inference
  • Training supported but slower than CUDA

Dependency Notes

  • numpy: Version 2.x not yet supported (use numpy<2.0)
  • torch: CUDA 11.8 or 12.1 builds recommended
  • transformers: 4.30.0+ required for text encoding

Key Features

  • Bezier Activations: Learnable 3rd-degree (cubic) polynomial activation functions
  • Compact VAE: Variational autoencoder with 25M params (encoder) + 30M params (decoder)
  • Flow-based Diffusion: 150M param transformer with rotary embeddings
  • Text Conditioning: DistilBERT-based encoder (~71M params total: ~66M backbone + Bezier projection layers)
    • Note: Current implementation uses pre-trained DistilBERT as a temporary solution. Future versions will feature a custom Bezier-based text encoder for full end-to-end training and multimodal support.
  • Adaptive Architecture: Different activation strategies per component (Bezier for generative, LeakyReLU for discriminative)

Quick Start

High-Level API (Recommended)

from fluxflow.models import FluxFlowPipeline

# Load from checkpoint directory (standard training output)
pipeline = FluxFlowPipeline.from_pretrained("path/to/checkpoint_dir/")

# Or load from a single checkpoint file
# pipeline = FluxFlowPipeline.from_pretrained("path/to/checkpoint.safetensors")

# Generate image with Diffusers-style API
image = pipeline(
    prompt="a beautiful sunset over mountains",
    num_inference_steps=50,
    guidance_scale=7.5,
    height=512,
    width=512,
).images[0]

image.save("output.png")

Advanced Usage

from fluxflow.models import FluxFlowPipeline
import torch

# Load with specific settings
pipeline = FluxFlowPipeline.from_pretrained(
    "path/to/checkpoint.safetensors",
    torch_dtype=torch.float16,
    device="cuda",
)

# Generate with more control
result = pipeline(
    prompt="a serene mountain landscape at dawn",
    negative_prompt="blurry, low quality",
    num_inference_steps=50,
    guidance_scale=7.5,
    height=768,
    width=768,
    num_images_per_prompt=4,
    generator=torch.Generator().manual_seed(42),
)

# Save all generated images
for i, img in enumerate(result.images):
    img.save(f"output_{i}.png")

Model Versions

Version  Description                                      Status
0.8.0    Pillar-attention (FiLM + cross-attn on pillars)  Current
0.7.0    Context-enhanced flow transformer                Stable
0.6.0    Default stable                                   Stable
0.3.0    Legacy                                           Legacy

  • Default model version: 0.6.0 (set by FluxFlowConfig.model.model_version)
  • Versioned checkpoints (v0.8.0 and later) require load_versioned_checkpoint(); set model_version when saving (see docs/MIGRATION.md)

Classifier-Free Guidance (CFG)

Available since v0.3.0: FluxFlow supports Classifier-Free Guidance for enhanced generation control.

What is CFG?

CFG improves generation quality by amplifying the influence of text conditioning. It works by:

  1. Running two forward passes: one with the text prompt, one without
  2. Extrapolating from the unconditional prediction toward the conditional one (past it when guidance_scale > 1)
  3. Producing images that follow the text prompt more strongly
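The combination step is the standard classifier-free guidance update used across diffusion pipelines; whether FluxFlow's scheduler applies exactly this linear form is an assumption, but it illustrates the mechanism:

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Standard CFG update: move from the unconditional prediction toward
    the conditional one; scales > 1 extrapolate past it."""
    return uncond + guidance_scale * (cond - uncond)

print(cfg_combine(1.0, 3.0, 1.0))  # 3.0  (scale 1.0 recovers the conditional prediction)
print(cfg_combine(1.0, 3.0, 5.0))  # 11.0 (scale 5.0 pushes well past it)
```

In practice uncond and cond are the model's predictions (tensors), and the same elementwise formula applies.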

Using CFG

from fluxflow.models import FluxFlowPipeline

pipeline = FluxFlowPipeline.from_pretrained("path/to/checkpoint.safetensors")

# Generate with CFG (requires model trained with cfg_dropout_prob > 0)
image = pipeline(
    prompt="a photorealistic portrait of a cat",
    negative_prompt="blurry, distorted, low quality",  # Optional
    num_inference_steps=50,
    guidance_scale=5.0,  # Recommended: 3.0-7.0 for balanced results
    height=512,
    width=512,
).images[0]

Guidance Scale Guidelines

  • 1.0: No guidance (standard generation)
  • 3.0-7.0: Moderate guidance (RECOMMENDED - balanced quality/creativity)
  • 7.0-15.0: Strong guidance (may oversaturate or lose diversity)

Important: CFG requires models trained with cfg_dropout_prob > 0 (typically 0.10-0.15). See fluxflow-training for training details.

Low-Level API

For more control, use the base FluxPipeline:

import torch
from fluxflow.models import FluxPipeline, BertTextEncoder
from transformers import AutoTokenizer

# Load components manually
pipeline = FluxPipeline.from_pretrained("path/to/checkpoint.safetensors")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
text_encoder = BertTextEncoder(embed_dim=1024)  # Must match text_embedding_dim in training config (default: 1024)

# Encode text
text = "a beautiful sunset"
tokens = tokenizer(text, return_tensors="pt", padding="max_length", max_length=512)
text_embeddings = text_encoder(tokens["input_ids"])

# Manual forward pass (requires implementing sampling loop)
# See fluxflow-training for complete examples

Package Contents

  • fluxflow.models - Model architectures (VAE, Flow, Encoders, Discriminators)
    • activations - BezierActivation, TrainableBezier
    • vae - FluxCompressor (encoder) and FluxExpander (decoder)
    • flow - FluxFlowProcessor (diffusion transformer)
    • encoders - BertTextEncoder
    • discriminators - PatchDiscriminator (for GAN training)
    • conditioning - SPADE, FiLM, Gated conditioning modules
  • fluxflow.utils - Utilities for I/O, visualization, and logging
  • fluxflow.config - Configuration management
  • fluxflow.types - Type definitions and protocols
  • fluxflow.exceptions - Custom exception classes

Why Bezier Activations?

Mathematical Foundation

Traditional activations provide a single fixed transformation:

  • ReLU: max(0, x) - piecewise linear, zero gradient for all negative inputs
  • GELU/SiLU: Fixed smooth curves, no adaptability

Bezier activations provide a learnable manifold:

  • 4 control points per dimension (p₀, p₁, p₂, p₃)
  • Smooth interpolation via cubic Bezier curves
  • Adaptive transformations: Each dimension can follow a different cubic curve
  • TrainableBezier: Optional 4×D learnable parameters for per-dimension optimization

Performance Targets

⚠️ Training In Progress: The metrics below are theoretical targets based on architecture analysis and parameter counting. Empirical measurements will be added to this table upon training completion.

Metric                                 ReLU Baseline (Target)  Bezier FluxFlow (Target)  Expected Improvement
Parameters                             500M                    183M                      2.7× smaller
Inference time (A100, 512², 50 steps)  1.82s                   1.12s                     38% faster
Training memory (batch=2)              10.2GB                  4.1GB                     60% reduction
FID (COCO val)                         15.2±0.3                ≤15.0                     Equivalent quality

Status:

  • VAE training: 🔄 In progress
  • Flow training: ⏳ Pending VAE completion
  • Baseline comparison: ⏳ Pending both completions
  • Empirical results: 📊 Will be published to MODEL_ZOO.md

Strategic Activation Placement

FluxFlow uses different activations based on component purpose:

Bezier activations (high expressiveness needed):

  • VAE encoder/decoder: Complex image↔latent mappings
  • Flow transformer: Core generative model
  • Text encoder: Semantic embedding space

LeakyReLU (memory efficiency critical):

  • GAN discriminator: Binary classification, 2× forward passes per batch
  • Saves 126 MB per batch vs Bezier

ReLU (simple transformations):

  • SPADE normalization: Affine scale/shift operations

API Comparison

Feature    FluxFlowPipeline             FluxPipeline
Type       DiffusionPipeline            nn.Module
Input      Text prompts                 Pre-encoded embeddings
Inference  Full iterative denoising     Single forward pass
Guidance   Classifier-free (automatic)  Manual implementation
Scheduler  Built-in (DPMSolver++)       None
Output     PIL Images / numpy           Tensor
Use case   Production inference         Training / Custom pipelines

When to use which:

  • FluxFlowPipeline: Text-to-image generation, production use, Diffusers ecosystem
  • FluxPipeline: Training, fine-tuning, custom inference loops, research

Model Architecture Overview

Total Parameters: ~183M (default config: vae_dim=128, feat_dim=128)

Component           Parameters  Activation Type                Purpose
FluxCompressor      12.6M       BezierActivation               Image → latent encoding
FluxExpander        94.0M       BezierActivation               Latent → image decoding
FluxFlowProcessor   5.4M        BezierActivation               Diffusion transformer
BertTextEncoder     71.0M       BezierActivation (projection)  Text → embedding
PatchDiscriminator  45.1M       LeakyReLU                      GAN training only

Note: FluxExpander is asymmetrically larger due to progressive upsampling with SPADE conditioning layers.

Technical Details

Bezier Activation Types

1. Input-Based BezierActivation

Channel expansion pattern (5→1 dimension reduction):

# Previous layer outputs 5× channels
nn.Conv2d(in_ch, out_ch * 5, kernel_size=3, padding=1)
# BezierActivation splits into [t, p0, p1, p2, p3] and reduces to out_ch
BezierActivation(t_pre_activation="sigmoid", p_preactivation="silu")

Parameters: 0 learnable (but previous layer needs 5× weights)
Use: VAE encoder/decoder, convolutional layers
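The split-and-reduce step can be sketched functionally as follows. This is a minimal stand-in written for illustration, not the fluxflow implementation; the sigmoid/SiLU choices simply mirror the pre-activation defaults shown above:

```python
import torch
import torch.nn.functional as F

def input_based_bezier(x):
    """x has 5*C channels from the preceding conv; split into
    [t, p0, p1, p2, p3] and reduce to C channels via the cubic Bezier."""
    t, p0, p1, p2, p3 = x.chunk(5, dim=1)
    t = torch.sigmoid(t)  # t_pre_activation="sigmoid": bound t to [0, 1]
    p0, p1, p2, p3 = [F.silu(p) for p in (p0, p1, p2, p3)]  # p_preactivation="silu"
    u = 1.0 - t
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3

x = torch.randn(2, 5 * 16, 8, 8)  # e.g. the output of Conv2d(16, 80, 3, padding=1)
print(input_based_bezier(x).shape)  # torch.Size([2, 16, 8, 8])
```

Note how the activation itself has no parameters: all learnable capacity lives in the 5×-wide preceding layer.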

2. TrainableBezier

Fixed learnable control points (dimension-preserving):

# Standard dimension mapping
nn.Linear(latent_dim, latent_dim)
# Add 4×D learnable parameters
TrainableBezier((latent_dim,), channel_only=True)

Parameters: 4×D learnable (e.g., 1024 params for D=256)
Use: VAE latent bottleneck (mu/logvar), RGB output layer
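A minimal sketch of this layer type (an illustration, not the fluxflow class; deriving t from the input with a sigmoid is an assumption here):

```python
import torch
import torch.nn as nn

class TrainableBezierSketch(nn.Module):
    """4 learnable control points per channel (4*D parameters),
    applied elementwise along the last dimension."""
    def __init__(self, dim):
        super().__init__()
        self.p = nn.Parameter(torch.randn(4, dim) * 0.1)  # rows are p0..p3

    def forward(self, x):
        t = torch.sigmoid(x)  # assumed: curve parameter taken from the input
        p0, p1, p2, p3 = self.p
        u = 1.0 - t
        return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3

act = TrainableBezierSketch(256)
print(sum(p.numel() for p in act.parameters()))  # 1024, matching 4×D for D=256
```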

3. Pillar-Based

Context-dependent control points from deep MLPs:

# 4 separate depth-3 MLP networks generate the control points
p0 = pillarLayer(d_model, d_model, depth=3, activation=nn.SiLU())
p1 = pillarLayer(d_model, d_model, depth=3, activation=nn.SiLU())
p2 = pillarLayer(d_model, d_model, depth=3, activation=nn.SiLU())
p3 = pillarLayer(d_model, d_model, depth=3, activation=nn.SiLU())
bezier = BezierActivation(t_pre_activation="sigmoid", p_preactivation="silu")
# Gate the input to [0, 1] before pillar processing
g = torch.sigmoid(img_seq)
# Concatenate [t, p0, p1, p2, p3] along channels and apply the Bezier
output = bezier(torch.cat([img_seq, p0(g), p1(g), p2(g), p3(g)], dim=-1))

Parameters: 4×(depth=3)×D² (e.g., 198K params for D=128)
Use: Flow transformer MLP layers
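The quoted counts follow from simple arithmetic, assuming each depth-3 pillar MLP is three D→D linear layers with biases:

```python
def pillar_params(d_model, depth=3, n_pillars=4):
    """Weights plus biases for n_pillars independent D->D MLPs."""
    per_layer = d_model * d_model + d_model
    return n_pillars * depth * per_layer

print(pillar_params(128))  # 198144 (~198K, as quoted for D=128)
print(4 * 256)             # 1024 TrainableBezier parameters for D=256
```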

Pre-activation parameters (for Input-Based and Pillar-Based):

  • t_pre_activation: Transform input t (sigmoid, silu, tanh, or None)
  • p_preactivation: Transform control points (sigmoid, silu, tanh, or None)

Current FluxFlow Configuration

VAE Encoder/Decoder: Input-Based BezierActivation

  • Pattern: ConvTranspose2d(C, 5C) → BezierActivation() → Conv2d(C, 5C) → BezierActivation()
  • Rationale: 0 activation params, smooth gradients for image↔latent mapping

VAE Latent (mu/logvar): TrainableBezier

  • Pattern: Linear(D, D) → TrainableBezier(D)
  • Rationale: Per-channel learned curves for latent distribution (1024 params for D=256)

VAE RGB Output: TrainableBezier

  • Pattern: Conv2d(C, 3, ...) → TrainableBezier(3)
  • Rationale: Learned per-channel color correction (12 params)

Flow Transformer: Pillar-Based BezierActivation

  • Control point generation: 4 × pillarLayer(d_model, d_model, depth=3)
  • Gating: sigmoid(img_seq) bounds inputs to [0,1] before pillar processing
  • Final activation: BezierActivation(concat([img_seq, p0, p1, p2, p3]))
  • Rationale: Highly expressive context-dependent activations per token (~198K params per block for d_model=128)

Text Encoder: Input-Based BezierActivation

  • GELU alternative for BERT-like architectures
  • Learns optimal text→latent space mapping

Discriminator: LeakyReLU

  • Memory efficiency - called 2× per batch (generator+real)

SPADE Blocks: ReLU

  • Simple affine transformations don't benefit from Bezier complexity

Future Directions

Custom Text Encoder

The current implementation uses pre-trained DistilBERT as a practical starting point. Future development will create a custom text encoder built entirely with Bezier activations, enabling:

  • True end-to-end Bezier-based training
  • Better semantic alignment with the generative model
  • Reduced dependency on external pre-trained models
  • Foundation for multimodal extensions

Multimodal Extensions

With a custom Bezier text encoder, FluxFlow can be extended to:

  • Text + Image → Image: Conditioning on reference images
  • Video generation: Temporal consistency via Bezier transformations
  • 3D synthesis: Extending the architecture to volumetric data

Performance Optimizations

  • JIT compilation: Already implemented (10-20% speedup available)
  • Mixed precision: fp16/bf16 training and inference
  • Quantization: 8-bit/4-bit inference for edge devices
  • Knowledge distillation: Bezier→fixed activation distillation for mobile deployment

Acknowledgments

FluxFlow was inspired by Kolmogorov-Arnold Networks (KAN) [Liu et al., 2024], extending learnable activation functions to generative models with dynamic parameter generation.

For complete references, see REFERENCES.md.

Citation

If you use FluxFlow in your research, please cite:

@software{fluxflow2025,
  title = {FluxFlow: Efficient Text-to-Image Generation with Bezier Activation Functions},
  author = {FluxFlow Contributors},
  year = {2025},
  note = {Inspired by Kolmogorov-Arnold Networks (KAN)},
  url = {https://github.com/danny-mio/fluxflow-core}
}

Key References:

@article{liu2024kan,
  title={KAN: Kolmogorov-Arnold Networks},
  author={Liu, Ziming and Wang, Yixuan and Vaidya, Sachin and others},
  journal={arXiv preprint arXiv:2404.19756},
  year={2024}
}

License

MIT License - see LICENSE file for details.
