Manifold: Geometric Intelligence via Symplectic Geodesic Flows | GFN: Geometric Flow Networks
Project description
GFN | Manifold
Geodesic Flow Networks: Geometric Intelligence via Symplectic Flows
What's New in v2.5.0 (Riemannian Stability)
- Riemannian Optimization:
RiemannianAdamoptimizer ensures parameter updates respect manifold geometry - Adaptive Curvature Gating: Learnable valve mechanism enables inertial coasting when optimal
- Zero-Force Inductive Bias: Architectural enforcement of
E(0) = 0for perfect state preservation - Velocity Normalization: Automatic stabilization preserving memory direction while controlling magnitude
Overview
GFN (Geodesic Flow Networks), publicly known as Manifold, reformulates sequence modeling as geodesic flow on a learned Riemannian manifold. Instead of attention matrices (O(N²)) or fixed-state compression, GFN models the latent state as a physical particle governed by symplectic integrators, enabling O(1) memory with infinite horizon stability.
Core Innovation: State transitions follow Einstein's geodesic equation with learned curvature, ensuring information conservation via Hamiltonian dynamics.
Installation
pip install gfn
Or install from source:
git clone https://github.com/Manifold-Laboratory/manifold.git
cd manifold
pip install -e "."
Requirements: Python 3.10+, PyTorch 2.0+, CUDA (optional)
Quick Start
from src.model import Manifold
from src.optim import RiemannianAdam
# Model
model = Manifold(
vocab_size=50257,
dim=512,
depth=12,
heads=8,
integrator_type='leapfrog'
).cuda()
# Optimizer (REQUIRED: standard Adam will fail)
optimizer = RiemannianAdam(
model.parameters(),
lr=1e-4,
retraction='normalize',
max_norm=10.0
)
# Training
for x, y in dataloader:
optimizer.zero_grad()
logits, _, _ = model(x)
loss = criterion(logits.view(-1, vocab_size), y.view(-1))
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 0.05)
optimizer.step()
Verified Performance
Binary Parity Task (Cumulative XOR)
Challenge: Predict cumulative XOR over arbitrarily long sequences (requires infinite-precision state tracking)
Training Performance
| Model | Steps to Convergence | Final Loss | Training Time | Final Accuracy |
|---|---|---|---|---|
| GFN | 728 | 0.00494 | 47 min (L=20) | 99.9% |
| MicroGPT | 4,000 | 0.0254 | 1m 27s (L=20) | 99.0% |
GFN achieves lower loss (0.00494 vs 0.0254) and higher accuracy despite longer training time
Zero-Shot Generalization Results
Trained on L=20 only, tested on sequences up to L=1000 (50× longer):
Figure: Left plot shows perfect accuracy generalization. Right plot demonstrates O(1) memory scaling (flat line) vs theoretical O(N) baseline.
Detailed Results:
| Test Length | GFN Accuracy | GFN VRAM | MicroGPT Accuracy | MicroGPT VRAM |
|---|---|---|---|---|
| 20 (seen) | 100.0% | 28.3 MB | 98.0% | 44.7 MB |
| 50 | 100.0% | 28.4 MB | 49.5% (collapsed) | 73.7 MB |
| 100 | 100.0% | 28.6 MB | 50.1% (random) | 156.0 MB |
| 200 | 100.0% | 29.0 MB | 51.8% (random) | 420.9 MB |
| 400 | 100.0% | 29.8 MB | 49.9% (random) | 1,363 MB (1.3GB) |
| 500 | 100.0% | 30.4 MB | 49.1% (random) | 2,040 MB (2.0GB) |
| 1000 | 100.0% | 32.1 MB | 50.7% (random) | 7,488 MB (7.3GB) |
| 10000 | 100.0% | 60.3 MB | FAILED (OOM) | > 8GB |
Key Findings:
- ✅ Perfect Generalization: 100% accuracy on all lengths including L=10,000 (500× training length)
- ✅ O(1) Memory Verified: VRAM growth of only 32 MB (113%) from L=20→10,000
- ✅ Transformer Collapse: MicroGPT accuracy drops to random chance (50%) immediately at L=50
- ✅ Memory Advantage: At L=1000, GFN uses 32MB vs Transformer's 7.5GB (234× less memory)
Full benchmark results and plots available in tests/benchmarks/results/gfn_superiority/
Core Architecture
Geodesic Equation
d²x/dτ² + Γ(v, x) = F_ext(token)
- x: Position in semantic manifold
- v: Velocity (momentum/memory)
- Γ: Christoffel symbols (learned curvature)
- F: Input token force
Symplectic Integration (Leapfrog)
# Half-step velocity
v_half = v + 0.5 * dt * (F - Γ(v, x))
# Full-step position
x_next = x + dt * v_half
# Half-step velocity finalization
v_next = v_half + 0.5 * dt * (F - Γ(v_half, x_next))
# Stabilization
v_next = v_next / (||v_next|| + ε)
Properties:
- Time-reversible
- Volume-preserving (det(J) = 1)
- Energy-conserving (|ΔH| ≈ O(dt²))
Comparison with Baselines
| Architecture | Memory (Inference) | Compute (per token) | Gradient Stability | Verified |
|---|---|---|---|---|
| Transformer | O(N) KV cache | O(N·d) attention | Good | — |
| LSTM/GRU | O(1) | O(d²) gates | Poor | — |
| Mamba (SSM) | O(1) | O(d²) state update | Medium | — |
| GFN | O(1) state | O(d²·R) Christoffel | Excellent | ✓ |
Where N = sequence length, d = hidden dim, R = Christoffel rank (typically 16-32)
Note: GFN's O(d²·R) per-token cost is comparable to LSTMs/Mamba. For training full sequences, all architectures are O(N·...) in sequence length.
Documentation
- SCIENTIFIC_PAPER.md - Complete research paper with mathematical derivations
- API.md - Python API reference
- TRAINING.md - Training guide and best practices
- ARCHITECTURE.md - System design and components
- BENCHMARKS.md - Empirical performance validation
- PHYSICS.md - Mathematical foundations
Use Cases
- Long-Context Reasoning: Process sequences >10K tokens with constant memory
- Algorithmic Tasks: Perfect extrapolation on logical reasoning (XOR, sorting, arithmetic)
- Edge Deployment: Run large models on memory-constrained devices (<4GB RAM)
- Scientific Computing: Model systems requiring conservation laws (physics simulations)
Repository Structure
/
├── src/ # Core Implementation
│ ├── model.py # Main Manifold Architecture
│ ├── geometry.py # Christoffel Symbols & Integrators
│ ├── layers.py # M-Layer (Manifold Layer)
│ ├── embeddings.py # Functional Embeddings
│ └── optim.py # RiemannianAdam Optimizer
├── docs/ # Technical Documentation
│ ├── SCIENTIFIC_PAPER.md
│ ├── API.md
│ ├── TRAINING.md
│ └── BENCHMARKS.md
├── tests/ # Verification Suite
│ └── benchmarks/ # Reproducible Benchmarks
└── LICENSE # Apache 2.0
Development Status
Version 2.5.0 is production-ready for research and experimentation.
Verified:
- ✅ O(1) memory scaling (empirically confirmed)
- ✅ Perfect generalization on Parity task
- ✅ Stable training with RiemannianAdam
- ✅ Symplectic gradient flow
In Development:
- CUDA kernel acceleration (10-50× speedup expected)
- Mixed precision training (FP16/BF16)
- Language modeling benchmarks (WikiText)
- Mixture of Manifolds (MoM) architecture
Citation
If you use GFN in your research, please cite:
@software{gfn2026,
title={GFN: Geodesic Flow Networks},
author={Stürtz Joaquín},
year={2026},
version={2.5.0},
url={https://github.com/Manifold-Laboratory/manifold},
license={Apache-2.0}
}
Contributing
See CONTRIBUTING.md for guidelines.
Quick Links:
License
Apache License 2.0 - See LICENSE for details.
Geometric intelligence through physical principles.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file gfn-2.5.0.tar.gz.
File metadata
- Download URL: gfn-2.5.0.tar.gz
- Upload date:
- Size: 53.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
64edf05e2350fc2a6bddd7aff9904dbe10ad0a6946f85a4da51ff0721e9e9dbd
|
|
| MD5 |
fde17dbdfac2e3b72fcd709ad745f8b6
|
|
| BLAKE2b-256 |
912834628e45fa0c1c5673844522761b5b045903fbca6a88f291aa7e19d3fd06
|
File details
Details for the file gfn-2.5.0-cp313-cp313-win_amd64.whl.
File metadata
- Download URL: gfn-2.5.0-cp313-cp313-win_amd64.whl
- Upload date:
- Size: 161.4 kB
- Tags: CPython 3.13, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b0d5958838c54347dc8bc1311797abefab035fdf2f7455cd58f3bdf8e8540d77
|
|
| MD5 |
c583b644c0ac828df4eefce95edd78e1
|
|
| BLAKE2b-256 |
f0e8ba7300563fb201a43ccbd6cba85a7805d0406a38b10126844835523bee98
|