General Unified World Model — a typed causal ontology of civilization, built on canvas-engineering structured latent spaces.

general-unified-world-model

Canvas engineering structures what a diffusion model thinks in. This repo declares an 857-field typed schema spanning planetary physics through individual psychology, compiles it onto a structured latent canvas, and trains it on heterogeneous real-world data without discarding samples that are missing fields.


[Figure: Full World Model. 857 fields allocated on a 128x128 canvas; each colored region is a semantic domain.]


The idea

Every dataset in the world describes a slice of the same underlying reality. GDP data captures macroeconomic output. Market data captures prices. News captures narratives. Earnings calls capture firm strategy. But no single dataset captures everything.

Traditional approaches either:

  • (a) restrict to the intersection — throw out data missing any field
  • (b) impute missing values — introduce noise

General Unified World Model uses coarse-graining: you declare exactly which nodes you want to model. Non-included sub-types simply don't exist on the canvas — no positions, no attention, no loss. Each nested type that IS included automatically gets a coarse-grained field (a compressed representation at its path) that bottlenecks cross-level attention. A hedge fund including only financial.yield_curves and regime gets yield curve fields fully expanded and a coarse-grained field for the regime — with no credit, FX, or other sub-types consuming canvas space. The topology handles connectivity: fields only attend to what's actually on the canvas.
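
The selection step described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: it assumes fields are identified by dotted paths, and that an included path keeps every leaf beneath it plus one coarse-grained field addressed at the path itself. The function name `select_fields` and the six-leaf toy schema are hypothetical; the paths themselves are borrowed from examples elsewhere in this README.

```python
# Hypothetical sketch of coarse-grained projection over dotted field paths.
TOY_SCHEMA = [
    "financial.yield_curves.ten_year",
    "financial.equities.vix",
    "regime.growth_regime",
    "regime.compressed_world_state",
    "country_us.macro.inflation.headline_cpi",
    "country_us.macro.output.gdp_nowcast",
]


def select_fields(schema, include):
    """Return (leaf_fields, coarse_fields) for the included paths.

    Leaves under an included path are kept; everything else is dropped
    and therefore takes no canvas positions. Each included nested type
    also gets one coarse-grained field at its own path (an assumption
    about how the summary field is addressed).
    """
    leaves, coarse = [], []
    for path in include:
        children = [f for f in schema if f.startswith(path + ".")]
        if children:
            coarse.append(path)
            leaves.extend(children)
        elif path in schema:
            leaves.append(path)  # the include names a single leaf
    return sorted(set(leaves)), coarse


leaves, coarse = select_fields(TOY_SCHEMA, ["financial.yield_curves", "regime"])
print(leaves)  # yield-curve and regime leaves only
print(coarse)  # one coarse-grained field per included nested type
```

Equities and country-level macro fields never appear in the result, which is the point: excluded sub-types are absent rather than masked.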

The key enabler is canvas-engineering — a type system for multimodal latent computation. Each field in the world model occupies specific positions on a 3D (T, H, W) canvas grid, with declared temporal frequency, loss weight, and connectivity. The topology is the compute graph.
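
To make "each field occupies specific positions on a (T, H, W) grid" concrete, here is a small sketch of one possible allocator. It is not canvas-engineering's algorithm: `FieldSpec`, `allocate`, and the per-field position counts are hypothetical, and the row-major packing is an assumption chosen for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    n_positions: int      # how many canvas cells the field occupies
    period: int = 1       # temporal frequency, in canvas ticks
    loss_weight: float = 1.0


def allocate(fields, T, H, W):
    """Assign each field a contiguous run of (t, h, w) cells in row-major order."""
    capacity = T * H * W
    needed = sum(f.n_positions for f in fields)
    if needed > capacity:
        raise ValueError(f"need {needed} positions, canvas has {capacity}")
    layout, cursor = {}, 0
    for f in fields:
        cells = []
        for flat in range(cursor, cursor + f.n_positions):
            t, rem = divmod(flat, H * W)
            h, w = divmod(rem, W)
            cells.append((t, h, w))
        layout[f.name] = cells
        cursor += f.n_positions
    return layout


fields = [
    FieldSpec("financial.equities.vix", n_positions=3, period=1),
    FieldSpec("country_us.macro.output.gdp_nowcast", n_positions=2, period=576),
]
layout = allocate(fields, T=1, H=2, W=4)
print(layout["financial.equities.vix"])               # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
print(layout["country_us.macro.output.gdp_nowcast"])  # [(0, 0, 3), (0, 1, 0)]
```

Once every field owns a fixed set of cells, connectivity can be declared between fields and expanded into position-level attention, which is the sense in which the topology is the compute graph.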

Quick start

pip install general-unified-world-model

Compile the full world model

from canvas_engineering import compile_schema, ConnectivityPolicy
from general_unified_world_model import World

world = World()
bound = compile_schema(
    world,
    T=1, H=128, W=128, d_model=64,
    connectivity=ConnectivityPolicy(
        intra="dense",
        parent_child="hub_spoke",
    ),
)

print(f"{len(bound.field_names)} fields, "
      f"{bound.layout.num_positions} positions, "
      f"{len(bound.topology.connections)} connections")
# 857 fields, 16384 positions, 11735 connections

Project to a subset

You don't need the full 857-field model. Declare what you care about:

from general_unified_world_model import WorldProjection, project

# Hedge fund: macro + financial + two firms
proj = WorldProjection(
    include=[
        "financial",
        "country_us.macro",
        "regime",
        "forecasts.macro",
        "forecasts.financial",
    ],
    firms=["AAPL", "NVDA"],
)

bound = project(proj, T=1, H=64, W=64, d_model=64)
# ~200 fields, focused on what matters

Or just describe what you need

You don't have to construct projections by hand. Describe your modeling needs in plain English and let general-unified-world-model build the projection for you:

from general_unified_world_model import llm_project

result = llm_project(
    "I'm a hedge fund PM. I need to model US macro, rates, credit, "
    "and two firms: Apple and NVIDIA. I care about recession risk "
    "and the Fed's next move.",
    provider="anthropic",  # or "openai"
)

# The LLM selects the right fields automatically
bound = result.compile(T=1, H=64, W=64, d_model=64)
print(result.reasoning)
# "Hedge fund needs financial markets (yield curves, credit spreads,
#  equities), US macro (GDP, inflation, labor), regime indicators
#  for recession detection, and firm-level nodes for AAPL and NVDA..."

Train on heterogeneous data

from general_unified_world_model import (
    WorldProjection, project, build_world_model,
    FieldEncoder, FieldDecoder, MaskedCanvasTrainer,
    DatasetSpec, FieldMapping, build_mixed_dataloader,
)

# Two data sources with different field coverage
macro_spec = DatasetSpec(
    name="FRED",
    mappings=[
        FieldMapping("gdp", "country_us.macro.output.gdp_nowcast"),
        FieldMapping("cpi", "country_us.macro.inflation.headline_cpi"),
    ],
)
market_spec = DatasetSpec(
    name="Yahoo",
    mappings=[
        FieldMapping("vix", "financial.equities.vix"),
        FieldMapping("ust10y", "financial.yield_curves.ten_year"),
    ],
)

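# Both train the same canvas: missing fields are masked, not imputed.
# `bound` is the compiled projection from the previous step; `macro_data` and
# `market_data` are the already-loaded tables for each source (not shown here).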
loader = build_mixed_dataloader(
    bound,
    sources=[(macro_spec, macro_data), (market_spec, market_data)],
    batch_size=32,
)

The schema

19 layers, 857 fields, 8 temporal frequency classes:

| Layer | Fields | Frequency | What it captures |
|---|---|---|---|
| Planetary Physical | Climate, infrastructure, disasters | τ6–τ7 (annual–multi-year) | Slow structural constraints |
| Resources & Energy | Crude, metals, food, water, compute | τ1–τ4 (hourly–monthly) | Physical inputs to production |
| Global Financial | Yields, credit, FX, equities, crypto | τ0–τ2 (sub-minute–daily) | High-bandwidth reflexive core |
| Macroeconomy | GDP, inflation, labor, fiscal, trade, housing | τ3–τ5 (weekly–quarterly) | Real economy per country |
| Political | Executive, legislative, judicial, geopolitical | τ4–τ7 (monthly–multi-year) | Governance structures |
| Narrative & Belief | Media, elite consensus, public sentiment | τ0–τ4 (sub-minute–monthly) | Reflexivity layer |
| Technology | AI, biotech, quantum, robotics, productivity | τ5–τ7 (quarterly–multi-year) | Long-run structural drivers |
| Demographics | Population, dependency, urbanization | τ7 (multi-year) | Slowest structural force |
| Sector | Demand, supply, margins, disruption risk | τ3–τ5 (weekly–quarterly) | Per GICS sector |
| Supply Chain | Concentration, lead time, bottleneck severity | τ2–τ4 (daily–monthly) | Graph-structured nodes |
| Business | Financials, operations, strategy, market, risk | τ2–τ5 (daily–quarterly) | Per firm (sparse) |
| Individual | Cognitive, incentives, network, state | τ2–τ5 (daily–quarterly) | Key decision-makers (very sparse) |
| Event Tape | News, social, filings, policy, conflict | τ0–τ1 (sub-minute–hourly) | Real-time event stream |
| Data Channel Trust | Government, market, alternative, corporate | τ3–τ7 | Meta-epistemic calibration |
| Regime State | Growth, inflation, financial cycle, fragility | τ5–τ7 | Compressed global latent |
| Intervention | Monetary, fiscal, regulatory, military + effects | τ2–τ5 | Counterfactual analysis |
| Forecast Bundle | Recession prob, credit stress, conflict risk | output | Structured prediction heads |
| Country | Macro + politics + demographics per country | composite | Per major economy |

Temporal frequency classes

τ0 = sub-minute   (period=1)      markets, breaking news
τ1 = hourly        (period=4)      grid load, commodities
τ2 = daily         (period=16)     commodity prices, port congestion
τ3 = weekly        (period=48)     claims, inventories, payroll
τ4 = monthly       (period=192)    CPI, PMI, company closes
τ5 = quarterly     (period=576)    earnings, GDP, capex
τ6 = annual        (period=2304)   demographics, infrastructure
τ7 = multi-year    (period=4608)   regime changes, tech diffusion
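
One way to read the period column is as an update schedule in canvas ticks. The sketch below assumes a class is refreshed whenever the tick is a multiple of its period; that rule and the helper name `classes_due` are assumptions for illustration, while the period values are copied from the list above.

```python
# Period table copied from the list above; the update rule is an assumption.
PERIODS = {
    "tau0": 1, "tau1": 4, "tau2": 16, "tau3": 48,
    "tau4": 192, "tau5": 576, "tau6": 2304, "tau7": 4608,
}


def classes_due(tick):
    """Frequency classes whose period divides `tick`."""
    return [name for name, period in PERIODS.items() if tick % period == 0]


print(classes_due(16))   # ['tau0', 'tau1', 'tau2']
print(classes_due(192))  # ['tau0', 'tau1', 'tau2', 'tau3', 'tau4']
print(classes_due(5))    # ['tau0']
```

Each period is an integer multiple of the one before it, so under this rule every tick that refreshes a slow class also refreshes all faster classes.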

Use cases

CEO: "Model my company in context"

proj = WorldProjection(
    include=[
        "country_us.macro",
        "sector_tech",
        "financial.yield_curves",
        "financial.equities",
        "regime",
        "forecasts",
    ],
    firms=["ACME", "RIVAL"],
    individuals=["ceo", "cfo", "cto"],
)

[Figure: CEO use case, causal interaction graph. Regime conditions macro and financial context, which flows into firm dynamics. Executive decisions influence the firm. Competitive dynamics between ACME and RIVAL are drawn dashed. Forecasts are the structured output.]

Government: "Model policy impact"

proj = WorldProjection(
    include=[
        "country_us",
        "country_cn.macro",
        "country_eu.macro",
        "financial",
        "interventions",
        "forecasts",
        "regime",
    ],
    countries=["jp", "uk"],
)

[Figure: Government use case, policy transmission graph. Regime state conditions all economies. Bilateral trade and financial linkages connect countries (bidirectional arrows). Interventions propagate through the financial system and the domestic economy to produce structured forecasts.]

Computer use agent: "Model the user's world"

proj = WorldProjection(
    include=[
        "events",
        "regime.compressed_world_state",
        "forecasts.macro.recession_prob_3m",
    ],
    individuals=["user"],
    firms=["user_org"],
)

[Figure: Agent use case, causal interaction graph. Minimal world context for an agent: real-time events feed into the user and organization models. Regime state drives recession forecasts. User and organization are bidirectionally linked.]

Training architecture

The training curriculum is a DAG (directed acyclic graph) where fork nodes train domain-specific models in parallel and join nodes merge them by weight averaging. The topology is driven by semantic distance between domains.
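
The fork/join structure can be sketched with the standard library alone. The example below is illustrative rather than the package's scheduler: the node names and the `parents` mapping are hypothetical, and it only shows how a DAG yields waves of nodes that can train in parallel and which nodes are joins.

```python
from graphlib import TopologicalSorter

# Node -> parents whose weights it inherits (hypothetical toy curriculum).
parents = {
    "financial": [],
    "macro": [],
    "macro_to_finance": ["financial", "macro"],  # join: two parents
    "integration": ["macro_to_finance"],
}

sorter = TopologicalSorter(parents)
sorter.prepare()

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # every node here can train in parallel
    waves.append(ready)
    sorter.done(*ready)

joins = [node for node, ps in parents.items() if len(ps) > 1]

print(waves)  # [['financial', 'macro'], ['macro_to_finance'], ['integration']]
print(joins)  # ['macro_to_finance']
```

A node is only released once all of its parents are marked done, which is what guarantees that a join always has every parent's weights available to merge.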

Declare a curriculum in YAML

Canvas dimensions are auto-computed from the projected fields — like a C compiler sizing a struct. Only specify n_layers, n_steps, and datasets per subject; the grid size adapts to fit whatever fields the natural language description resolves to.

# curricula/standard.yaml
name: my_world_model
defaults:
  d_model: 64
  n_steps: 5000

stages:
  - name: foundations
    parallel:
      - subject: "Core financial markets: yield curves, credit, equities"
        datasets: [yahoo_finance, fred_rates]
      - subject: "US macroeconomic fundamentals: GDP, inflation, employment"
        datasets: [fred_macro]
      - subject: "Natural resources and commodity supply chains"
        datasets: [yahoo_commodities]

  - name: cross_domain
    builds_on: foundations
    parallel:
      - subject: "How macro conditions drive financial markets"
        datasets: [fred_macro, yahoo_finance]

  - name: integration
    builds_on: cross_domain
    parallel:
      - subject: "Full world model"
        include: ["*"]
        n_steps: 10000

Natural language curriculum resolution

Each subject string is resolved to world model field paths via keyword matching — no LLM required. This works for both projection (selecting what to model) and training (defining what to learn):

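# CurriculumStage and CurriculumSubject are used by the inline example further
# down; importing them from this module is an assumption.
from general_unified_world_model.training.dag_curriculum import (
    CurriculumSpec, CurriculumStage, CurriculumSubject, resolve_subject,
)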

# Resolve a description to field paths
resolve_subject("How inflation drives yield curves and credit spreads")
# → ['country_us.macro.inflation', 'financial.yield_curves', 'financial.credit', 'regime']

# Load a full curriculum and train
spec = CurriculumSpec.from_yaml("curricula/standard.yaml")
nodes = spec.to_training_nodes()  # → 12 TrainingNode DAG

# Or define inline — describe what you care about, get a trained model
spec = CurriculumSpec(stages=[
    CurriculumStage(name="my_domain", parallel=[
        CurriculumSubject(
            subject="How semiconductor supply chains affect tech stock valuations",
            datasets=["yahoo_finance"],
            firms=["NVDA", "TSMC"],
        ),
    ]),
])

The builds_on field defines the DAG: all subjects in a stage inherit merged weights from the previous stage. Each subject's canvas is auto-sized from its resolved fields. The keyword matcher covers 60+ terms across all domains (financial, macro, political, resources, tech, narratives, climate, health, etc.).
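
For intuition about keyword-based resolution, here is a deliberately tiny matcher. It is a sketch under stated assumptions, not `resolve_subject` itself: the four-entry `KEYWORDS` table and the function name `resolve_keywords` are hypothetical, and the real matcher (60+ terms) may add paths this sketch does not, such as `regime` in the earlier example output.

```python
# Hypothetical keyword table; the library's real table is not reproduced here.
KEYWORDS = {
    "inflation": "country_us.macro.inflation",
    "yield curve": "financial.yield_curves",
    "credit": "financial.credit",
    "recession": "regime",
}


def resolve_keywords(subject):
    """Return field paths whose keyword occurs in `subject` (case-insensitive)."""
    text = subject.lower()
    paths = []
    for keyword, path in KEYWORDS.items():
        if keyword in text and path not in paths:
            paths.append(path)
    return paths


print(resolve_keywords("How inflation drives yield curves and credit spreads"))
# ['country_us.macro.inflation', 'financial.yield_curves', 'financial.credit']
```

Plain substring matching is deterministic and needs no network access, which fits the "no LLM required" claim; the trade-off is that phrasing outside the table resolves to nothing.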

The 4-tier standard curriculum

  1. Foundation (6 parallel nodes): Financial, macro, politics, resources, tech, narratives — each trained independently on auto-sized canvases
  2. Cross-domain (3 parallel nodes): Macro→finance, geopolitics→commodities, narratives→markets — merging pretrained parent weights
  3. Complex (2 parallel nodes): Corporate strategy, policy impact — multi-parent joins
  4. Integration (1 node): Full world model, all domains, all cross-domain connections active

Weight transfer at join points

At each join, parent backbones are averaged by parameter name. Field-specific encoders/decoders transfer via matching field names — this works because the ontology is stable across projections.
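
Averaging by parameter name can be sketched without any framework. The code below is a minimal illustration using plain lists in place of tensors; the function `average_by_name` and the parameter names are hypothetical, and the handling of names that only one parent has (they pass through unchanged) is an assumption rather than documented behavior.

```python
def average_by_name(parents):
    """Average parameters that share a name across parent checkpoints.

    `parents` is a list of dicts mapping parameter name -> list of floats.
    A name held by only some parents is averaged over the parents that
    have it, so a parameter with a single owner is carried over as is.
    """
    merged = {}
    names = {name for p in parents for name in p}
    for name in sorted(names):
        stacks = [p[name] for p in parents if name in p]
        merged[name] = [sum(vals) / len(vals) for vals in zip(*stacks)]
    return merged


finance = {"backbone.w": [1.0, 3.0], "enc.financial.equities.vix": [0.5]}
macro = {"backbone.w": [3.0, 5.0], "enc.country_us.macro.output.gdp_nowcast": [0.2]}

merged = average_by_name([finance, macro])
print(merged["backbone.w"])                  # [2.0, 4.0]
print(merged["enc.financial.equities.vix"])  # [0.5]
```

Keying encoders by field path is what lets them survive a merge untouched: two parents trained on different projections never collide on a field-specific name unless they genuinely share that field.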

Why this works

The semantic type system lets us proxy generalization distance between any two modalities by their semantic embedding distance. GDP growth and industrial production are semantically close — their latent dynamics will be correlated. GDP growth and seismic risk are semantically far — nearly independent. This guides curriculum design: couple close domains first, distant later.
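
The ordering heuristic can be illustrated with cosine distance. Everything numeric here is made up: the 3-d vectors stand in for real semantic embeddings purely to show the mechanics, and `cosine_distance` is an ordinary helper rather than a function from this package.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Made-up toy vectors, NOT real embeddings of these concepts.
emb = {
    "gdp_growth": [0.9, 0.1, 0.0],
    "industrial_production": [0.8, 0.2, 0.0],
    "seismic_risk": [0.0, 0.1, 0.9],
}

pairs = [
    ("gdp_growth", "seismic_risk"),
    ("gdp_growth", "industrial_production"),
]
# Couple the closest pair first, the most distant last.
ranked = sorted(pairs, key=lambda p: cosine_distance(emb[p[0]], emb[p[1]]))
print(ranked[0])  # ('gdp_growth', 'industrial_production')
```

With real embeddings the same sort gives a curriculum order: small distances become early cross-domain stages, large ones are deferred to integration.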

Distributed topology — not a bottleneck

The architecture preserves the real world's distributed interaction structure. Domains connect directly to each other (financial ↔ country, firm ↔ sector, events → markets) via cross-domain attention connections. The regime latent is one peer among many — it influences 5 domains but does NOT sit on all information pathways. 98% of attention is intra-domain (dense), 2% is cross-domain (sparse, multi-target). There is no information bottleneck.

Heterogeneous data training

Each dataset maps its columns to whatever fields exist on the canvas. Loss is computed only on positions that have ground truth — fields without data in the current batch get no gradient. The topology handles what connects to what.

Dataset A (FRED):     GDP ✓  CPI ✓  (only macro fields on canvas)
Dataset B (Yahoo):    VIX ✓  Yields ✓  (only financial fields on canvas)
Dataset C (News):     Embeddings ✓  (only narrative fields on canvas)
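
The "loss only where ground truth exists" rule is easy to state in code. This is a minimal sketch with plain Python lists, assuming a mean-squared-error objective and a boolean observation mask per position; the helper name `masked_mse` and the four-position toy canvas are hypothetical, and the actual trainer uses a diffusion objective over tensors.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over observed positions only.

    `mask[i]` is True where position i has ground truth. Unobserved
    positions contribute neither loss nor gradient.
    """
    observed = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not observed:
        return 0.0  # nothing observed in this sample
    return sum(observed) / len(observed)


# Toy canvas with 4 positions: [gdp, cpi, vix, ten_year]
pred = [2.0, 3.0, 20.0, 4.0]
fred_target, fred_mask = [2.5, 3.0, 0.0, 0.0], [True, True, False, False]
yahoo_target, yahoo_mask = [0.0, 0.0, 18.0, 4.0], [False, False, True, True]

print(masked_mse(pred, fred_target, fred_mask))    # 0.125
print(masked_mse(pred, yahoo_target, yahoo_mask))  # 2.0
```

The placeholder zeros in the unobserved slots never reach the loss, so they cannot act as implicit imputed values.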

Different projections can share backbone weights because the backbone operates on semantically conditioned positions. A GDP-trained backbone transfers to a yields projection because the transformer's attention weights are parameterized by semantics, not positions.

Data adapters

Built-in adapters and collectors for 7+ data sources:

| Source | Type | API Key | Coverage |
|---|---|---|---|
| FRED | Collector | Required | 42 macro/financial series |
| Yahoo Finance | Collector | None | Equities, FX, commodities, crypto |
| World Bank | Collector | None | 10 indicators × 7 countries |
| IMF | Collector | None | WEO forecasts + commodity prices |
| BIS | Collector | None | Credit, property, FX, debt |
| NOAA Climate | Collector | Optional | Temperature, CO2, sea level |
| HuggingFace | Adapter | None | Auto-mapped from any HF dataset |
| Synthetic | Collector | None | 57 correlated fields for testing |

from general_unified_world_model.data import fred_adapter, yahoo_finance_adapter

# FRED: 42 macro series mapped to world model fields
fred_spec, fred_data = fred_adapter(api_key="...", start_date="2010-01-01")

# Yahoo Finance: equities, FX, commodities, crypto
yahoo_spec, yahoo_data = yahoo_finance_adapter(
    include_equity=True, include_fx=True,
    firm_tickers={"AAPL": "firm_AAPL"},
)

# Collect all available data at once
from general_unified_world_model.data import collect_all
sources = collect_all(api_keys={"fred": "..."})

Auto-map any HuggingFace dataset

from general_unified_world_model.data import hf_adapter, hf_inspect

# Preview what would be mapped
info = hf_inspect("fred-economic-data/FRED-MD")
print(f"{info['mapped_count']}/{info['n_columns']} columns auto-mapped")
print(info['unmapped_columns'])  # columns that need manual overrides

# Load and auto-map
spec, data = hf_adapter("fred-economic-data/FRED-MD")

# With manual overrides for ambiguous columns
spec, data = hf_adapter(
    "some-org/custom-dataset",
    column_overrides={"gdp_yoy": "country_us.macro.output.gdp_nowcast"},
    transform_overrides={"gdp_yoy": "z_score"},
)

The auto-mapper inspects column names, dataset tags, and description to infer field paths. Unmapped columns are logged — use hf_inspect() to preview before committing.

Generic CSV/Parquet

from general_unified_world_model.data import tabular_adapter
spec, data = tabular_adapter(
    "My Dataset", "data.csv",
    column_mappings={"gdp_growth": "country_us.macro.output.gdp_nowcast"},
    transforms={"gdp_growth": "z_score"},
)

Temporal entities

Entities can appear and disappear over time:

from general_unified_world_model import TemporalTopology
from general_unified_world_model.schema.business import Business

tt = TemporalTopology()
tt.add("firm_AAPL", Business(), start_tick=100)    # founded
tt.add("firm_ENRON", Business(), start_tick=0, end_tick=500)  # dissolved

# At tick 50: ENRON exists, AAPL doesn't yet
active = tt.active_at(50)

# Generate attention mask that blocks inactive entities
mask = tt.generate_temporal_attention_mask((0, 1000), bound_schema)
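
The lifetime logic behind the example above can be sketched independently of the library. This is not `TemporalTopology`: the `Lifetime` dataclass and both helper functions are hypothetical, and treating `start_tick` as inclusive and `end_tick` as exclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lifetime:
    start_tick: int
    end_tick: Optional[int] = None  # None means "still active"


def active_at(lifetimes, tick):
    """Names of entities alive at `tick` (start inclusive, end exclusive)."""
    return sorted(
        name for name, life in lifetimes.items()
        if life.start_tick <= tick and (life.end_tick is None or tick < life.end_tick)
    )


def attention_mask(names, lifetimes, tick):
    """mask[i][j] is True when entity i and entity j are both active at `tick`."""
    alive = set(active_at(lifetimes, tick))
    return [[a in alive and b in alive for b in names] for a in names]


lifetimes = {"firm_AAPL": Lifetime(100), "firm_ENRON": Lifetime(0, 500)}
print(active_at(lifetimes, 50))   # ['firm_ENRON']
print(active_at(lifetimes, 300))  # ['firm_AAPL', 'firm_ENRON']
print(active_at(lifetimes, 600))  # ['firm_AAPL']
```

An entity-level mask like this would still need expanding to canvas positions before it could gate attention, which is presumably why the real call also takes the bound schema.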

Inference

from general_unified_world_model import WorldModel

model = WorldModel.load("checkpoint.pt", projection)

# Observe what you know
model.observe("financial.yield_curves.ten_year", 4.25)
model.observe("country_us.macro.inflation.headline_cpi", 3.1)
model.observe("financial.equities.vix", 18.5)

# Predict everything else
predictions = model.predict(n_steps=50)

recession_prob = predictions["forecasts.macro.recession_prob_3m"]
regime = predictions["regime.growth_regime"]
credit_stress = predictions["forecasts.financial.credit_stress_3m"]

Visualizations

The rendering system provides multiple views into the same world model state. Install the viz extra for rendering support: pip install general-unified-world-model[viz]

Canvas heatmaps

Each field occupies a contiguous region on the (H, W) canvas. Colors indicate semantic domain; intensity shows state magnitude.

[Figure: two canvas heatmaps. Left: macro model projection (~40 fields, auto-sized canvas). Right: hedge fund projection with AAPL+NVDA (~200 fields).]

Domain topology graphs

Nodes are semantic domains, edges show attention connectivity between them. Node size ∝ field count, edge width ∝ connection density.

[Figure: two domain topology graphs. Left: a macroeconomic model's domain graph, in which macro, rates, credit, and regime are tightly coupled. Right: a hedge fund model that adds firm-level nodes and cross-domain positioning.]

These topology graphs show how different projections create different compute graphs. The macro model has a tight cluster around rates/credit/macro. The hedge fund model fans out to include firm-level nodes (AAPL, NVDA) with edges to financial and macro domains.

Financial charts

Time series views of world model fields, auto-generated or from real observations.

[Figure: Financial Charts]

Geopolitical state map

Each country's latent state vector is projected to RGB via PCA — the color is a 3D projection of the full state representation, not a scalar risk score. Real country boundaries rendered on orthographic globes with cartopy.

[Figure: Rotating Geopolitical Globe]

[Figure: Geopolitical Dual-Hemisphere Map]

Regime dashboard

Horizontal bars show the magnitude of each of the 17 regime state fields, with no decoration. The compressed world state latent is drawn as a strip at the bottom.

[Figure: Regime Dashboard]

Social graph (CEO perspective)

First-person entity network. Focal entity centered, others positioned by connection strength. Edge weight and color encode relationship intensity (topology-derived + structurally inferred). Field count shown inside each node.

[Figure: CEO Social Graph]

Rendering API

from general_unified_world_model import render

# By renderer name
fig = render(bound, "canvas_heatmap")
fig = render(bound, "topology_graph")
fig = render(bound, "financial_chart")

# Save directly
render(bound, "canvas_heatmap", save_path="output.png")

# Or use renderer classes directly
from general_unified_world_model.rendering import (
    CanvasHeatmapRenderer, TopologyGraphRenderer, CausalGraphRenderer,
    FinancialChartRenderer, GeopoliticalMapRenderer,
    RegimeDashboardRenderer, SocialGraphRenderer,
    RenderContext,
    render_ceo_use_case, render_government_use_case, render_agent_use_case,
)

ctx = RenderContext(bound_schema=bound, title="My Model")
renderer = CanvasHeatmapRenderer()
fig = renderer.render(ctx)
renderer.save(ctx, "output.png", dpi=200)

LLM-powered projection builder

Don't want to manually specify field paths? Describe your modeling needs in plain English and let an LLM design the projection for you.

from general_unified_world_model import llm_project

result = llm_project(
    "I'm a hedge fund PM. I need to model US macro, rates, credit, "
    "and two firms: Apple and NVIDIA. I care about recession risk "
    "and the Fed's next move.",
    provider="anthropic",  # or "openai"
    api_key="sk-ant-...",  # or set ANTHROPIC_API_KEY env var
)

# Result contains the designed projection + reasoning
print(result.reasoning)
# "Hedge fund needs financial markets, US macro, regime indicators..."

# Compile to a BoundSchema
bound = result.compile(T=1, H=64, W=64, d_model=64)
print(f"{len(bound.field_names)} fields selected")

Uses raw HTTP calls — no SDK dependencies. Supports both Anthropic and OpenAI providers.

Installation

# Core
pip install general-unified-world-model

# With real data adapters
pip install general-unified-world-model[data]

# With training infrastructure
pip install general-unified-world-model[train]

# Everything
pip install general-unified-world-model[all]

Requires Python 3.10+ and PyTorch 2.0+.

Examples

examples/
├── 01_quickstart.py           # Compile full world model, inspect fields
├── 02_ceo_company_model.py    # CEO use case: company + context
├── 03_government_policy.py    # Government: policy impact analysis
├── 04_computer_use_agent.py   # Agent: user psychology + world context
├── 05_train_financial.py      # Train on real FRED + Yahoo data
└── 06_curriculum_training.py  # Full 3-phase curriculum training

Development

git clone https://github.com/JacobFV/general-unified-world-modeling.git
cd general-unified-world-modeling
pip install -e ".[dev]"
pytest

Branch structure

  • develop — active development, PRs target here
  • release — stable releases, tagged commits trigger PyPI publish

Running tests

# Full suite (144 tests)
pytest

# With coverage
pytest --cov=general_unified_world_model --cov-report=term-missing

# Specific module
pytest tests/test_schema.py -v

Project layout

src/general_unified_world_model/
├── schema/           # 19 schema modules (physical → forecast)
│   ├── world.py      # Top-level World composition (857 fields)
│   ├── physical.py   # Planetary physical substrate
│   ├── resources.py  # Energy, metals, food, water, compute
│   ├── financial.py  # Global monetary & financial
│   ├── macro.py      # Macroeconomy (per country)
│   ├── political.py  # Political & institutional
│   ├── narrative.py  # Narrative, belief & expectations
│   ├── technology.py # Technology & innovation
│   ├── demographics.py
│   ├── sector.py     # Per GICS sector
│   ├── supply_chain.py
│   ├── business.py   # Per firm (sparse)
│   ├── individual.py # Key decision-makers (very sparse)
│   ├── events.py     # Real-time event tape
│   ├── trust.py      # Data channel trust (meta-epistemic)
│   ├── regime.py     # Privileged regime latent
│   ├── intervention.py
│   ├── forecast.py   # Structured output heads
│   ├── country.py    # Composite per country
│   └── observability.py  # Reusable epistemic bundles
├── projection/       # Subsetting & connectivity
│   ├── subset.py     # WorldProjection, project()
│   ├── temporal.py   # Temporal entity management
│   └── transfer.py   # Semantic transfer distance
├── training/         # Training infrastructure
│   ├── backbone.py       # Transformer backbone
│   ├── heterogeneous.py  # Masked canvas trainer
│   ├── diffusion.py      # Diffusion objective
│   ├── curriculum.py     # Phase-based curriculum
│   └── dag_curriculum.py # DAG curriculum + YAML + NL spec
├── data/             # Data adapters & collectors
│   ├── adapters.py   # FRED, Yahoo, PMI, earnings, news, CSV
│   ├── collectors.py # FRED, Yahoo, WorldBank, IMF, BIS, NOAA, Synthetic
│   └── huggingface.py # Auto-map any HuggingFace dataset
├── rendering/        # Visualization system
│   ├── base.py       # Renderer protocol, RenderContext, registry
│   ├── canvas.py     # Canvas heatmap (field allocation view)
│   ├── topology.py   # Domain topology graph
│   ├── financial.py  # Financial time series charts
│   ├── geopolitical.py  # Globe map + rotating GIF
│   ├── regime.py     # Regime state dashboard
│   └── social.py     # Social/entity network graph
├── llm/              # LLM-powered projection builder
│   └── projection_builder.py  # Natural language → WorldProjection
└── inference.py      # Observe/predict API

License

Apache 2.0

