ORIGAMI: Object RepresentatIon via Generative Autoregressive ModellIng

Origami

A machine learning model for JSON data.

Origami trains models that learn the relationships between fields in JSON objects. Given a dataset of JSON records, an Origami model can:

Predict missing field values based on the other fields
Generate new synthetic JSON objects that follow the patterns in your data
Embed JSON objects as dense vectors for similarity search or downstream tasks

Unlike tabular ML models that require flat feature vectors, Origami works directly with JSON structure — including nested objects and arrays.

Installation

pip install origami-ml

Or with uv:

uv add origami-ml

Origami v2 is a breaking rewrite of the original origami-ml package. See Migrating from v1 to v2 if you are upgrading from v1.

Requires Python 3.11+. PyTorch is installed automatically. GPU acceleration (CUDA, Apple Silicon MPS) is auto-detected — no configuration needed.

Quick Start

from origami import OrigamiPipeline

# Your data: a list of JSON objects (Python dicts)
data = [
    {"product": "Wireless Headphones", "categories": ["audio", "wireless"], "price": 79.99,  "rating": 4.2},
    {"product": "USB-C Hub",           "categories": ["accessories"],       "price": 34.99,  "rating": 4.5},
    {"product": "Mechanical Keyboard", "categories": ["peripherals"],       "price": 129.99, "rating": 4.7},
    # ... more records
]

# Train with default settings
pipeline = OrigamiPipeline()
pipeline.fit(data, epochs=20)

# Predict a missing value (including arrays)
prediction = pipeline.predict(
    {"product": "Bluetooth Speaker", "categories": None},
    target_key="categories",
    allow_complex_values=True,
)
print(prediction)  # ["audio", "wireless"]

# Generate new synthetic records
samples = pipeline.generate(num_samples=5, temperature=0.8)

# Get a vector embedding
embedding = pipeline.embed({"product": "Wireless Headphones", "categories": ["audio", "wireless"]})
# numpy array of shape (128,)

# Save and load
pipeline.save("model.pt")
loaded = OrigamiPipeline.load("model.pt")
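The vectors returned by `embed` can be compared with cosine similarity for the similarity search mentioned at the top of this page. A minimal pure-Python sketch follows. `cosine_similarity` and `most_similar` are hypothetical helpers, not part of Origami, and the sketch assumes each embedding is a 1-D sequence of floats, such as the NumPy array in the example above.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float], candidates: list[Sequence[float]], k: int = 3
) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k candidates closest to the query."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical usage, where `records` is your list of dicts:
# vectors = [pipeline.embed(r) for r in records]
# print(most_similar(vectors[0], vectors, k=5))
```

For large collections, a vectorized NumPy version or a dedicated vector index would usually replace this loop.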

Configuration

For more control, pass an OrigamiConfig with nested configuration objects:

from origami import OrigamiPipeline, OrigamiConfig, ModelConfig, TrainingConfig, DataConfig

config = OrigamiConfig(
    model=ModelConfig(
        d_model=256,       # Larger hidden dimension (default: 128)
        n_layers=6,        # More transformer layers (default: 4)
    ),
    training=TrainingConfig(
        batch_size=64,
        num_epochs=50,
        target_key="price",                  # Track prediction metrics during training
        eval_metrics={"acc": "accuracy"},     # Compute accuracy each epoch
    ),
    data=DataConfig(
        numeric_mode="scale",  # Handle numeric fields as continuous values
    ),
)

pipeline = OrigamiPipeline(config)
pipeline.fit(train_data, eval_data=val_data)
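The example assumes `train_data` and `val_data` already exist as lists of dicts, like `data` in the Quick Start. One way to produce them is a seeded shuffle-and-split. The `train_val_split` helper and its `val_fraction` and `seed` parameters are hypothetical and use only the standard library; they are not part of Origami.

```python
import random


def train_val_split(
    records: list[dict], val_fraction: float = 0.2, seed: int = 0
) -> tuple[list[dict], list[dict]]:
    """Shuffle a copy of the records and split off a validation set."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy; the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# train_data, val_data = train_val_split(data, val_fraction=0.2)
```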

ModelConfig controls the model architecture (size, depth, position encoding strategy)
TrainingConfig controls training hyperparameters (learning rate, batch size, evaluation)
DataConfig controls data preprocessing (how numeric fields are handled, vocabulary size)
InferenceConfig controls inference-time constraints (grammar and schema enforcement)

See Configuration Reference for all options.

Command-Line Interface

Origami includes a CLI for training, prediction, generation, evaluation, and embedding:

# Train a model
origami train -d data.jsonl -t label -e 20 -o model.pt

# Predict missing values
origami predict -m model.pt -d test.jsonl -t label

# Generate synthetic data
origami generate -m model.pt -n 100 --temp 0.8

# Evaluate model performance
origami evaluate -m model.pt -d test.jsonl -t label --metrics accuracy

See CLI Reference for all commands and options.

Documentation

Concepts — How Origami works: tokenization, position encoding, grammar constraints
Python SDK — Complete API reference for OrigamiPipeline
CLI Reference — All commands, options, and supported data formats
Configuration — Every configuration parameter explained
Migrating from v1 to v2 — Breaking changes and migration guidance

How It Works

Origami converts each JSON object into a sequence of tokens that preserve the hierarchical structure — keys, values, nesting, and arrays are all explicitly represented. Instead of encoding token position as a simple index (1st, 2nd, 3rd...), Origami uses Key-Value Position Encoding (KVPE), which encodes the path through the JSON tree. This lets the model understand which key each value belongs to, regardless of key order.
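To make the path idea concrete, here is a minimal sketch that flattens a JSON object into tokens and tags each token with its path from the root, a tuple of keys and array indices. The `tokenize` and `value_paths` helpers and their token strings are hypothetical illustrations, not Origami's tokenizer or vocabulary. A real model would also map each path to a learned position encoding, which is not shown.

```python
import json
from typing import Any

# A path is a tuple of object keys (str) and array indices (int) from the root.
JsonPath = tuple[Any, ...]


def tokenize(value: Any, path: JsonPath = ()) -> list[tuple[str, JsonPath]]:
    """Flatten a JSON value into (token, path) pairs (hypothetical token scheme).

    Structural tokens mark where objects and arrays open and close. A key
    token and the value under it share the same path, so the path, not the
    token's index in the sequence, identifies which key a value belongs to.
    """
    tokens: list[tuple[str, JsonPath]] = []
    if isinstance(value, dict):
        tokens.append(("{", path))
        for key, child in value.items():
            child_path = path + (key,)
            tokens.append((f"key:{key}", child_path))
            tokens.extend(tokenize(child, child_path))
        tokens.append(("}", path))
    elif isinstance(value, list):
        tokens.append(("[", path))
        for index, child in enumerate(value):
            tokens.extend(tokenize(child, path + (index,)))
        tokens.append(("]", path))
    else:
        # Scalars (str, int, float, bool, None) become a single value token.
        tokens.append((f"value:{json.dumps(value)}", path))
    return tokens


def value_paths(obj: dict) -> set[tuple[str, JsonPath]]:
    """The (value token, path) pairs of an object, ignoring sequence order."""
    return {(tok, p) for tok, p in tokenize(obj) if tok.startswith("value:")}


# Reordering keys changes the token order but not the paths:
assert value_paths({"price": 79.99, "rating": 4.2}) == value_paths({"rating": 4.2, "price": 79.99})
```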

A grammar constraint system (a pushdown automaton) ensures that every model output is valid JSON — no syntax errors, ever. This is applied automatically with no configuration needed.
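The idea behind the grammar constraint can be sketched with a stack: push when an object or array opens, pop when it closes, and at each step allow only the token kinds that keep the output valid. This sketch assumes a simplified vocabulary of six token kinds (`{`, `}`, `[`, `]`, `key`, `value`). `JsonGrammar` and `mask_logits` are hypothetical names, not Origami's API, and the real automaton works on Origami's own vocabulary.

```python
class JsonGrammar:
    """Pushdown automaton over simplified token kinds: { } [ ] key value."""

    def __init__(self) -> None:
        self.stack: list[str] = []  # open containers: "object" or "array"
        self.expect_value = False   # True right after a key has been emitted
        self.started = False
        self.done = False

    def allowed(self) -> set[str]:
        """Token kinds that may legally come next."""
        if self.done:
            return set()                    # the record is complete
        if not self.started:
            return {"{"}                    # every record is a JSON object
        if self.expect_value:
            return {"value", "{", "["}      # a key must be followed by a value
        if self.stack[-1] == "object":
            return {"key", "}"}
        return {"value", "{", "[", "]"}     # inside an array

    def push(self, kind: str) -> None:
        """Consume one token kind, rejecting anything the grammar forbids."""
        allowed = self.allowed()
        if kind not in allowed:
            raise ValueError(f"{kind!r} not allowed here; expected one of {sorted(allowed)}")
        self.started = True
        if kind == "key":
            self.expect_value = True
            return
        self.expect_value = False
        if kind == "{":
            self.stack.append("object")
        elif kind == "[":
            self.stack.append("array")
        elif kind in ("}", "]"):
            self.stack.pop()
            self.done = not self.stack      # closing the root ends the record
        # a scalar "value" fills the current slot; the stack is unchanged


def mask_logits(logits: dict[str, float], grammar: JsonGrammar) -> dict[str, float]:
    """Send disallowed kinds to -inf so a sampler can never choose them."""
    allowed = grammar.allowed()
    return {k: (v if k in allowed else float("-inf")) for k, v in logits.items()}
```

In a generation loop, `mask_logits` would be applied before each sampling step and `push` called with the kind of the sampled token, so an invalid sequence can never be produced.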

For numeric fields with many distinct values (like prices or measurements), Origami can model them as continuous distributions rather than discrete tokens, using a mixture density output head.
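A mixture density head outputs, for a numeric field, the parameters of a mixture of K Gaussians, p(y) = sum_k w_k * Normal(y; mu_k, sigma_k), and is trained by minimizing the negative log-likelihood of the observed value. The sketch below assumes a one-dimensional Gaussian mixture parameterized by mixture logits, means, and log standard deviations. The component type, the value of K, and any value scaling Origami applies (for example with `numeric_mode="scale"`) are not specified here, and `mixture_nll` and `mixture_mean` are hypothetical helpers.

```python
import math


def _log_softmax(logits: list[float]) -> list[float]:
    """Log of the softmax weights, shifted by the max for numerical stability."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def mixture_nll(y: float, logits: list[float], means: list[float], log_stds: list[float]) -> float:
    """-log p(y) for p(y) = sum_k w_k * Normal(y; mu_k, sigma_k), w = softmax(logits)."""
    if not len(logits) == len(means) == len(log_stds):
        raise ValueError("need one logit, mean, and log-std per component")
    log_terms = []
    for log_w, mu, log_std in zip(_log_softmax(logits), means, log_stds):
        sigma = math.exp(log_std)  # exp keeps sigma positive
        log_normal = -0.5 * ((y - mu) / sigma) ** 2 - log_std - 0.5 * math.log(2 * math.pi)
        log_terms.append(log_w + log_normal)
    m = max(log_terms)  # log-sum-exp over the components
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))


def mixture_mean(logits: list[float], means: list[float]) -> float:
    """Mean of the mixture, sum_k w_k * mu_k: one possible point prediction."""
    weights = [math.exp(lw) for lw in _log_softmax(logits)]
    return sum(w * mu for w, mu in zip(weights, means))
```

Working in log space with log-sum-exp avoids underflow when a value lies far from every component mean.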

For a deeper explanation of these concepts, see the Concepts page.

License

Apache-2.0

Project details

Release history

2.0.1 (this version): May 6, 2026
2.0.0a10 (pre-release): Feb 16, 2026
2.0.0a9 (pre-release): Feb 12, 2026
2.0.0a8 (pre-release): Feb 8, 2026
2.0.0a7 (pre-release): Feb 7, 2026
2.0.0a6 (pre-release): Feb 5, 2026
2.0.0a5 (pre-release): Feb 3, 2026
2.0.0a4 (pre-release): Jan 30, 2026
2.0.0a3 (pre-release): Jan 26, 2026
2.0.0a2 (pre-release): Jan 24, 2026
2.0.0a1 (pre-release): Jan 24, 2026
0.3.0: Oct 31, 2025
0.2.0: Oct 23, 2025
0.1.4: Feb 12, 2025
0.1.3: Feb 10, 2025
0.1.2: Feb 10, 2025
0.1.1: Feb 10, 2025
0.1.0: Feb 10, 2025

Download files

Source Distribution

origami_ml-2.0.1.tar.gz (1.4 MB)

Uploaded May 6, 2026 Source

Built Distribution

origami_ml-2.0.1-py3-none-any.whl (169.0 kB)

Uploaded May 6, 2026 Python 3

File details

Details for the file origami_ml-2.0.1.tar.gz.

File metadata

Download URL: origami_ml-2.0.1.tar.gz
Upload date: May 6, 2026
Size: 1.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.11

File hashes

Hashes for origami_ml-2.0.1.tar.gz
Algorithm	Hash digest
SHA256	`b16d07a5a19c323cff7e135600bdeee4b11548a35c270e14d52677d57afa6538`
MD5	`f59f2fa1f91cae248ceae700f3a9fa24`
BLAKE2b-256	`5e93f524fd5f33138499d70c81e53c473dc3396e8d3062c75c5245dd1ec481a9`

File details

Details for the file origami_ml-2.0.1-py3-none-any.whl.

File metadata

Download URL: origami_ml-2.0.1-py3-none-any.whl
Upload date: May 6, 2026
Size: 169.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.11

File hashes

Hashes for origami_ml-2.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fe8af7c2bb16af375923da8603affa9b91d338f61c8119868685a0bb8b18a1e7`
MD5	`6cb1c7b606848d31176e3a04b2d94b70`
BLAKE2b-256	`39b0abd880a4f89f384d7f80d82dad30e27d4725fb7c99d608bf6e415530ea54`
