
An ML classifier model to make predictions from semi-structured data.

Project description

ORiGAMi - Object Representation through Generative Autoregressive Modelling

ORiGAMi paper on arXiv

Disclaimer

This is a personal fork of the original mongodb-labs/origami project. While I was the original author, I have since left MongoDB and am continuing development and maintenance of this fork independently.

This tool is not officially supported or endorsed by MongoDB, Inc. The code is released for use "AS IS" without warranties of any kind, including, but not limited to, its installation, use, or performance. Do not run this tool against critical production systems.

Overview

ORiGAMi is a transformer-based Machine Learning model for supervised classification from semi-structured data such as MongoDB documents or JSON files.

Typically, when working with semi-structured data in a Machine Learning context, the data first needs to be flattened into a tabular format. This flattening can be lossy, especially in the presence of arrays and nested objects, and often requires domain expertise to extract meaningful higher-order features from the raw data. This feature extraction step is manual, slow, and expensive, and it does not scale well.
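To make the information loss concrete, here is a minimal sketch in plain Python. It does not use ORiGAMi at all; the flatten helper and the sample documents are hypothetical and only illustrate one common way of forcing nested documents into rows and columns.

```python
import json


def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted column names (illustrative helper only)."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects become dotted column names such as "address.city".
            row.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # A table cell cannot hold a variable-length array, so a common
            # workaround is to serialize it, which hides its inner structure.
            row[name] = json.dumps(value)
        else:
            row[name] = value
    return row


# Hypothetical sample documents with different shapes.
docs = [
    {
        "name": "Ada",
        "address": {"city": "Berlin"},
        "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    },
    {"name": "Bob", "orders": []},
]

rows = [flatten(d) for d in docs]
columns = sorted({column for row in rows for column in row})
# Fields missing from a document turn into None, which can no longer be
# told apart from a field that was explicitly set to null.
table = [[row.get(column) for column in columns] for row in rows]

for line in [columns, *table]:
    print(line)
```

In the resulting table, the orders array is an opaque string and the absent address field of the second document is indistinguishable from a null value. Recovering features such as "number of orders" would need additional hand-written extraction code.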

ORiGAMi circumvents this by directly operating on JSON data. Once a model is trained, it can be used to make predictions on any field in the dataset.

Installation

ORiGAMi requires Python 3.11. We recommend using uv for dependency management and virtual environments.

Install from PyPI

pip install origami-ml

Install from source with uv (recommended for development)

First, install uv if you haven't already:

curl -LsSf https://astral.sh/uv/install.sh | sh

Then clone and install the project:

git clone https://github.com/rueckstiess/origami.git
cd origami
uv sync --extra dev

This will automatically create a virtual environment, install Python 3.11 if needed, and install all dependencies.

To run commands in the uv environment:

uv run origami --help
uv run pytest

Usage

ORiGAMi comes with a command line interface (CLI) and a Python SDK.

Usage from the Command Line

The CLI lets you train a model and make predictions with a trained model. After installation, run origami from your shell to see an overview of available commands.

Help for specific commands is available with origami <command> --help, where <command> is currently one of train or predict. Note that the first run of the origami CLI can take longer than subsequent runs.
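If you want to call the help commands from a script rather than a shell, the following sketch builds the argument lists described above and runs them with the standard library. The function names help_argv and show_help are assumptions made for this example and are not part of ORiGAMi. Only the train and predict commands and the --help flag are taken from the text.

```python
import shutil
import subprocess

# The two commands named in the text above.
KNOWN_COMMANDS = ("train", "predict")


def help_argv(command=None):
    """Build the argument list for `origami --help` or `origami <command> --help`."""
    if command is None:
        return ["origami", "--help"]
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    return ["origami", command, "--help"]


def show_help(command=None):
    """Return the help text, or None if the origami executable is not on PATH."""
    if shutil.which("origami") is None:
        return None
    result = subprocess.run(
        help_argv(command), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    for cmd in (None, *KNOWN_COMMANDS):
        print(help_argv(cmd), "->", "available" if show_help(cmd) else "not available")
```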

Detailed documentation for the CLI and available options can be found in CLI.md.

Usage with Python

For an example of how to use ORiGAMi from Python, see the notebooks in the ./notebooks folder, e.g. the example_origami_dungeons.ipynb notebook.

Experiment Reproduction

This code is released alongside our paper, which is available on arXiv: ORIGAMI: A generative transformer architecture for predictions from semi-structured data. To reproduce the experiments in the paper, see the instructions in the ./experiments/ directory.


Download files


Source Distribution

origami_ml-0.3.0.tar.gz (54.3 kB)

Uploaded Source

Built Distribution


origami_ml-0.3.0-py3-none-any.whl (64.9 kB)

Uploaded Python 3

File details

Details for the file origami_ml-0.3.0.tar.gz.

File metadata

  • Download URL: origami_ml-0.3.0.tar.gz
  • Upload date:
  • Size: 54.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.11

File hashes

Hashes for origami_ml-0.3.0.tar.gz
Algorithm    Hash digest
SHA256       fe5f81233d671c66d7884afd0327cca81975c9dbec638e3f050cce1bb5e319e3
MD5          46f3983dbd3894e260530a40e987dba5
BLAKE2b-256  8431ca073e4fa8863ab8eb1e3cd5a7d3ea75d30b13e8e7ce06f927f3ec610bbf
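
A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. The sketch below is one way to do this. The helper names sha256_of and matches are hypothetical, and the script assumes the archive was saved in the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest for origami_ml-0.3.0.tar.gz as listed above.
EXPECTED_SHA256 = "fe5f81233d671c66d7884afd0327cca81975c9dbec638e3f050cce1bb5e319e3"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the file was saved.
    archive = Path("origami_ml-0.3.0.tar.gz")
    if archive.exists():
        print("digest OK" if matches(archive, EXPECTED_SHA256) else "digest MISMATCH")
    else:
        print(f"{archive} not found")
```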


File details

Details for the file origami_ml-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: origami_ml-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 64.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.11

File hashes

Hashes for origami_ml-0.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       4b65ad0ff45764a79fd16c4387b2f3a443029b803bb7472d1c4cd3b02ce5a877
MD5          9e67ea3dcb75e757f7795be5af12cd74
BLAKE2b-256  5647aba88d1533b0d094c924a97ca899cc1424b0fcce58e2941bbef9f4a3d510

