
An ML classifier model to make predictions from semi-structured data.


ORiGAMi - Object Representation through Generative Autoregressive Modelling

Overview

ORiGAMi is a transformer-based machine learning model that directly processes semi-structured data, such as MongoDB documents or JSON files, and makes predictions from it.

Typically, semi-structured data must be flattened into tabular form before it can be used in a machine learning context. This flattening can be lossy, especially in the presence of arrays and nested objects, and it often requires domain expertise to extract meaningful higher-order features from the raw data. This feature extraction step is manual, slow, and expensive, and it does not scale well.
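To make the loss concrete, here is a minimal sketch of a naive flattening step. It is not part of ORiGAMi; the `flatten` helper and the sample documents are hypothetical and only illustrate the problem described above.

```python
from typing import Any


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Naively flatten a nested document into dotted column names (illustrative only)."""
    row: dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects are expanded into "parent.child" columns.
            row.update(flatten(value, prefix=f"{path}."))
        elif isinstance(value, list):
            # Arrays have no fixed width, so every position becomes its own column.
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    row.update(flatten(item, prefix=f"{path}.{i}."))
                else:
                    row[f"{path}.{i}"] = item
        else:
            row[path] = value
    return row


# Hypothetical sample documents with an array and a nested object.
docs = [
    {"name": "Ada", "tags": ["a", "b"], "address": {"city": "Berlin"}},
    {"name": "Bob", "tags": [], "address": {"city": "Oslo", "zip": "0150"}},
]

rows = [flatten(d) for d in docs]
columns = sorted({column for row in rows for column in row})
table = [[row.get(column) for column in columns] for row in rows]
```

In the resulting table, Bob's empty `tags` array leaves no trace: it looks the same as a document that never had a `tags` field. The number of columns also grows with the longest array in the data set, which is one reason flattening tends to need manual, domain-specific decisions.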

As a transformer model, ORiGAMi follows the trend of many other deep learning models: it operates directly on the raw data and discovers meaningful features itself. Preprocessing is fully automated, apart from a few hyper-parameters that can be tuned to improve model performance.

Installation

ORiGAMi requires Python version 3.10 or higher. We recommend using a virtual environment, such as Python's native venv.
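Before installing, you may want to confirm that the active interpreter meets these requirements. The following is a small sketch using only the standard library; the helper names `python_is_supported` and `in_virtualenv` are made up for this example.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def python_is_supported(version: tuple = tuple(sys.version_info[:2])) -> bool:
    """Return True if the given (major, minor) version is at least 3.10."""
    return tuple(version[:2]) >= MIN_PYTHON


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("Inside a virtual environment:", in_virtualenv())
```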

To install ORiGAMi with pip, run:

pip install origami-ml

You can also clone the repository to your local machine and install the dependencies manually:

git clone https://github.com/mongodb-labs/origami.git
cd origami
pip install -r requirements.txt
pip install -e .

Usage

ORiGAMi comes with a command line interface (CLI) and a Python SDK.

Usage from the Command Line

The CLI lets you train a model and make predictions with a trained model. After installation, run origami from your shell to see an overview of available commands.

Help for specific commands is available with origami <command> --help, where <command> is currently one of train or predict.
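If you want to call the CLI from a script, the help commands above can be wrapped as follows. This is a sketch under assumptions: it relies only on the invocations named in this README (origami, origami train --help, origami predict --help), and the helper names `help_argv` and `show_help` are hypothetical.

```python
import shutil
import subprocess
from typing import Optional

COMMANDS = ("train", "predict")  # the two commands named in this README


def help_argv(command: Optional[str] = None) -> list:
    """Build the argument list for the overview or for one command's help."""
    if command is None:
        return ["origami"]
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    return ["origami", command, "--help"]


def show_help(command: Optional[str] = None) -> Optional[str]:
    """Run the CLI and return its text output, or None if origami is not installed."""
    if shutil.which("origami") is None:
        return None
    result = subprocess.run(
        help_argv(command), capture_output=True, text=True, check=False
    )
    # Some CLIs print usage to stderr, so fall back to it when stdout is empty.
    return result.stdout or result.stderr
```

For example, `show_help("train")` returns the help text for the train command when the package is installed, and None otherwise.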

Detailed documentation for the CLI and available options can be found in CLI.md.

Usage with Python

For an example of how to use ORiGAMi from Python, see the notebooks in the ./notebooks folder, e.g. example_origami_dungeons.ipynb.

Experiment Reproduction

This code is released alongside our paper, which can be found on arXiv: ORIGAMI: A generative transformer architecture for predictions from semi-structured data. To reproduce the experiments in the paper, see the instructions in the ./experiments/ directory.


Download files

Source Distribution

origami_ml-0.1.1.tar.gz (46.0 kB)

Uploaded Source

Built Distribution

origami_ml-0.1.1-py3-none-any.whl (53.6 kB)

Uploaded Python 3

File details

Details for the file origami_ml-0.1.1.tar.gz.

File metadata

  • Download URL: origami_ml-0.1.1.tar.gz
  • Upload date:
  • Size: 46.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.9

File hashes

Hashes for origami_ml-0.1.1.tar.gz
Algorithm Hash digest
SHA256 2784c53e6d7d045c7c943573d3587032d55e1dc32eb435510fd325c57ce33ac6
MD5 11348842281639d29a29e049e7278bc5
BLAKE2b-256 bf522c6e1cdd6d76a2e32a90bb083ea9cd3a0a088f679578b10344179522de2a

File details

Details for the file origami_ml-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: origami_ml-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 53.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.9

File hashes

Hashes for origami_ml-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b735c7e8c45702e6e8ee8472c44906d273897c8f50137894516111c95034edd2
MD5 3661028a4a05803b9e698c113b1e1a04
BLAKE2b-256 70c5365cd1a33e1cb9fa266c9f55b35e8d094c482f8e67c82f6cbe851c7b2dac
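
The SHA256 digests listed above can be used to check a downloaded file before installing it. The sketch below uses only the standard library; the helper names `sha256_of` and `verify` are hypothetical, and the expected digests are copied from the tables above.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file details above.
EXPECTED_SHA256 = {
    "origami_ml-0.1.1.tar.gz": "2784c53e6d7d045c7c943573d3587032d55e1dc32eb435510fd325c57ce33ac6",
    "origami_ml-0.1.1-py3-none-any.whl": "b735c7e8c45702e6e8ee8472c44906d273897c8f50137894516111c95034edd2",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

Calling `verify(Path("origami_ml-0.1.1.tar.gz"))` on a downloaded copy returns True only when the file content matches the listed digest.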
