An ML classifier model to make predictions from semi-structured data.
ORiGAMi - Object Representation through Generative Autoregressive Modelling
Overview
ORiGAMi is a transformer-based Machine Learning model for supervised classification from semi-structured data such as MongoDB documents or JSON files.
Semi-structured data typically has to be flattened into a tabular format before it can be used in a Machine Learning context. This flattening can be lossy, especially in the presence of arrays and nested objects, and it often requires domain expertise to extract meaningful higher-order features from the raw data. This feature extraction step is manual, slow, and expensive, and it doesn't scale well.
ORiGAMi circumvents this by directly operating on JSON data. Once a model is trained, it can be used to make predictions on any field in the dataset.
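To make the flattening problem concrete, here is a minimal sketch in plain Python. It is not part of ORiGAMi; the `flatten` helper and the two sample documents are hypothetical and only illustrate how nested objects and arrays of different lengths turn into sparse, position-dependent columns.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one dict with dotted keys (illustrative helper)."""
    if isinstance(value, dict):
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        items = [(f"{prefix}.{i}" if prefix else str(i), v) for i, v in enumerate(value)]
    else:
        # Scalar leaf: emit it under the path accumulated so far.
        return {prefix: value}
    flat = {}
    for key, child in items:
        flat.update(flatten(child, key))
    return flat


# Hypothetical documents with different shapes.
docs = [
    {"name": "Ada", "tags": ["admin"], "address": {"city": "London"}},
    {"name": "Bob", "tags": ["dev", "ops", "qa"]},
]

rows = [flatten(doc) for doc in docs]
# The tabular schema is the union of all flattened keys.
columns = sorted({key for row in rows for key in row})
# Missing fields become None, so the table is sparse.
table = [[row.get(col) for col in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

The array `tags` is spread over one column per position, and every field that a document lacks shows up as a missing value. A model trained on such a table has to cope with these artefacts, which is the step that operating directly on JSON avoids.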
Installation
ORiGAMi requires Python version 3.10 or higher. We recommend using a virtual environment, such as Python's native venv.
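If you are unsure whether the interpreter in your environment meets this requirement, a quick check along the following lines may help. This is only a sketch: the `python_is_supported` helper is a hypothetical name, and the only fact taken from this README is the 3.10 minimum.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    version = sys.version_info if version is None else version
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version[:2]) >= tuple(minimum)


if python_is_supported():
    print("Python version is sufficient for ORiGAMi.")
else:
    print("Python 3.10 or higher is required; found", sys.version.split()[0])
```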
To install ORiGAMi with pip, run:
pip install origami-ml
You can also clone the repository to your local machine and install the dependencies manually:
git clone https://github.com/mongodb-labs/origami.git
cd origami
pip install -r requirements.txt
pip install -e .
Usage
ORiGAMi comes with a command line interface (CLI) and a Python SDK.
Usage from the Command Line
The CLI lets you train a model and make predictions with a trained model. After installation, run origami from your shell to see an overview of available commands.
Help for specific commands is available with origami <command> --help, where <command> is currently one of train or predict.
Detailed documentation for the CLI and available options can be found in CLI.md.
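The help commands above can also be invoked from a Python script, for example in a setup check. The sketch below uses only what this README states: the origami executable, the train and predict commands, and the --help flag. The helper names `help_command` and `show_help` are hypothetical, and no other CLI options are assumed.

```python
import shutil
import subprocess

COMMANDS = ("train", "predict")  # commands named in this README


def help_command(command=None):
    """Build the argument list for the overview or for `origami <command> --help`."""
    if command is None:
        return ["origami"]
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    return ["origami", command, "--help"]


def show_help(command=None):
    """Run the help command and return its output, or None if origami is not installed."""
    if shutil.which("origami") is None:
        return None
    result = subprocess.run(
        help_command(command), capture_output=True, text=True, check=False
    )
    return result.stdout
```

Calling `print(show_help("train"))` would print the help text for the train command when origami is on your PATH, and `None` otherwise.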
Usage with Python
For an example of how to use ORiGAMi from Python, take a look at the provided ./notebooks folder, e.g. the example_origami_dungeons.ipynb notebook.
Experiment Reproduction
This code is released alongside our paper, which can be found on arXiv: ORIGAMI: A generative transformer architecture for predictions from semi-structured data. To reproduce the experiments in the paper, see the instructions in the ./experiments/ directory.
File details
Details for the file origami_ml-0.1.2.tar.gz.
File metadata
- Download URL: origami_ml-0.1.2.tar.gz
- Upload date:
- Size: 45.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 50416783a9a6bc300a42e184e3a7872dc67f862188eaec6a2c4fbdb66c6638e4 |
| MD5 | 5dbecde1a9cfd14b90798ac19411df92 |
| BLAKE2b-256 | eb9e45d7a7be4c6ee74ed93f1fa94dc2bbc5b6438337af69984ff86a9e7c7ce7 |
File details
Details for the file origami_ml-0.1.2-py3-none-any.whl.
File metadata
- Download URL: origami_ml-0.1.2-py3-none-any.whl
- Upload date:
- Size: 53.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc637c94324788fbc6d1d5e5b70ff163b5cfce454e74135f01c847fa246ca2cd |
| MD5 | c5997cdbdb7295373d271ccc36341431 |
| BLAKE2b-256 | 1b1cefc3cef4b8116282ec010965c41f6c9d92b174bf509e39e5c8de60c2999d |
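If you download one of these files manually, you can compare it against the SHA256 digests listed above. The following is a minimal sketch using Python's standard hashlib module; the helper names `sha256_of` and `verify` are hypothetical, and the file is assumed to keep its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file details above.
EXPECTED_SHA256 = {
    "origami_ml-0.1.2.tar.gz":
        "50416783a9a6bc300a42e184e3a7872dc67f862188eaec6a2c4fbdb66c6638e4",
    "origami_ml-0.1.2-py3-none-any.whl":
        "fc637c94324788fbc6d1d5e5b70ff163b5cfce454e74135f01c847fa246ca2cd",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's digest matches the expected value for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no known digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify("origami_ml-0.1.2.tar.gz")` returns `True` only if the downloaded archive in the current directory is byte-for-byte identical to the published one.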