AutonML: CMU's AutoML System

CMU TA2 (Built using DARPA D3M ecosystem)

AutonML is an automated machine learning system developed by the CMU Auton Lab to empower data scientists with efficient model discovery and advanced data analytics. AutonML also powers D3M Subject Matter Expert (SME) user interfaces such as Two Ravens (http://2ra.vn/).

Taking your machine learning capacity to the nth power.

Documentation listing the complete set of tasks, data modalities, and machine learning models supported by AutonML, along with tasks planned for future support, is available here.

Installation

AutonML can be installed with pip install autonml. We recommend doing this in a new virtual environment or conda environment.

Recommended steps to install autonml:

pip install autonml
pip install d3m-common-primitives d3m-sklearn-wrap sri-d3m rpi-d3m-primitives dsbox-primitives dsbox-corex distil-primitives d3m-esrnn d3m-nbeats 
pip install kf-d3m-primitives

This installation may take a while, because pip's dependency resolver can spend a long time resolving potential package conflicts. To speed up installation, you can select pip's legacy resolver by adding --use-deprecated=legacy-resolver. Caution: the legacy resolver may leave package conflicts unresolved.

D3M dataset

  • Any dataset to be used should be in D3M dataset format (directory structure with TRAIN, TEST folders and underlying .json files).
  • An example of a single dataset is available here
  • More datasets are available here
  • Any non-D3M data can be converted to a D3M dataset (see the section below on "Convert raw dataset to D3M dataset").
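
Before starting a search it can help to confirm that a directory at least has the layout described above. The following is a minimal sketch, not part of AutonML: the function name check_d3m_layout is hypothetical, and it only checks for the TRAIN and TEST folders and for at least one .json file beneath each. It does not validate the contents of those files.

```python
from pathlib import Path


def check_d3m_layout(dataset_dir):
    """Return a list of layout problems found in a candidate D3M dataset.

    Assumption: only the structure described in this README is checked
    (TRAIN and TEST folders, each with at least one .json file somewhere
    beneath it). An empty list means no problems were found.
    """
    root = Path(dataset_dir)
    problems = []
    for split in ("TRAIN", "TEST"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split}")
        elif not any(split_dir.rglob("*.json")):
            problems.append(f"no .json files under: {split}")
    return problems
```

An empty result only means the folder structure looks plausible; the search itself may still reject a dataset whose .json files are malformed.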

Run the AutonML pipeline

The AutonML pipeline can be run in two ways. The one described here is as a standalone CLI command, autonml_main, which takes five arguments, listed below:

  • Path to the data directory (must be in D3M format)
  • Output directory where results are to be stored. This directory will be dynamically created if it does not exist.
  • Timeout (measured in minutes)
  • Number of CPUs to be used (minimum: 4 cores, recommended: 8 cores)
  • Path to problemDoc.json (see example below)

INPUT_DIR=/home/<user>/d3m/datasets/185_baseball_MIN_METADATA
OUTPUT_DIR=/output
TIMEOUT=2
NUMCPUS=8
PROBLEMPATH=${INPUT_DIR}/TRAIN/problem_TRAIN/problemDoc.json

autonml_main ${INPUT_DIR} ${OUTPUT_DIR} ${TIMEOUT} ${NUMCPUS} ${PROBLEMPATH} 
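
The same invocation can be assembled from Python, which is convenient when looping over several datasets. This is a sketch under stated assumptions: build_autonml_command and run_autonml are hypothetical helper names, the argument order simply mirrors the five arguments listed above, and the default problem path follows the TRAIN/problem_TRAIN/problemDoc.json layout used in the shell example.

```python
import subprocess


def build_autonml_command(input_dir, output_dir, timeout_minutes, num_cpus,
                          problem_path=None):
    """Build the argument list for the autonml_main CLI.

    Assumption: when problem_path is omitted, the problem document sits at
    TRAIN/problem_TRAIN/problemDoc.json inside input_dir, as in the example.
    """
    if problem_path is None:
        base = str(input_dir).rstrip("/")
        problem_path = f"{base}/TRAIN/problem_TRAIN/problemDoc.json"
    return [
        "autonml_main",
        str(input_dir),
        str(output_dir),
        str(timeout_minutes),  # timeout is measured in minutes
        str(num_cpus),
        str(problem_path),
    ]


def run_autonml(*args, **kwargs):
    """Run the search; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_autonml_command(*args, **kwargs), check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues when a dataset path contains spaces.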

The above script does the following:

  1. Runs a search for the best pipelines for the specified dataset using the TRAIN data.
  2. Writes the ranked pipelines in JSON format to /output/<search_dir>/pipelines_ranked/
  3. Writes CSV prediction files, from pipelines trained on the TRAIN data and applied to the TEST data, to /output/<search_dir>/predictions/
  4. Writes training data predictions (mostly cross-validated) to /output/<search_dir>/training_predictions/<pipeline_id>_train_predictions.csv.
  5. Writes Python code equivalent to executing a JSON pipeline on a dataset to /output/<search_dir>/executables/
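
To get a quick overview of what a run produced, the four output folders named above can be counted per search directory. This is a minimal sketch: summarize_search_outputs is a hypothetical helper, and it assumes that every subdirectory of the output directory is a search directory, which the text above does not guarantee.

```python
from pathlib import Path

# Folder names taken from the list of outputs above.
OUTPUT_SUBDIRS = (
    "pipelines_ranked",
    "predictions",
    "training_predictions",
    "executables",
)


def count_files(folder):
    """Count regular files directly inside folder (0 if it does not exist)."""
    if not folder.is_dir():
        return 0
    return sum(1 for entry in folder.iterdir() if entry.is_file())


def summarize_search_outputs(output_dir):
    """Map each search directory name to a file count per output folder.

    Assumption: each subdirectory of output_dir is one <search_dir>.
    """
    summary = {}
    for search_dir in sorted(Path(output_dir).iterdir()):
        if search_dir.is_dir():
            summary[search_dir.name] = {
                name: count_files(search_dir / name) for name in OUTPUT_SUBDIRS
            }
    return summary
```

A count of zero under pipelines_ranked after the timeout has elapsed would suggest the search found no pipelines in the time allowed.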

An example of running one of the generated executables:

OUTPUT_DIR=output

python ${OUTPUT_DIR}/99211bc3-638a-455b-8d48-0dadc0bf1f10/executables/19908fd3-706a-48da-b13c-dc13da0ed3cc.code.py ${OUTPUT_DIR}/ ${OUTPUT_DIR}/99211bc3-638a-455b-8d48-0dadc0bf1f10/predictions/19908fd3-706a-48da-b13c-dc13da0ed3cc.predictions.csv
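
When a search yields many pipelines, the command above can be generated for each executable instead of typing the identifiers by hand. The sketch below is an assumption-laden illustration: executable_commands is a hypothetical helper, the <pipeline_id>.code.py and <pipeline_id>.predictions.csv naming is inferred from the single example above, and the two arguments are passed exactly as in that example without interpreting them further.

```python
from pathlib import Path

CODE_SUFFIX = ".code.py"


def executable_commands(output_dir, search_id):
    """Build one command per generated executable in a search directory.

    Assumption: file names follow the pattern seen in the example, i.e.
    executables/<pipeline_id>.code.py and
    predictions/<pipeline_id>.predictions.csv.
    """
    search_dir = Path(output_dir) / search_id
    first_arg = str(output_dir).rstrip("/") + "/"  # "${OUTPUT_DIR}/" in the example
    commands = []
    for code_file in sorted((search_dir / "executables").glob("*" + CODE_SUFFIX)):
        pipeline_id = code_file.name[: -len(CODE_SUFFIX)]
        predictions = search_dir / "predictions" / f"{pipeline_id}.predictions.csv"
        commands.append(["python", str(code_file), first_arg, str(predictions)])
    return commands
```

Each returned list can be handed to subprocess.run, or joined with spaces to print a command equivalent to the one shown above.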

Convert raw dataset to D3M dataset

If not done already, run pip install autonml before using our raw dataset converter.

create_d3m_dataset <train_data.csv> <test_data.csv> <label> <metric> -t classification <-t ...>
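
The converter call can also be assembled programmatically. This is a sketch only: build_create_d3m_command is a hypothetical helper, the positional order mirrors the usage line above, and it assumes that the trailing <-t ...> means the -t flag is repeated once per type. Valid label, metric, and type values are not listed here, so they are taken as parameters rather than hard-coded.

```python
def build_create_d3m_command(train_csv, test_csv, label, metric, types):
    """Build the argument list for the create_d3m_dataset CLI.

    Assumption: each entry in types is passed with its own -t flag,
    following the "-t classification <-t ...>" usage line.
    """
    if not types:
        raise ValueError("at least one type is required, e.g. 'classification'")
    command = ["create_d3m_dataset", str(train_csv), str(test_csv), label, metric]
    for type_name in types:
        command += ["-t", type_name]
    return command
```

The resulting list can be passed to subprocess.run(command, check=True) once the metric and types have been chosen from the linked description.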

A detailed description of the dataset type(s), task type(s) and metrics is provided here.
