
AutonML: CMU's AutoML System


CMU TA2 (built using the DARPA D3M ecosystem)

AutonML is an automated machine learning system developed by the CMU Auton Lab to empower data scientists with efficient model discovery and advanced data analytics. AutonML also powers D3M Subject Matter Expert (SME) user interfaces such as Two Ravens (http://2ra.vn/).

Taking your machine learning capacity to the nth power.

D3M dataset

  • Any dataset to be used should be in D3M dataset format (directory structure with TRAIN, TEST folders and underlying .json files).
  • An example of a single dataset is available here
  • More datasets are available here
  • Any non-D3M data can be converted to the D3M dataset format. (See the section below on "Convert raw dataset to D3M dataset".)
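Before starting a search, it can help to confirm that a dataset directory has the basic shape described above. The following is a minimal sketch only: the function name check_d3m_layout is hypothetical, and it checks just what this README states (TRAIN and TEST folders with .json files somewhere beneath them), not the full D3M schema.

```python
from pathlib import Path


def check_d3m_layout(dataset_dir):
    """Return a list of problems found; an empty list means the basic layout looks right.

    Hypothetical helper: it only checks for TRAIN and TEST folders that each
    contain at least one .json file, which is all this README specifies.
    """
    root = Path(dataset_dir)
    problems = []
    for split in ("TRAIN", "TEST"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split}")
            continue
        # The format relies on .json description files somewhere under each split.
        if not any(split_dir.rglob("*.json")):
            problems.append(f"no .json files under: {split}")
    return problems
```

An empty return value does not guarantee the dataset is valid D3M; it only rules out the most obvious layout mistakes.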

Convert raw dataset to D3M dataset

pip install d3m
python create_d3m_dataset.py <train_data.csv> <test_data.csv> <label> <metric> -t classification <-t ...>

A detailed description of the dataset type(s), task type(s) and metrics is provided here.
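The conversion script takes separate train and test CSV files. If the raw data is a single CSV, one way to produce the two inputs is sketched below using only the Python standard library. The helper name split_csv, the 80/20 split and the fixed seed are illustrative assumptions; it is also an assumption that create_d3m_dataset.py expects a header row and the label column in both files.

```python
import csv
import random


def split_csv(raw_csv, train_csv, test_csv, test_fraction=0.2, seed=0):
    """Write a shuffled train/test split of raw_csv, repeating the header in both files.

    Hypothetical helper; returns (number of train rows, number of test rows).
    """
    with open(raw_csv, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    splits = {test_csv: rows[:n_test], train_csv: rows[n_test:]}

    for path, part in splits.items():
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return len(rows) - n_test, n_test
```

The two output paths can then be passed as <train_data.csv> and <test_data.csv> in the command above.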

Run in search mode

The AutonML pipeline can be run in two ways; this section covers running it as a standalone CLI command, autonml_main. This command takes five arguments, listed below:

  • Path to the data directory (must be in D3M format)
  • Output directory where results are to be stored. This directory will be dynamically created if it does not exist.
  • Timeout (measured in minutes)
  • Number of CPUs to be used
  • Path to problemDoc.json (see example below)

INPUT_DIR=/home/<user>/d3m/datasets/185_baseball_MIN_METADATA
OUTPUT_DIR=/output
TIMEOUT=2
NUMCPUS=8
PROBLEMPATH=${INPUT_DIR}/TRAIN/problem_TRAIN/problemDoc.json

autonml_main ${INPUT_DIR} ${OUTPUT_DIR} ${TIMEOUT} ${NUMCPUS} ${PROBLEMPATH} 
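The same invocation can be driven from Python, for example when looping over several datasets. This is a sketch under assumptions: build_autonml_command and run_search are hypothetical helper names, and the only thing taken from this README is the autonml_main command and the order of its five positional arguments.

```python
import shlex
import subprocess


def build_autonml_command(input_dir, output_dir, timeout_minutes, num_cpus, problem_path):
    """Return the autonml_main argument list in the order documented above."""
    return [
        "autonml_main",
        str(input_dir),
        str(output_dir),
        str(timeout_minutes),  # timeout is measured in minutes
        str(num_cpus),
        str(problem_path),
    ]


def run_search(*args):
    """Run the search and raise CalledProcessError on a non-zero exit status."""
    cmd = build_autonml_command(*args)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)
```

For the example above, run_search(input_dir, "/output", 2, 8, input_dir + "/TRAIN/problem_TRAIN/problemDoc.json") would issue the same command, provided autonml_main is on the PATH.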

Running autonml_main as shown above will do the following:

  1. Run search for best pipelines for the specified dataset using TRAIN data.
  2. Write the ranked pipelines in JSON format to /output/<search_dir>/pipelines_ranked/
  3. Write CSV prediction files, from pipelines trained on TRAIN data and applied to TEST data, to /output/<search_dir>/predictions/
  4. Write training data predictions (mostly cross-validated) to /output/<search_dir>/training_predictions/<pipeline_id>_train_predictions.csv.
  5. Write Python code equivalent to executing each JSON pipeline on a dataset to /output/<search_dir>/executables/
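The output layout listed above can be inspected with a few lines of Python once a search has finished. This is a minimal sketch: list_search_outputs is a hypothetical helper, and it relies only on the four subfolder names given in this README, not on the contents of the files.

```python
from pathlib import Path

# Subfolders of /output/<search_dir>/ named in the list above.
OUTPUT_SUBDIRS = ("pipelines_ranked", "predictions", "training_predictions", "executables")


def list_search_outputs(search_dir):
    """Map each documented output subfolder to the sorted file names found in it."""
    root = Path(search_dir)
    found = {}
    for name in OUTPUT_SUBDIRS:
        sub = root / name
        if sub.is_dir():
            found[name] = sorted(p.name for p in sub.iterdir() if p.is_file())
        else:
            # A missing subfolder is reported as empty rather than raising.
            found[name] = []
    return found
```

Calling list_search_outputs on a /output/<search_dir> path returns a dictionary whose keys are the four subfolder names, which makes it easy to see whether a short timeout produced any pipelines at all.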

An example of running one of the generated executables:

python ./output/6b92f2f7-74d2-4e86-958d-4e62bbd89c51/executables/131542c6-ea71-4403-9c2d-d899e990e7bd.json.code.py 185_baseball predictions.csv 

