A script to support start/resume batch scoring via the DataRobot API.
Project description
A script to score CSV files via DataRobot’s prediction API.
Installation
From PyPI
$ pip install -U datarobot_batch_scoring
From source
Create virtualenv:
$ mkvirtualenv batch_scoring
Install package in virtualenv:
$ pip install -e .
The batch_scoring script should now be available on your PATH.
From sdist
For Python 2.7:
$ python setup.py sdist
For Python 3:
$ python3.X setup.py sdist
Install via pip:
$ pip install dist/datarobot_batch_scoring-1.X.X.tar.gz
Run from source
If you want to run the script without installing it, invoke the batch_scoring module directly with the Python interpreter:
$ python -m datarobot_scoring.batch_scoring --help
Install From S3
First check the DataRobot Amazon S3 bucket to find the most current version. The S3 path is: https://s3.amazonaws.com/datarobot_public/packages/
Access to S3 requires authorization and login credentials; to obtain them, reach out to Tom or Ulises.
For Python 2.7:
$ pip install -U https://s3.amazonaws.com/datarobot_public/packages/datarobot_batch_scoring-X.X.X-py2.tar.gz
For Python 3:
$ pip install -U https://s3.amazonaws.com/datarobot_public/packages/datarobot_batch_scoring-X.X.X-py3.tar.gz
Deployment
Cut a release candidate
update setup.py & datarobot_scoring/batch_scoring.py
acceptance testing
tag release (& push tag)
Create sdist tarball
Make py3 tarball:
$ python3.3 setup.py sdist
$ mv dist/datarobot_scoring-X.X.X.tar.gz dist/datarobot_scoring-X.X.X-py3.tar.gz
Make py2 tarball:
$ python2.7 setup.py sdist
$ mv dist/datarobot_scoring-X.X.X.tar.gz dist/datarobot_scoring-X.X.X-py2.tar.gz
Distribute tarballs
TODO
Features
Concurrent requests (--n_concurrent)
Pause/resume
Gzip support
Custom delimiters
Usage
Usage: batch_scoring [--host=<host>] [--user=<user>]
                     [--password=<pwd>] [--api_token=<api_token>]
                     [--datarobot_key=<datarobot_key>] [--verbose]
                     [--n_samples=<n_samples>] [--n_retry=<n_retry>]
                     [--n_concurrent=<n_concurrent>] [--out=<out>]
                     [--api_version=<api_version>] [--create_api_token]
                     [--keep_cols=<keep_cols>] [--delimiter=<delimiter>]
                     {project_id} {model_id} {dataset} [--resume|--cancel]
Batch score dataset by submitting prediction requests against host using model model_id. It will send batches of size n_samples.
Set n_samples such that the round-trip is roughly 10sec (see verbose output).
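One way to pick n_samples is to time a small trial batch and scale linearly toward the 10-second target mentioned above. A rough sketch (the helper below is hypothetical, not part of the tool; it assumes round-trip time grows roughly linearly with batch size):

```python
def tune_n_samples(trial_size, trial_seconds, target_seconds=10.0):
    """Scale a trial batch size linearly toward the target round-trip time.

    trial_size:     number of rows in the timed trial batch
    trial_seconds:  observed round-trip time for that batch
    """
    if trial_seconds <= 0:
        raise ValueError("trial round-trip time must be positive")
    scaled = int(trial_size * target_seconds / trial_seconds)
    return max(1, scaled)  # never suggest an empty batch
```

For example, if a 1000-row batch took 2.5 seconds, the linear estimate suggests n_samples around 4000; the verbose output lets you confirm the actual round-trip after adjusting.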
Set n_concurrent to match the number of cores in the prediction API endpoint.
The dataset has to be a single CSV file that can be gzipped (extension ‘.gz’).
The output out will be a single CSV file, but remember that records might be unordered.
Arguments:
--host=<host>                    The host to test [default: https://beta.datarobot.com/api].
--api_version=<api_version>      The API version [default: v1].
--datarobot_key=<datarobot_key>  An additional datarobot_key for dedicated prediction instances.
--user=<user>                    The username to acquire the api-token; if none, prompt.
--password=<pwd>                 The password to acquire the api-token; if none, prompt.
--n_samples=<n_samples>          The number of samples per batch [default: 1000].
--n_retry=<n_retry>              The number of retries if a request failed; -1 means infinite [default: 3].
--n_concurrent=<n_concurrent>    The number of concurrent requests to submit [default: 4].
--api_token=<api_token>          The api token for the requests; if none, use <pwd> to get a token.
--out=<out>                      The file to which the results should be written [default: out.csv].
--keep_cols=<keep_cols>          A comma-separated list of column names to append to the predictions.
--delimiter=<delimiter>          Delimiter to use. If empty, will try to automatically determine this [default: ,].
{project_id}                     The project ID number.
{model_id}                       The model ID number.
{dataset}                        The filename of the records you want to fetch predictions for.
Options:
-h --help
-v --verbose            Verbose output.
-c --create_api_token   If set we will request a new api token.
-r --resume             Resume a checkpointed run.
-c --cancel             Cancel a checkpointed run.
Example:
batch_scoring --host=https://beta.datarobot.com/api --user="greg@datarobot.com" --out=pred.csv 5545eb20b4912911244d4835 5545eb71b4912911244d4847 ~/Downloads/diabetes_test.csv
Hashes for datarobot_batch_scoring-1.5.1.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | b7d3cceed59b62f7ef9e04e54f24c85ba78571ce7170471cfdf935852a4da9dc |
| MD5 | b8156c39dfd95a2fbfbb8b5f8ec27816 |
| BLAKE2b-256 | 479690f649af83a54bc75e243ce73b343fc458cc536d4b381e9b53e4e964841c |

Hashes for datarobot_batch_scoring-1.5.1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 51a530d7f33d6ab5ca442f362b37450d777dfdc50a886579b16b7810759f485b |
| MD5 | eb4c9487823dfa27d75bff77c7a7818a |
| BLAKE2b-256 | 41897a22255af316a6e201b6d6345042ca3055729c76e54217dcdab3720f59c1 |

Hashes for datarobot_batch_scoring-1.5.1-py2-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | c08a0ac9260a430182ec90fba3c3e88700bccfb673200ae78a92552cb4bd8aac |
| MD5 | 349a3a5f2c3e4c4f0eeb5468b616c42a |
| BLAKE2b-256 | a67175aafb937d4e065cd23425e42112559940d20ff2aa0b3deca4e0134b44e6 |