A script to support start/resume batch scoring via the DataRobot API.
Project description
A script to score CSV files via DataRobot’s prediction API.
Installation
From PyPI
$ pip install -U datarobot_batch_scoring
From source
Create virtualenv:
$ mkvirtualenv batch_scoring
Install package in virtualenv:
$ pip install -e .
The batch_scoring script should now be available on your PATH.
From sdist
For Python 2.7:
$ python setup.py sdist
For Python 3:
$ python3.X setup.py sdist
Install via pip:
$ pip install dist/datarobot_batch_scoring-1.X.X.tar.gz
Run from source
If you want to run the script without installing it, invoke the batch_scoring module directly with the Python interpreter:
$ python -m datarobot_scoring.batch_scoring --help
Install From S3
First check the DataRobot Amazon S3 bucket to find the most current version. The S3 path is: https://s3.amazonaws.com/datarobot_public/packages/
Access to S3 requires authorization and login credentials; to obtain them, reach out to Tom or Ulises.
For Python 2.7:
$ pip install -U https://s3.amazonaws.com/datarobot_public/packages/datarobot_batch_scoring-X.X.X-py2.tar.gz
For Python 3:
$ pip install -U https://s3.amazonaws.com/datarobot_public/packages/datarobot_batch_scoring-X.X.X-py3.tar.gz
Deployment
Cut a release candidate
update setup.py & datarobot_scoring/batch_scoring.py
acceptance testing
tag release (& push tag)
Create sdist tarball
Make py3 tarball:
$ python3.3 setup.py sdist
$ mv dist/datarobot_scoring-X.X.X.tar.gz dist/datarobot_scoring-X.X.X-py3.tar.gz
Make py2 tarball:
$ python2.7 setup.py sdist
$ mv dist/datarobot_scoring-X.X.X.tar.gz dist/datarobot_scoring-X.X.X-py2.tar.gz
Distribute tarballs
TODO
Features
Concurrent requests (--n_concurrent)
Pause/resume
Gzip support
Custom delimiters
Usage
Usage: batch_scoring [--host=<host>] [--user=<user>]
                     [--password=<pwd>] [--api_token=<api_token>]
                     [--datarobot_key=<datarobot_key>] [--verbose]
                     [--n_samples=<n_samples>] [--n_retry=<n_retry>]
                     [--n_concurrent=<n_concurrent>] [--out=<out>]
                     [--api_version=<api_version>] [--create_api_token]
                     [--keep_cols=<keep_cols>] [--delimiter=<delimiter>]
                     {project_id} {model_id} {dataset} [--resume|--cancel]
Batch score dataset by submitting prediction requests against host using model model_id. It will send batches of size n_samples.
Set n_samples such that the round-trip is roughly 10sec (see verbose output).
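One way to pick n_samples is to time a small trial batch and scale linearly toward the 10-second target mentioned above. A rough sketch (the helper below is hypothetical, not part of the tool; it assumes round-trip time grows roughly linearly with batch size):

```python
def tune_n_samples(trial_size, trial_seconds, target_seconds=10.0):
    """Scale a trial batch size linearly toward the target round-trip time.

    trial_size:     number of rows in the timed trial batch
    trial_seconds:  observed round-trip time for that batch
    """
    if trial_seconds <= 0:
        raise ValueError("trial round-trip time must be positive")
    scaled = int(trial_size * target_seconds / trial_seconds)
    return max(1, scaled)  # never suggest an empty batch
```

For example, if a 1000-row batch took 2.5 seconds, the linear estimate suggests n_samples around 4000; the verbose output lets you confirm the actual round-trip after adjusting.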
Set n_concurrent to match the number of cores in the prediction API endpoint.
The dataset has to be a single CSV file that can be gzipped (extension ‘.gz’).
The output out will be a single CSV file, but remember that records might be unordered.
Arguments:
--host=<host>                    The host to test [default: https://beta.datarobot.com/api].
--api_version=<api_version>      The API version [default: v1].
--datarobot_key=<datarobot_key>  An additional datarobot_key for dedicated prediction instances.
--user=<user>                    The username to acquire the api-token; if none, prompt.
--password=<pwd>                 The password to acquire the api-token; if none, prompt.
--n_samples=<n_samples>          The number of samples per batch [default: 1000].
--n_retry=<n_retry>              The number of retries if a request failed; -1 means infinite [default: 3].
--n_concurrent=<n_concurrent>    The number of concurrent requests to submit [default: 4].
--api_token=<api_token>          The api token for the requests; if none, use <pwd> to get a token.
--out=<out>                      The file to which the results should be written [default: out.csv].
--keep_cols=<keep_cols>          A comma-separated list of column names to append to the predictions.
--delimiter=<delimiter>          Delimiter to use. If empty, will try to automatically determine this [default: ,].
{project_id}                     The project ID number.
{model_id}                       The model ID number.
{dataset}                        The filename of the records you want to fetch predictions for.
Options:
-h --help
-v --verbose            Verbose output.
-c --create_api_token   If set we will request a new api token.
-r --resume             Resume a checkpointed run.
-c --cancel             Cancel a checkpointed run.
Example:
batch_scoring --host=https://beta.datarobot.com/api --user="greg@datarobot.com" --out=pred.csv 5545eb20b4912911244d4835 5545eb71b4912911244d4847 ~/Downloads/diabetes_test.csv
Hashes for datarobot_batch_scoring-1.5.1.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | b7d3cceed59b62f7ef9e04e54f24c85ba78571ce7170471cfdf935852a4da9dc |
| MD5 | b8156c39dfd95a2fbfbb8b5f8ec27816 |
| BLAKE2b-256 | 479690f649af83a54bc75e243ce73b343fc458cc536d4b381e9b53e4e964841c |

Hashes for datarobot_batch_scoring-1.5.1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 51a530d7f33d6ab5ca442f362b37450d777dfdc50a886579b16b7810759f485b |
| MD5 | eb4c9487823dfa27d75bff77c7a7818a |
| BLAKE2b-256 | 41897a22255af316a6e201b6d6345042ca3055729c76e54217dcdab3720f59c1 |

Hashes for datarobot_batch_scoring-1.5.1-py2-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | c08a0ac9260a430182ec90fba3c3e88700bccfb673200ae78a92552cb4bd8aac |
| MD5 | 349a3a5f2c3e4c4f0eeb5468b616c42a |
| BLAKE2b-256 | a67175aafb937d4e065cd23425e42112559940d20ff2aa0b3deca4e0134b44e6 |