## Project description

A script to score CSV files via DataRobot’s prediction API.

## Version Compatibility

We aim to support as many versions of DataRobot as possible with each release of batch_scoring, but occasionally backend changes create incompatibilities. The chart below lists which versions of this tool are compatible with which versions of DataRobot. If you are unsure which version of DataRobot you are using, please contact DataRobot support for assistance.

| batch_scoring version | DataRobot version |
|---|---|
| <=1.10 | 2.7, 2.8, 2.9 |
| >=1.11, <1.13 | 3.0, 3.1+ |
| >=1.13 | 2.7, 2.8, 2.9, 3.0, 3.1+ |

The batch_scoring_deployment_aware command is available only for newer DataRobot versions.

| batch_scoring_deployment_aware version | DataRobot version |
|---|---|
| >=1.14 | 4.4+ |
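The compatibility charts above can be encoded as a small lookup, which may be handy in a deployment script. This is a minimal sketch, not part of the tool: the function names are hypothetical, and it assumes patch releases such as 1.10.2 belong to the `<=1.10` row.

```python
# Hypothetical helpers that encode the compatibility charts as data.
# They are illustrative only and are not shipped with batch_scoring.

def parse_version(text):
    """Turn a version string such as '1.13.2' into (1, 13, 2)."""
    return tuple(int(part) for part in text.split("."))


def compatible_datarobot_versions(batch_scoring_version):
    """Return the DataRobot versions listed for a batch_scoring version."""
    version = parse_version(batch_scoring_version)
    if version < (1, 11):
        # Assumption: every 1.10.x patch release counts as "<=1.10".
        return ["2.7", "2.8", "2.9"]
    if version < (1, 13):
        return ["3.0", "3.1+"]
    return ["2.7", "2.8", "2.9", "3.0", "3.1+"]


def has_deployment_aware_command(batch_scoring_version):
    """batch_scoring_deployment_aware is listed for >=1.14 (DataRobot 4.4+)."""
    return parse_version(batch_scoring_version) >= (1, 14)
```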

## How to install

Install or upgrade to the latest version:

$ pip install -U datarobot_batch_scoring

To install a particular version:

$ pip install datarobot_batch_scoring==x.y.z

## Alternative Installs

We publish two alternative install methods on our releases page. These are for situations where internet access is restricted or Python is unavailable.

offlinebundle:

Use this for installations in environments where Python 2.7 or Python 3+ is available but there is no access to the internet. It does not require administrative privileges or pip, and works on Linux, OSX or Windows.

These files have “offlinebundle” in their name on the release page.

PyInstaller:

Using PyInstaller, we build a single-file executable that does not depend on Python. It depends only on libc and can be installed without administrative privileges. Right now we publish builds that work for most Linux distros released since CentOS 5. OSX and Windows are also supported.

These files have “executables” in their name on the release page.

## Features

• Concurrent requests (--n_concurrent)

• Pause/resume

• Gzip support

• Custom delimiters

• Parallel processing

## Running the batch_scoring, batch_scoring_sse or batch_scoring_deployment_aware scripts

You can execute the batch_scoring, batch_scoring_sse or batch_scoring_deployment_aware command from the command line with the relevant parameters, or you can pass parameters to the script from an .ini file. Place the .ini file in your home directory or in the directory from which you run the command. Use the syntax and arguments below to define the parameters. Note that if a parameter is defined in the .ini file and also passed on the command line, the command line value takes priority.

The following table describes the syntax conventions; the syntax for running the script follows the table. DataRobot supplies three scripts, each for a different application. Use:

• batch_scoring to score on dedicated prediction instances.

• batch_scoring_sse to score on standalone prediction instances. If you are unsure of your instance type, contact DataRobot Support.

• batch_scoring_deployment_aware to score on dedicated prediction instances using deployment_id instead of project_id and model_id.

| Convention | Meaning |
|---|---|
| [ ] | Optional argument |
| < > | User supplied value |
| { \| } | Required, mutually exclusive |

Required arguments:

batch_scoring --host=<host> --user=<user> <project_id> <model_id> <dataset_filepath> --datarobot_key=<datarobot_key> {--password=<pwd> | --api_token=<api_token>}

batch_scoring_deployment_aware --host=<host> --user=<user> <deployment_id> <dataset_filepath> --datarobot_key=<datarobot_key> {--password=<pwd> | --api_token=<api_token>}

batch_scoring_sse --host=<host> <import_id> <dataset_filepath>

[--verbose] [--keep_cols=<keep_cols>] [--n_concurrent=<n_concurrent>]

[--out=<filepath>] [--api_version=<api_version>] [--pred_name=<string>] [--timeout=<timeout>] [--create_api_token] [--n_retry=<n_retry>] [--delimiter=<delimiter>] [--resume] [--no-resume] [--skip_row_id] [--output_delimiter=<delimiter>]

Argument descriptions: The following table describes each of the arguments:

Example:

batch_scoring --host=https://mycorp.orm.datarobot.com/ --user="greg@mycorp.com" --out=pred.csv 5545eb20b4912911244d4835 5545eb71b4912911244d4847 /home/greg/Downloads/diabetes_test.csv
batch_scoring_deployment_aware --host=https://mycorp.orm.datarobot.com/ --user="greg@mycorp.com" --out=pred.csv 5545eb71b4912911244d4848 /home/greg/Downloads/diabetes_test.csv
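When the command is launched from another program rather than typed by hand, the argument list can be assembled in code. The following is a minimal sketch in Python 3 that uses only the flags shown in the syntax above; `build_batch_scoring_command` and `run_batch_scoring` are hypothetical helper names, not part of the tool.

```python
import subprocess


def build_batch_scoring_command(host, user, project_id, model_id, dataset,
                                out=None, api_token=None, datarobot_key=None):
    """Assemble a batch_scoring argument list (hypothetical helper)."""
    cmd = ["batch_scoring", "--host=" + host, "--user=" + user]
    # Optional flags are added only when a value was supplied.
    if api_token is not None:
        cmd.append("--api_token=" + api_token)
    if datarobot_key is not None:
        cmd.append("--datarobot_key=" + datarobot_key)
    if out is not None:
        cmd.append("--out=" + out)
    # Positional arguments: project, model, then the dataset path.
    cmd += [project_id, model_id, dataset]
    return cmd


def run_batch_scoring(cmd):
    """Run the command and return its exit code.

    Assumes batch_scoring is installed and on PATH.
    """
    return subprocess.call(cmd)
```

Passing the arguments as a list avoids shell quoting problems with values such as e-mail addresses or paths that contain spaces.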

## Using the configuration file

The batch_scoring command looks for a batch_scoring.ini file in the directory where you are running the script (the working directory) and, if it is not found there, at $HOME/batch_scoring.ini (your home directory). If the file exists, the command reads its arguments from it; the file accepts the same arguments as those described above. If the file does not exist, the command proceeds normally with the command line arguments. The command line arguments have higher priority than the file arguments (that is, you can override file arguments using the command line).

The format of a batch_scoring.ini file is as follows:

[batch_scoring]
host=file_host
project_id=file_project_id
model_id=file_model_id
password=file_password
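The lookup order and precedence described in this section can be illustrated with a short Python 3 sketch. It is not the tool's actual implementation: `find_config` and `merge_settings` are hypothetical names, and the sketch assumes that a command line argument left unset is represented as `None`.

```python
import configparser
import os


def find_config(working_dir, home_dir):
    """Return the first batch_scoring.ini found: working dir, then home."""
    for directory in (working_dir, home_dir):
        candidate = os.path.join(directory, "batch_scoring.ini")
        if os.path.isfile(candidate):
            return candidate
    return None


def merge_settings(config_path, cli_args):
    """Combine file values with command line values; the command line wins."""
    settings = {}
    if config_path is not None:
        # interpolation=None keeps '%' characters (e.g. in passwords) literal.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(config_path)
        if parser.has_section("batch_scoring"):
            settings.update(parser.items("batch_scoring"))
    # Only arguments that were actually given override the file.
    settings.update({k: v for k, v in cli_args.items() if v is not None})
    return settings
```

For example, with the file shown above, `merge_settings(path, {"host": "cli_host", "model_id": None})` keeps `model_id=file_model_id` from the file and replaces `host` with `cli_host`.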

## Usage Notes

• If the script detects that a previous script was interrupted in mid-execution, it will prompt whether to resume that execution.

• If no interrupted script was detected or if you indicate not to resume the previous execution, the script checks to see if the specified output file exists. If yes, the script prompts to confirm before overwriting this file.

• The logs from each batch_scoring and batch_scoring_sse run are stored in the current working directory. All users see a datarobot_batch_scoring_main.log log file. Windows users see two additional log files, datarobot_batch_scoring_batcher.log and datarobot_batch_scoring_writer.log.

• Batch scoring won’t work if there is only one feature in the scoring data. This is caused by limitations of the standard Python CSV parser. To work around it, add an index column to the dataset; the column is ignored in scoring but allows the file to be parsed.
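The workaround in the last note, adding an index column to a single-feature dataset, could be scripted as follows. This is a minimal Python 3 sketch under stated assumptions: the function name and the `row_index` column name are hypothetical, the input has a header row, and a blank line is treated as a missing value.

```python
import csv


def add_index_column(src_path, dst_path, index_name="row_index"):
    """Copy a CSV file, prepending an index column to every row."""
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        # Assumption: the first row is the header.
        header = next(reader)
        writer.writerow([index_name] + header)
        for i, row in enumerate(reader):
            # A blank line in a one-column file is read as an empty row;
            # keep it as an empty value so every row has two fields.
            writer.writerow([i] + (row or [""]))
```

The resulting file has two columns, so the delimiter can be detected when the file is parsed.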

## Supported Platforms

datarobot_batch_scoring is tested on Linux, Windows and OS X. Both Python 2.7.x and Python 3.x are supported.

## Proxy support

The batch_scoring script handles the standard HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables:

export HTTP_PROXY=http://192.168.1.3:3128
export HTTPS_PROXY=http://192.168.1.3:3128
export NO_PROXY=noproxy.domain.com
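If the proxy should apply to a single run launched from Python rather than to the whole shell session, the variables can be set on a copy of the environment and passed to the child process. This is a minimal sketch; `environment_with_proxy` is a hypothetical helper, and it assumes the script reads the variables from its own process environment as described above.

```python
import os


def environment_with_proxy(http_proxy, https_proxy, no_proxy=None, base=None):
    """Return a copy of the environment with proxy variables set."""
    # Copy so that the caller's environment is left untouched.
    env = dict(os.environ if base is None else base)
    env["HTTP_PROXY"] = http_proxy
    env["HTTPS_PROXY"] = https_proxy
    if no_proxy is not None:
        env["NO_PROXY"] = no_proxy
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.call` when starting batch_scoring.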

