
test tools for FATE

Project description

A collection of useful tools for running FATE’s tests.

tutorial

quick start

  1. (optional) create virtual env

    python -m venv venv
    source venv/bin/activate
    pip install -U pip
  2. install fate_test

    pip install fate_test
    fate_test --help
  3. edit default fate_test_config.yaml

    # edit priority config file with system default editor
    # filling some field according to comments
    fate_test config edit
  4. configure FATE-Pipeline and FATE-Flow Commandline server setting

    # configure FATE-Pipeline server setting
    pipeline init --port 9380 --ip 127.0.0.1
    # configure FATE-Flow Commandline server setting
    flow init --port 9380 --ip 127.0.0.1
  5. run some fate_test suite

    fate_test suite -i <path contains *testsuite.json>
  6. run some fate_test benchmark

    fate_test benchmark-quality -i <path contains *benchmark.json>
  7. useful logs or exceptions will be saved to the logs dir, with the namespace shown in the last step

develop install

It is more convenient to use editable mode during development: replace step 2 with the following

pip install -e ${FATE}/python/fate_client && pip install -e ${FATE}/python/fate_test

command types

  • suite: used for running testsuites, collection of FATE jobs

    fate_test suite -i <path contains *testsuite.json>
  • benchmark-quality: used for comparing modeling quality between FATE and other machine learning systems

    fate_test benchmark-quality -i <path contains *benchmark.json>

configuration by examples

  1. no ssh tunnel needed:

    • 9999, service: service_a

    • 10000, service: service_b

    and both service_a and service_b can be requested directly:

    work_mode: 1 # 0 for standalone, 1 for cluster
    data_base_dir: <path_to_data>
    parties:
      guest: [10000]
      host: [9999, 10000]
      arbiter: [9999]
    services:
      - flow_services:
        - {address: service_a, parties: [9999]}
        - {address: service_b, parties: [10000]}
  2. ssh tunnel needed:

    • 9999, service: service_a

    • 10000, service: service_b

    service_a can be requested directly, while service_b cannot; however, service_b can be reached from another node, say B:

    work_mode: 0 # 0 for standalone, 1 for cluster
    data_base_dir: <path_to_data>
    parties:
      guest: [10000]
      host: [9999, 10000]
      arbiter: [9999]
    services:
      - flow_services:
        - {address: service_a, parties: [9999]}
      - flow_services:
        - {address: service_b, parties: [10000]}
        ssh_tunnel: # optional
          enable: true
          ssh_address: <ssh_ip_to_B>:<ssh_port_to_B>
          ssh_username: <ssh_username_to_B>
          ssh_password: # optional
          ssh_priv_key: "~/.ssh/id_rsa"

Testsuite

Testsuite is used for running a collection of jobs in sequence. Data used by jobs can be uploaded before jobs are submitted and is cleaned up after jobs finish. This tool is useful for FATE’s release tests.

command options

fate_test suite --help
  1. include:

    fate_test suite -i <path1 contains *testsuite.json>

    will run testsuites in path1

  2. exclude:

    fate_test suite -i <path1 contains *testsuite.json> -e <path2 to exclude> -e <path3 to exclude> ...

    will run testsuites in path1 but not in path2 and path3

  3. glob:

    fate_test suite -i <path1 contains *testsuite.json> -g "hetero*"

    will run testsuites in subdirectories of path1 whose names start with hetero

  4. replace:

    fate_test suite -i <path1 contains *testsuite.json> -r '{"maxIter": 5}'

    will find all key-value pairs with key “maxIter” in data conf, conf, or dsl, and replace the value with 5

  5. skip-data:

    fate_test suite -i <path1 contains *testsuite.json> --skip-data

    will run testsuites in path1 without uploading the data specified in testsuite.json.

  6. yes:

    fate_test suite -i <path1 contains *testsuite.json> --yes

    will run testsuites in path1 directly, skipping the double-check prompt

  7. skip-dsl-jobs:

    fate_test suite -i <path1 contains *testsuite.json> --skip-dsl-jobs

    will run testsuites in path1 but skip all dsl tasks in the testsuites. This is useful when only pipeline tasks are needed.

  8. skip-pipeline-jobs:

    fate_test suite -i <path1 contains *testsuite.json> --skip-pipeline-jobs

    will run testsuites in path1 but skip all pipeline tasks in the testsuites. This is useful when only dsl tasks are needed.
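The replace option above conceptually performs a recursive key substitution across the job configurations. A minimal Python sketch of that behavior (illustrative only, not fate_test’s actual implementation):

```python
def replace_keys(obj, replacements):
    """Recursively replace values for matching keys in nested dicts/lists."""
    if isinstance(obj, dict):
        return {k: (replacements[k] if k in replacements
                    else replace_keys(v, replacements))
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [replace_keys(item, replacements) for item in obj]
    return obj

conf = {"algorithm_parameters": {"maxIter": 100, "tol": 1e-4}}
print(replace_keys(conf, {"maxIter": 5}))
# {'algorithm_parameters': {'maxIter': 5, 'tol': 0.0001}}
```

Any nested occurrence of the key is replaced, regardless of depth.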

Benchmark Quality

Benchmark-quality is used for comparing modeling quality between FATE and other machine learning systems. Benchmark produces a metrics comparison summary for each benchmark job group.

fate_test benchmark-quality -i examples/benchmark_quality/hetero_linear_regression
+-------+--------------------------------------------------------------+
|  Data |                             Name                             |
+-------+--------------------------------------------------------------+
| train | {'guest': 'motor_hetero_guest', 'host': 'motor_hetero_host'} |
|  test | {'guest': 'motor_hetero_guest', 'host': 'motor_hetero_host'} |
+-------+--------------------------------------------------------------+
+------------------------------------+--------------------+--------------------+-------------------------+---------------------+
|             Model Name             | explained_variance |      r2_score      | root_mean_squared_error |  mean_squared_error |
+------------------------------------+--------------------+--------------------+-------------------------+---------------------+
| local-linear_regression-regression | 0.9035168452250094 | 0.9035070863155368 |   0.31340413289880553   | 0.09822215051805216 |
| FATE-linear_regression-regression  | 0.903146386539082  | 0.9031411831961411 |    0.3139977881119483   | 0.09859461093919596 |
+------------------------------------+--------------------+--------------------+-------------------------+---------------------+
+-------------------------+-----------+
|          Metric         | All Match |
+-------------------------+-----------+
|    explained_variance   |    True   |
|         r2_score        |    True   |
| root_mean_squared_error |    True   |
|    mean_squared_error   |    True   |
+-------------------------+-----------+

command options

Use the following command to show the help message:

fate_test benchmark-quality --help
  1. include:

    fate_test benchmark-quality -i <path1 contains *benchmark.json>

    will run benchmark testsuites in path1

  2. exclude:

    fate_test benchmark-quality -i <path1 contains *benchmark.json> -e <path2 to exclude> -e <path3 to exclude> ...

    will run benchmark testsuites in path1 but not in path2 and path3

  3. glob:

    fate_test benchmark-quality -i <path1 contains *benchmark.json> -g "hetero*"

    will run benchmark testsuites in subdirectories of path1 whose names start with hetero

  4. tol:

    fate_test benchmark-quality -i <path1 contains *benchmark.json> -t 1e-3

    will run benchmark testsuites in path1 with the absolute tolerance of the difference between metrics set to 0.001. If the absolute difference between metrics is smaller than tol, the metrics are considered almost equal. See the benchmark testsuite writing guide on setting an alternative tolerance.

  5. skip-data:

    fate_test benchmark-quality -i <path1 contains *benchmark.json> --skip-data

    will run benchmark testsuites in path1 without uploading data specified in benchmark.json.

  6. yes:

    fate_test benchmark-quality -i <path1 contains *benchmark.json> --yes

    will run benchmark testsuites in path1 directly, skipping the double-check prompt
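The -t tolerance option above amounts to an absolute-difference comparison between paired metrics. A minimal sketch (illustrative, not fate_test’s actual code):

```python
def almost_equal(a, b, tol=1e-3):
    """Metrics match if their absolute difference is within tol."""
    return abs(a - b) <= tol

print(almost_equal(0.9035, 0.9031))            # True with default tol=1e-3
print(almost_equal(0.9035, 0.9031, tol=1e-4))  # False
```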

benchmark testsuite

Configuration of jobs should be specified in a benchmark testsuite whose file name ends with “*benchmark.json”. For a benchmark testsuite example, please refer here.

A benchmark testsuite includes the following elements:

  • data: list of local data to be uploaded before running FATE jobs

    • file: path to original data file to be uploaded, should be relative to testsuite or FATE installation path

    • head: whether file includes header

    • partition: number of partitions for data storage

    • table_name: table name in storage

    • namespace: table namespace in storage

    • role: which role to upload the data, as specified in fate_test.config; naming format is: “{role_type}_{role_index}”, index starts at 0

    "data": [
        {
            "file": "examples/data/motor_hetero_host.csv",
            "head": 1,
            "partition": 8,
            "table_name": "motor_hetero_host",
            "namespace": "experiment",
            "role": "host_0"
        }
    ]
  • job group: each group includes an arbitrary number of jobs with paths to the corresponding script and configuration

    • job: name of job to be run, must be unique within each group list

      • script: path to testing script, should be relative to testsuite

      • conf: path to job configuration file for script, should be relative to testsuite

      "local": {
           "script": "./local-linr.py",
           "conf": "./linr_config.yaml"
      }
    • compare_setting: additional setting for quality metrics comparison, currently only takes relative_tol

      If metrics a and b satisfy abs(a-b) <= max(relative_tol * max(abs(a), abs(b)), absolute_tol) (the rule used by Python’s math module), they are considered almost equal. In the example below, metrics from the “local” and “FATE” jobs are considered almost equal if their relative difference is no larger than 0.01 * max(abs(local_metric), abs(fate_metric)).

    "linear_regression-regression": {
        "local": {
            "script": "./local-linr.py",
            "conf": "./linr_config.yaml"
        },
        "FATE": {
            "script": "./fate-linr.py",
            "conf": "./linr_config.yaml"
        },
        "compare_setting": {
            "relative_tol": 0.01
        }
    }
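This comparison rule is what Python’s math.isclose implements. For example, checking the r2_score values from the summary table above against a relative_tol of 0.01:

```python
import math

# r2_score values from the benchmark-quality summary table above
local_metric = 0.9035070863155368
fate_metric = 0.9031411831961411

# equivalent to abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
print(math.isclose(local_metric, fate_metric, rel_tol=0.01))  # True
print(math.isclose(local_metric, fate_metric, rel_tol=1e-5))  # False
```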

testing script

All job scripts need to have a Main function as the entry point for executing jobs. Scripts should return two dictionaries: the first with data information key-value pairs: {data_type}: {data_name_dictionary}; the second with {metric_name}: {metric_value} key-value pairs for metric comparison.

By default, the final data summary shows the output from the job named “FATE”; if no such job exists, the data information returned by the first job is shown. For clear presentation, we suggest that users follow this general guideline for dataset naming. In the case of a multi-host task, consider numbering hosts as follows:

{'guest': 'default_credit_homo_guest',
 'host_1': 'default_credit_homo_host_1',
 'host_2': 'default_credit_homo_host_2'}

Returned quality metrics of the same key are to be compared. Note that only real-value metrics can be compared.

  • FATE script: Main should have three inputs:

    • config: job configuration, JobConfig object loaded from “fate_test_config.yaml”

    • param: job parameter setting, dictionary loaded from “conf” file specified in benchmark testsuite

    • namespace: namespace suffix, user-given namespace or generated timestamp string when using namespace-mangling

  • non-FATE script: Main should have one or two inputs:

    • param: job parameter setting, dictionary loaded from “conf” file specified in benchmark testsuite

    • (optional) config: job configuration, JobConfig object loaded from “fate_test_config.yaml”

Note that Main in FATE & non-FATE scripts can also be set to take zero input argument.
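Putting these conventions together, a minimal non-FATE testing script might look like the following (an illustrative sketch; the model training is elided and the dataset names are hypothetical):

```python
def main(param=""):
    """Entry point called by fate_test; `param` is the dict loaded from the
    "conf" file (or empty when the script takes no configuration)."""
    # ... train a local model according to `param` (elided) ...

    # first dictionary: {data_type}: {data_name_dictionary}
    data_summary = {
        "train": {"guest": "motor_hetero_guest", "host": "motor_hetero_host"},
        "test": {"guest": "motor_hetero_guest", "host": "motor_hetero_host"},
    }
    # second dictionary: {metric_name}: {metric_value}, real values only
    metric_summary = {"r2_score": 0.9035, "mean_squared_error": 0.0982}
    return data_summary, metric_summary


if __name__ == "__main__":
    data, metrics = main()
    print(metrics)
```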

data

The data sub-command is used to upload or delete the datasets defined in testsuites.

command options

fate_test data --help
  1. include:

    fate_test data [upload|delete] -i <path1 contains *testsuite.json>

    will upload/delete datasets in testsuites in path1

  2. exclude:

    fate_test data [upload|delete] -i <path1 contains *testsuite.json> -e <path2 to exclude> -e <path3 to exclude> ...

    will upload/delete datasets in testsuites in path1 but not in path2 and path3

  3. glob:

    fate_test data [upload|delete] -i <path1 contains *testsuite.json> -g "hetero*"

    will upload/delete datasets in testsuites in subdirectories of path1 whose names start with hetero

