CLI to control Pandio's machine learning service.

PandioCLI - Pandio.com Machine Learning CLI Tool

This repository contains the PandioCLI tool to develop and deploy machine learning for streaming data.

Quick Links

Pandio.com/PandioML - Pandio.com - Getting Started - Quick Start - PyPi PandioML - PyPi PandioCLI

Installation

pip install pandiocli

Requirements

  • Python 3.5 - 3.8
  • pip > 20.0.0

Commands

pandiocli function generate --project_name example

Generates a project template in the current working directory at ./example

  1. ./example/function.py

    This is the file where all of your logic should be placed.

  2. ./example/requirements.txt

    This file should list every Python package that function.py needs. These packages are installed for you automatically when you deploy to Pandio's platform. When running locally, install them as you normally would: pip install -r requirements.txt

  3. ./example/config.py

    This contains non-sensitive configuration parameters for the project. Sensitive configuration parameters are set via the PandioCLI.

    Acceptable values are:

'FUNCTION_NAME': 'exampleFunction123',
'CONNECTION_STRING': 'pulsar://localhost:6651',
'ADMIN_API': 'http://localhost:8080',
'TENANT': 'public',
'NAMESPACE': 'default',
'INPUT_TOPICS': ['non-persistent://public/default/in'],
'OUTPUT_TOPICS': ['non-persistent://public/default/out'],
'LOG_TOPIC': 'non-persistent://public/default/log',
'ARTIFACT_STORAGE': "./artifacts"
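The entries above are listed without their surrounding structure. As a minimal sketch, assuming they live in a plain Python dictionary, the file could look like the block below. The variable name `config` and the helper `check_topics` are illustrative assumptions, not part of PandioCLI; only the keys and values come from the list above. The helper checks that every topic starts with one of the two Pulsar topic schemes.

```python
# Hypothetical layout for ./example/config.py. The name `config` is an
# assumption; the keys and values are the ones listed above.
config = {
    'FUNCTION_NAME': 'exampleFunction123',
    'CONNECTION_STRING': 'pulsar://localhost:6651',
    'ADMIN_API': 'http://localhost:8080',
    'TENANT': 'public',
    'NAMESPACE': 'default',
    'INPUT_TOPICS': ['non-persistent://public/default/in'],
    'OUTPUT_TOPICS': ['non-persistent://public/default/out'],
    'LOG_TOPIC': 'non-persistent://public/default/log',
    'ARTIFACT_STORAGE': './artifacts',
}

# Pulsar topic names begin with one of these two schemes.
VALID_PREFIXES = ('persistent://', 'non-persistent://')


def check_topics(cfg):
    """Return the configured topic names that lack a Pulsar topic scheme."""
    topics = list(cfg['INPUT_TOPICS']) + list(cfg['OUTPUT_TOPICS']) + [cfg['LOG_TOPIC']]
    return [topic for topic in topics if not topic.startswith(VALID_PREFIXES)]


if __name__ == '__main__':
    # An empty list means every topic name is well formed.
    print(check_topics(config))
```

Running a check like this before uploading is one way to catch a mistyped topic name early.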

pandiocli function upload --project_folder path_to_folder

Package up your function project and upload it to Pandio's platform.

pandiocli dataset generate --project_name example

Generates a project template in the current working directory at ./example

  1. ./example/dataset.py

    This is the file where all of your logic should be placed.

    Three things need to be defined to complete the dataset:

    • __init__

      Establish a connection or load your data. Returns an iterable.

    • next

      Returns a single record from the dataset.

    • schema

      Defines the schema used for the dataset.

    For more information on schemas, see the Schema Registry.

  2. ./example/wrapper.py

    This is a wrapper class for the dataset to allow it to work on the Pandio platform.

    Note: You should not need to ever modify this file.

  3. ./example/requirements.txt

    This file should list every Python package that dataset.py needs. These packages are installed for you automatically when you deploy to Pandio's platform. When running locally, install them as you normally would: pip install -r requirements.txt

  4. ./example/config.py

    This contains non-sensitive configuration parameters for the project. Sensitive configuration parameters are set via the PandioCLI.

    Acceptable values are:

'FUNCTION_NAME': 'exampleFunction123',
'CONNECTION_STRING': 'pulsar://localhost:6651',
'ADMIN_API': 'http://localhost:8080',
'TENANT': 'public',
'NAMESPACE': 'default',
'INPUT_TOPICS': ['non-persistent://public/default/in'],
'OUTPUT_TOPICS': ['non-persistent://public/default/out'],
'LOG_TOPIC': 'non-persistent://public/default/log'
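The three pieces of dataset.py described above (__init__, next, and schema) can be sketched roughly as follows. This is a self-contained stand-in, not the template that PandioCLI generates: the class name `ExampleDataset`, the in-memory CSV source, and the dictionary returned by `schema` are all assumptions made for illustration. The real template and the Schema Registry may expect different base classes and schema types.

```python
import csv
import io


class ExampleDataset:
    """Hypothetical dataset; not the class generated by PandioCLI."""

    def __init__(self, csv_text):
        # "Establish a connection or load your data": here the data is
        # parsed from an in-memory CSV string and kept as an iterator.
        self._rows = iter(csv.DictReader(io.StringIO(csv_text)))

    def next(self):
        # Return a single record; raises StopIteration when exhausted.
        row = next(self._rows)
        return {'name': row['name'], 'age': int(row['age'])}

    @staticmethod
    def schema():
        # Assumed schema shape: a mapping of field name to Python type.
        return {'name': str, 'age': int}


if __name__ == '__main__':
    dataset = ExampleDataset('name,age\nada,36\nlin,41\n')
    print(dataset.next())
    print(ExampleDataset.schema())
```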

An additional --type parameter can be specified to generate the dataset from a template.

Currently supported templates are:

  • mysql
  • trino
  • csv

pandiocli dataset upload --project_folder path_to_folder

Package up your dataset project and upload it to Pandio's platform.

pandiocli config show

Outputs the current configuration for the PandioCLI.

pandiocli config file

Outputs the location of the current PandioCLI configuration file.

pandiocli config reset

Deletes all settings for the PandioCLI.

pandiocli config set --key PANDIO_TOKEN --value ABC123

This command lets you set the configuration parameters for PandioCLI manually.

The following values are first set when you use the register command:

  • PANDIO_CLUSTER
  • PANDIO_TENANT
  • PANDIO_NAMESPACE
  • PANDIO_CLUSTER_TOKEN
  • PANDIO_EMAIL
  • PANDIO_DATA_TOKEN

Note: These values can be found from inside of your Pandio.com Dashboard
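When several of these values need to be set at once, the commands can be generated from a dictionary. The sketch below only builds the argument lists; the helper name `build_config_commands` and the placeholder values are assumptions, while the `--key` and `--value` flags are the ones shown in the command above.

```python
import shlex


def build_config_commands(settings):
    """Build one `pandiocli config set` argument list per key/value pair."""
    return [
        ['pandiocli', 'config', 'set', '--key', key, '--value', str(value)]
        for key, value in settings.items()
    ]


if __name__ == '__main__':
    # Placeholder values; real ones come from the Pandio.com Dashboard.
    placeholders = {
        'PANDIO_TENANT': 'my-tenant',
        'PANDIO_NAMESPACE': 'my-namespace',
    }
    for argv in build_config_commands(placeholders):
        # Print a shell-safe command line for review before running it.
        print(shlex.join(argv))
```

Each argument list can be passed to subprocess.run once the printed commands look right.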

pandiocli test --project_folder folder_name --dataset_name FormSubmissionGenerator --loops 1000

This is a helper for running the folder_name/runner.py file, as an alternative to running it manually with Python. It reports performance metrics, which help when debugging excessive resource usage such as memory leaks.

project_folder is the relative path to the project folder from where the command is being executed.

dataset_name is either the name of one of the pandioml.data datasets or generators available in PandioML, or the relative path to the folder of a dataset generated by the pandiocli dataset generate command.

loops is the number of events to process. Most streams of data are infinite, so this allows iterative testing with limited data.

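To illustrate what a bounded test run over an infinite stream looks like, here is a minimal sketch that processes a fixed number of events and reports elapsed time and peak memory. It is not how pandiocli test is implemented; `endless_events`, `run_bounded`, and the returned metric names are hypothetical, and only standard-library modules are used.

```python
import itertools
import time
import tracemalloc


def endless_events():
    """Hypothetical infinite stream standing in for a real dataset."""
    for i in itertools.count():
        yield {'id': i}


def run_bounded(events, handler, loops):
    """Process at most `loops` events; report time and peak traced memory."""
    tracemalloc.start()
    started = time.perf_counter()
    processed = 0
    # islice stops after `loops` items even if the stream never ends.
    for event in itertools.islice(events, loops):
        handler(event)
        processed += 1
    elapsed = time.perf_counter() - started
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {'processed': processed, 'seconds': elapsed, 'peak_bytes': peak}


if __name__ == '__main__':
    stats = run_bounded(endless_events(), lambda event: event['id'] * 2, loops=1000)
    print(stats)
```

A peak memory figure that keeps growing as loops increases is the kind of signal that points to a leak in the handler.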
pandiocli register your@email.com

This command registers a Pandio.com account for you. An email with a link to verify your registration will be sent.

Once the link is clicked, the local PandioCLI will be configured successfully with your new Pandio account.

If you already have a Pandio.com account, you'll need to use the pandiocli config command to manually set the configuration with values inside of the Pandio.com Dashboard.

Contributing

All contributions are welcome.

The best ways to get involved are as follows:

  1. Issues

    This is a great place to report any problems found with PandioCLI: bugs, inconsistencies, missing documentation, or anything else that got in the way of using PandioCLI.

  2. Discussions

    This is a great place for anything else related to PandioCLI: propose features, ask questions, highlight use cases, or raise anything else you can imagine.

If you would like to submit a pull request to this library, please read the contributor guidelines.

License

PandioCLI is licensed under the SSPL.
