This package contains all of the classes and functions you need to interact with Splice Machine's scale-out, SQL-on-Hadoop RDBMS from Python. It also contains several machine learning utilities for use with Apache Spark, a managed MLFlow client, and a managed Feature Store client.

Splice Machine Python Package

This package contains all of the classes and functions you need to interact with Splice Machine's scale-out, SQL-on-Hadoop RDBMS from Python. It also contains several machine learning utilities for use with Apache Spark.

Installation Instructions: with Pip

(sudo) pip install splicemachine

To include notebook utilities

(sudo) pip install splicemachine[notebook]

To include statistics utilities

(sudo) pip install splicemachine[stats]

To include all extras (recommended)

(sudo) pip install splicemachine[all]

NOTE: If you use zsh and plan to install extras, you must escape the brackets (pip install splicemachine\[all\])
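
The bracket issue in the note above comes from the shell treating [ and ] as glob characters. As a minimal sketch (the helper names requirement_with_extras and pip_install_args are hypothetical, not part of splicemachine), the snippet below builds the same install command from Python and shows how the standard library's shlex quotes the requirement so it is safe to paste into zsh:

```python
import shlex
import sys


def requirement_with_extras(package, extras=()):
    """Build a pip requirement string such as 'splicemachine[all]'."""
    extras = list(extras)
    return f"{package}[{','.join(extras)}]" if extras else package


def pip_install_args(package, extras=()):
    """Argument list for 'python -m pip install <requirement>'.

    Passing this list to subprocess.run bypasses the shell entirely,
    so the brackets need no escaping in that case.
    """
    return [sys.executable, "-m", "pip", "install",
            requirement_with_extras(package, extras)]


if __name__ == "__main__":
    args = pip_install_args("splicemachine", ["all"])
    # shlex.join quotes each argument for a POSIX-style shell; the
    # requirement comes out wrapped in single quotes.
    print(shlex.join(args))
```

Quoting the requirement in single quotes is an alternative to the backslash escapes shown in the note; both stop zsh from trying to expand the brackets.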

Modules

This package contains four main external modules. First, splicemachine.spark.context houses our Python-wrapped Native Spark Datasource, as well as our External Native Spark Datasource for use outside of the Kubernetes cluster. Second, splicemachine.mlflow_support houses our Python interface to MLManager. Third, splicemachine.features is the Python SDK entry point to the Splice Machine Feature Store. Lastly, there are two extensions: splicemachine.stats, which houses functions and classes that simplify machine learning (such as Decision Tree Visualizers and Model Evaluators), and splicemachine.notebook, which provides Jupyter Notebook-specific functionality such as an embedded MLFlow UI and Spark Jobs UI.

  1. splicemachine.spark.context: Native Spark Datasource for interacting with Splice Machine from Spark

    1.1) splicemachine.spark.context.ExtPySpliceContext: External Native Spark Datasource for interacting with Splice Machine from Spark outside of the Kubernetes cluster. After instantiation, usage is mostly identical to the datasource in item 1, with a few extra functions available. To instantiate it, you must provide the kafkaServers parameter pointing to the Kafka URL of the Splice cluster you want to connect to. In Standalone, that URL is the class's default value for the parameter (localhost:9092).

  2. splicemachine.mlflow_support: MLFlow-wrapped MLManager interface for Python. The majority of the documentation is identical to MLFlow's. Additional functions and functionality are described in the docs.

  3. splicemachine.features: The Python SDK entry point to the Splice Machine Feature Store

  4. Extensions

    4.1) splicemachine.stats: houses utilities for machine learning

    4.2) splicemachine.notebook: houses utilities for use in Jupyter Notebooks running in the Kubernetes cloud environment
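
Item 1.1 above can be sketched as follows. The class name ExtPySpliceContext, the kafkaServers parameter, and its localhost:9092 default come from the list above. Passing the Spark session as the first positional argument and the JDBC_URL keyword are assumptions about the constructor, and the helper names are hypothetical, so check the docs before relying on them:

```python
DEFAULT_KAFKA_SERVERS = "localhost:9092"  # Standalone default named in item 1.1


def ext_context_kwargs(jdbc_url, kafka_servers=DEFAULT_KAFKA_SERVERS):
    """Collect keyword arguments for ExtPySpliceContext (hypothetical helper).

    'JDBC_URL' is an assumed keyword name; 'kafkaServers' is the
    parameter described in the module list.
    """
    if not kafka_servers:
        raise ValueError("kafkaServers must point to the cluster's Kafka URL")
    return {"JDBC_URL": jdbc_url, "kafkaServers": kafka_servers}


def connect_external(spark, jdbc_url, kafka_servers=DEFAULT_KAFKA_SERVERS):
    """Create an external context from an existing SparkSession."""
    # Imported lazily so this sketch loads without splicemachine installed.
    from splicemachine.spark.context import ExtPySpliceContext

    # Assumption: the Spark session is the first positional argument.
    return ExtPySpliceContext(spark, **ext_context_kwargs(jdbc_url, kafka_servers))
```

For a cluster other than Standalone, pass that cluster's Kafka URL as kafka_servers instead of relying on the default.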

Docs

The docs are managed by Read the Docs and Sphinx. See the latest docs here

Building the docs

cd docs
make html

Download files

Source Distribution

splicemachine-2.8.0.tar.gz (76.8 kB)

Uploaded Source

File details

Details for the file splicemachine-2.8.0.tar.gz.

File metadata

  • Download URL: splicemachine-2.8.0.tar.gz
  • Upload date:
  • Size: 76.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.5

File hashes

Hashes for splicemachine-2.8.0.tar.gz
  Algorithm    Hash digest
  SHA256       342f394ff990113570c6c1651776cb930a66f5effc0f488ee4658811ac493802
  MD5          756065e00e31221d00f293e773569357
  BLAKE2b-256  7fe518bff752d9d606d1d050130481f75e342df536eeb043817818087e979b16
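
One way to use the digests above is to check a downloaded archive against the published SHA256 value. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the archive has been saved in the current directory under its original file name:

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for splicemachine-2.8.0.tar.gz
EXPECTED_SHA256 = "342f394ff990113570c6c1651776cb930a66f5effc0f488ee4658811ac493802"


def sha256_of_chunks(chunks):
    """Return the hex SHA256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path, chunk_size=1 << 20):
    """Yield the file's contents in fixed-size chunks to bound memory use."""
    with Path(path).open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare the file's SHA256 digest with the expected hex string."""
    return sha256_of_chunks(read_chunks(path)) == expected.lower()


if __name__ == "__main__":
    archive = Path("splicemachine-2.8.0.tar.gz")  # assumed download location
    if archive.exists():
        print("hash OK" if matches_published_hash(archive) else "hash MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```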
