Python client library for Staroid cloud platform


Open data studio Python client

Open data studio is a managed computing service on Staroid. Run your machine learning and large scale data processing workloads without managing clusters and servers.

This repository provides a Python client library. The library currently supports the following computing frameworks:

  • Apache Spark
  • Dask (coming soon)
  • Ray (coming soon)

Let's get started!

Quick start

Install

pip install ods

Python 3.6, 3.7, and 3.8 are supported.
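Before installing, you can confirm that your interpreter is one of the versions listed above. This is a minimal sketch using only the standard library; the helper name `is_supported_python` is hypothetical and not part of the ods package.

```python
import sys

# Versions this README lists as supported, as (major, minor) pairs.
SUPPORTED_PYTHON = {(3, 6), (3, 7), (3, 8)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the given interpreter version is listed as supported.

    Hypothetical helper, not part of ods. Accepts sys.version_info or any
    tuple whose first two items are (major, minor).
    """
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


if not is_supported_python():
    # Only warn: newer interpreters may still work, they are just not listed.
    print("Warning: ods lists Python 3.6, 3.7 and 3.8 as supported; "
          "this interpreter is %d.%d" % tuple(sys.version_info[:2]))
```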

Initialize

  1. Log in to staroid.com, get an access token, and set the STAROID_ACCESS_TOKEN environment variable. See here for more details.
  2. Log in to staroid.com and create an SKE (Star Kubernetes Engine) cluster.
import ods
# 'ske' is the name of the Kubernetes cluster created on staroid.com.
# Alternatively, you can set the 'STAROID_SKE' environment variable.
ods.init(ske="kube-cluster-1")
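Because the access token and, optionally, the cluster name come from environment variables, it can help to fail early with a clear message when they are missing. The sketch below is a hypothetical pre-flight helper, not part of ods; it assumes an explicit `ske` argument should take precedence over `STAROID_SKE`, which is how the comment above reads but is not confirmed by the library.

```python
import os


def resolve_staroid_settings(ske=None, environ=None):
    """Hypothetical check to run before ods.init().

    Returns (access_token, ske_name). Raises RuntimeError if either is missing.
    Assumption: an explicit `ske` argument wins over the STAROID_SKE variable.
    """
    env = os.environ if environ is None else environ

    token = env.get("STAROID_ACCESS_TOKEN")
    if not token:
        raise RuntimeError("Set STAROID_ACCESS_TOKEN to your staroid.com access token")

    ske_name = ske or env.get("STAROID_SKE")
    if not ske_name:
        raise RuntimeError("Pass ske=... or set STAROID_SKE to your SKE cluster name")

    return token, ske_name
```

If the check passes, the returned cluster name can be passed to `ods.init(ske=...)` exactly as in the example above.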

Spark

Create a Spark session

Create a Spark session with the default configuration. You don't need to install or configure Spark manually.

import ods
spark = ods.spark("spark-1").session()  # 'spark-1' is the name of the spark-serverless instance to create.
df = spark.createDataFrame(....)

Configure the initial number of worker nodes

import ods
spark = ods.spark("spark-1", worker_num=3).session()
df = spark.createDataFrame(....)

Set delta=True to automatically download and configure Delta Lake

import ods
spark = ods.spark("spark-delta", delta=True).session()
spark.read.format("delta").load(....)

Pass a spark_conf dictionary for additional configuration

import ods
spark = ods.spark(spark_conf={
    "spark.hadoop.fs.s3a.access.key": "...",
    "spark.hadoop.fs.s3a.secret.key": "..."
}).session()
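Rather than hard-coding S3 credentials in source code, you can build the spark_conf dictionary from environment variables. This is a minimal sketch: reading the standard AWS variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` is an assumption about where your credentials are kept, and `s3a_conf_from_env` is a hypothetical helper, not part of ods.

```python
import os


def s3a_conf_from_env(environ=None):
    """Build the S3A credential entries for spark_conf from environment variables.

    Hypothetical helper. Assumes the credentials live in AWS_ACCESS_KEY_ID and
    AWS_SECRET_ACCESS_KEY; raises KeyError if either variable is missing.
    """
    env = os.environ if environ is None else environ
    return {
        "spark.hadoop.fs.s3a.access.key": env["AWS_ACCESS_KEY_ID"],
        "spark.hadoop.fs.s3a.secret.key": env["AWS_SECRET_ACCESS_KEY"],
    }
```

The result can then be passed as `ods.spark(spark_conf=s3a_conf_from_env()).session()`, which keeps the secrets out of notebooks and version control.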

Configure the Spark version

import ods
spark = ods.spark(spark_version="3.0.1").session()

Currently, Spark 3.0.1 and 3.0.0 are supported.
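Since only two Spark versions are listed, a small guard can catch a typo in spark_version before a session is requested. This is a hypothetical helper, not part of ods; the supported set is copied from this README and may differ in other ods releases.

```python
# Spark versions this README lists as supported by ods (may change between releases).
SUPPORTED_SPARK_VERSIONS = ("3.0.1", "3.0.0")


def check_spark_version(version):
    """Return `version` unchanged if it is listed as supported, else raise ValueError."""
    if version not in SUPPORTED_SPARK_VERSIONS:
        raise ValueError(
            "spark_version %r is not supported; choose one of %s"
            % (version, ", ".join(SUPPORTED_SPARK_VERSIONS))
        )
    return version
```

Usage: `ods.spark(spark_version=check_spark_version("3.0.1")).session()`.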

See tests/test_spark.py for a complete working example.

Dask

Coming soon 🚛

import ods
cluster = ods.dask("dask-1", worker_num=10)

from dask.distributed import Client
client = Client(cluster)

Ray

Coming soon 🚛

import ods
ods.ray(cluster_name="")

Get involved

Open data studio is an open source project. Please give us feedback and feel free to get involved!

Commercial support

Staroid actively contributes to Open data studio and provides commercial support. Please contact Staroid for details.

Download files

Source Distribution

ods-0.0.7.tar.gz (8.5 kB)
