
Python client library for Staroid cloud platform



Open data studio Python client

Open data studio is a managed computing service on Staroid. Run your machine learning and large-scale data processing workloads without managing clusters or servers.

This repository provides a Python client library. The library currently supports the following computing frameworks.

  • Apache Spark
  • Dask (coming soon)
  • Ray (coming soon)

Let's get started!

Quick start


Install

pip install ods

Python 3.6, 3.7, and 3.8 are supported.

Initialize

  1. Log in to staroid.com, get an access token, and set it in the STAROID_ACCESS_TOKEN environment variable. See here for more detail.
  2. Log in to staroid.com and create an SKE (Star Kubernetes Engine) cluster.

import ods
# 'ske' is the name of the Kubernetes cluster created on staroid.com.
# Alternatively, you can set the 'STAROID_SKE' environment variable.
ods.init(ske="kube-cluster-1")
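
A missing token or cluster name is easier to diagnose before the first call than after it. The sketch below checks the two settings described above and only then calls ods.init(). The helper names missing_staroid_settings and init_ods are hypothetical and not part of the library; the only library call used is ods.init(ske=...) as shown above.

```python
import os


def missing_staroid_settings(ske=None, env=None):
    """Return the names of required settings that are not set.

    `ske` mirrors the argument passed to ods.init(); `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    missing = []
    if not env.get("STAROID_ACCESS_TOKEN"):
        missing.append("STAROID_ACCESS_TOKEN")
    # The cluster name may come from the argument or from STAROID_SKE.
    if not ske and not env.get("STAROID_SKE"):
        missing.append("STAROID_SKE")
    return missing


def init_ods(ske=None):
    """Hypothetical wrapper: fail early with a clear message, then initialize."""
    missing = missing_staroid_settings(ske=ske)
    if missing:
        raise RuntimeError("Missing Staroid settings: " + ", ".join(missing))
    import ods  # imported lazily so the check above works without the package
    ods.init(ske=ske or os.environ["STAROID_SKE"])
```

Calling init_ods("kube-cluster-1") behaves like the snippet above once the access token is set, and raises a RuntimeError naming the missing setting otherwise.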

Spark


Create spark session

Create a Spark session with the default configuration. You don't need to install or configure Spark manually.

import ods
# 'spark-1' is the name of the spark-serverless instance to create.
spark = ods.spark("spark-1").session()
df = spark.createDataFrame(....)
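
The createDataFrame call above is elided. As one possible way to exercise a new session, the sketch below builds a small DataFrame from in-memory rows using the standard PySpark createDataFrame(data, schema) call. The helper name make_sample_df and the sample rows are made up for illustration.

```python
def make_sample_df(spark):
    """Build a tiny two-column DataFrame on the given Spark session."""
    # Illustrative rows only; replace with your own data.
    rows = [("alice", 34), ("bob", 45)]
    # A list of column names is accepted as the schema; types are inferred.
    return spark.createDataFrame(rows, schema=["name", "age"])
```

With a session obtained from ods.spark("spark-1").session(), make_sample_df(spark).show() should print the two rows.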

Configure the initial number of worker nodes

import ods
spark = ods.spark("spark-1", worker_num=3).session()
df = spark.createDataFrame(....)

Set delta=True to automatically download and configure Delta Lake

import ods
spark = ods.spark("spark-delta", delta=True).session()
spark.read.format("delta").load(....)

Pass a spark_conf dictionary for additional configuration

import ods
spark = ods.spark(spark_conf={
    "spark.hadoop.fs.s3a.access.key": "...",
    "spark.hadoop.fs.s3a.secret.key": "..."
}).session()
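
To keep credentials out of source code, the spark_conf dictionary can be built from environment variables. This is a minimal sketch: the helper name s3a_conf_from_env is hypothetical, and it assumes the credentials are stored in the conventional AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables.

```python
import os


def s3a_conf_from_env(env=None):
    """Build the S3A credential entries for spark_conf from environment variables."""
    env = os.environ if env is None else env
    # A KeyError here means one of the two variables is not set.
    return {
        "spark.hadoop.fs.s3a.access.key": env["AWS_ACCESS_KEY_ID"],
        "spark.hadoop.fs.s3a.secret.key": env["AWS_SECRET_ACCESS_KEY"],
    }
```

The result can then be passed as ods.spark(spark_conf=s3a_conf_from_env()).session().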

Configure the Spark version

import ods
spark = ods.spark(spark_version="3.0.1").session()

Currently, Spark 3.0.1 and 3.0.0 are supported.
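
If the version comes from user input or configuration, it can be checked against the supported list before a session is requested. This is a small sketch: the names SUPPORTED_SPARK_VERSIONS and check_spark_version are hypothetical, and the tuple simply mirrors the versions listed above.

```python
# Mirrors the versions listed above; update it when the library adds more.
SUPPORTED_SPARK_VERSIONS = ("3.0.1", "3.0.0")


def check_spark_version(version):
    """Return the version unchanged if it is supported, else raise ValueError."""
    if version not in SUPPORTED_SPARK_VERSIONS:
        supported = ", ".join(SUPPORTED_SPARK_VERSIONS)
        raise ValueError(
            "Unsupported Spark version %r; supported: %s" % (version, supported)
        )
    return version
```

For example, ods.spark(spark_version=check_spark_version("3.0.1")).session() passes the value through, while an unsupported value fails before any instance is created.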

See tests/test_spark.py for a complete working example.

Dask

Coming soon 🚛

import ods
cluster = ods.dask("dask-1", worker_num=10)

from dask.distributed import Client
client = Client(cluster)

Ray

Coming soon 🚛

import ods
ods.ray(cluster_name="")

Get involved

Open data studio is an open source project. Please give us feedback and feel free to get involved!

Commercial support

Staroid actively contributes to Open data studio and provides commercial support. Please contact Staroid for details.

