
Python client library for Staroid cloud platform



Open data studio python client

Open data studio is a managed computing service on Staroid. Run your machine learning and large scale data processing workloads without managing clusters and servers.

This repository provides a Python client library. It currently supports the following computing frameworks:

  • Apache Spark
  • Dask (coming soon)
  • Ray (coming soon)

Let's get started!

Quick start


Install

pip install ods

Python 3.6, 3.7, and 3.8 are supported.
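
To confirm that the current interpreter is one of the supported versions before installing, something like the following sketch could be used. The helper name `is_supported_python` is hypothetical and not part of `ods`; it only encodes the version list stated above.

```python
import sys

# Versions listed as supported in this README.
SUPPORTED_PYTHON = {(3, 6), (3, 7), (3, 8)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) pair is in the supported set.

    Hypothetical helper for illustration; not provided by ods.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_PYTHON
```

Calling `is_supported_python()` with no argument checks the running interpreter; other versions may still work but are not listed as supported.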

Initialize

  1. Log in to staroid.com and get an access token, then set the STAROID_ACCESS_TOKEN environment variable. See here for more detail.
  2. Log in to staroid.com and create an SKE (Star Kubernetes engine) cluster.

import ods
# 'ske' is the name of the Kubernetes cluster created on staroid.com.
# Alternatively, you can set the 'STAROID_SKE' environment variable.
ods.init(ske="kube-cluster-1")
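
Because initialization depends on environment variables, it can help to fail early with a clear message when they are missing. The sketch below is one way to do that. `resolve_staroid_settings` is a hypothetical helper, not part of `ods`, and the precedence it applies (an explicit argument wins over STAROID_SKE) is an assumption of the sketch, not documented library behavior.

```python
import os


def resolve_staroid_settings(ske=None, env=None):
    """Check that the access token is set and pick the cluster name.

    Hypothetical helper, not part of ods. An explicit `ske` argument is
    preferred; otherwise the STAROID_SKE environment variable is used.
    """
    env = os.environ if env is None else env
    if not env.get("STAROID_ACCESS_TOKEN"):
        raise RuntimeError("STAROID_ACCESS_TOKEN is not set")
    cluster = ske or env.get("STAROID_SKE")
    if not cluster:
        raise RuntimeError("pass ske=... or set STAROID_SKE")
    return cluster
```

The returned name can then be passed on, for example `ods.init(ske=resolve_staroid_settings("kube-cluster-1"))`.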

Spark


Create spark session

Create a Spark session with the default configuration. You don't need to install or configure Spark manually.

import ods
spark = ods.spark("spark-1").session() # 'spark-1' is the name of the spark-serverless instance to create.
df = spark.createDataFrame(....)
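
The `createDataFrame` call above is elided. As a minimal sketch of what such a call can look like, the function below builds a small DataFrame from in-memory rows using the standard PySpark `createDataFrame(data, schema)` form. The function name, rows, and column names are made up for illustration, and `spark` is assumed to be the session returned by `ods.spark(...).session()`.

```python
def build_demo_frame(spark):
    """Create a three-row DataFrame from in-memory tuples.

    Illustrative only: the rows and column names are placeholders.
    """
    rows = [(1, "alpha"), (2, "beta"), (3, "gamma")]
    columns = ["id", "label"]
    # createDataFrame accepts a list of tuples plus a list of column names.
    return spark.createDataFrame(rows, columns)
```

With a live session, `build_demo_frame(spark).show()` would display the rows.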

Configure initial number of worker nodes

import ods
spark = ods.spark("spark-1", worker_num=3).session()
df = spark.createDataFrame(....)

Set delta=True to automatically download and configure Delta Lake

import ods
spark = ods.spark("spark-delta", delta=True).session()
spark.read.format("delta").load(....)

Pass a spark_conf dictionary for additional configuration

import ods
spark = ods.spark(spark_conf={
    "spark.hadoop.fs.s3a.access.key": "...",
    "spark.hadoop.fs.s3a.secret.key": "..."
}).session()
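
To avoid hardcoding credentials in source files, the same dictionary could be built from environment variables. The sketch below assumes the keys are available as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (the conventional AWS variable names); `s3a_conf_from_env` is a hypothetical helper, not part of `ods`.

```python
import os


def s3a_conf_from_env(env=None):
    """Build a spark_conf dict for S3A from AWS-style environment variables.

    Hypothetical helper. Raises KeyError if either variable is missing.
    """
    env = os.environ if env is None else env
    return {
        "spark.hadoop.fs.s3a.access.key": env["AWS_ACCESS_KEY_ID"],
        "spark.hadoop.fs.s3a.secret.key": env["AWS_SECRET_ACCESS_KEY"],
    }
```

The result can be passed directly, for example `ods.spark(spark_conf=s3a_conf_from_env()).session()`.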

Configure the Spark version

import ods
spark = ods.spark(spark_version="3.0.1").session()

Currently, Spark 3.0.1 and 3.0.0 are supported.
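
If the version string comes from configuration or user input, it may be worth validating it against the supported list before creating a session. The following is a sketch only: `check_spark_version` and `SUPPORTED_SPARK_VERSIONS` are hypothetical names, and the tuple simply mirrors the versions listed above.

```python
# Versions listed as supported in this README.
SUPPORTED_SPARK_VERSIONS = ("3.0.1", "3.0.0")


def check_spark_version(version):
    """Return the version unchanged, or raise ValueError if it is not listed."""
    if version not in SUPPORTED_SPARK_VERSIONS:
        raise ValueError(
            "unsupported Spark version {!r}; choose one of {}".format(
                version, ", ".join(SUPPORTED_SPARK_VERSIONS)
            )
        )
    return version
```

For example, `ods.spark(spark_version=check_spark_version("3.0.1")).session()` would pass the check, while an unlisted version raises before any instance is created.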

See tests/test_spark.py for a complete working example.

Dask

Coming soon 🚛

import ods
cluster = ods.dask("dask-1", worker_num=10)

from dask.distributed import Client
client = Client(cluster)

Ray

Coming soon 🚛

import ods
ods.ray(cluster_name="")

Get involved

Open data studio is an open source project. Please give us feedback and feel free to get involved!

Commercial support

Staroid actively contributes to Open data studio and provides commercial support. Please contact Staroid for details.
