Python client library for Staroid cloud platform
Open data studio Python client
Open data studio is a managed computing service on Staroid. Run your machine learning and large-scale data processing workloads without managing clusters and servers.
This repository provides a Python client library. The following computing frameworks are currently supported:
- Apache Spark
- Dask (coming soon)
- Ray (coming soon)
Let's get started!
Quick start
Install
pip install ods
Python 3.6, 3.7, and 3.8 are supported.
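If you want a setup script to fail early on an interpreter outside the versions listed above, a minimal check might look like the sketch below. The helper name `is_supported_python` is hypothetical and is not part of the ods package; the version set simply mirrors this README.

```python
import sys

# Python versions listed as supported in this README (hypothetical constant).
SUPPORTED_PYTHON = {(3, 6), (3, 7), (3, 8)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the major.minor version is in SUPPORTED_PYTHON."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


if __name__ == "__main__":
    if not is_supported_python():
        major, minor = sys.version_info[:2]
        print("Warning: ods lists Python 3.6 to 3.8 as supported; found %d.%d" % (major, minor))
```

The check only compares major and minor numbers, so patch releases such as 3.7.9 are accepted.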
Initialize
- Log in to staroid.com, get an access token, and set the STAROID_ACCESS_TOKEN environment variable. See here for more detail.
- Log in to staroid.com and create an SKE (Star Kubernetes Engine) cluster.
import ods
# 'ske' is the name of the Kubernetes cluster created on staroid.com.
# Alternatively, you can set the 'STAROID_SKE' environment variable.
ods.init(ske="kube-cluster-1")
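Because initialization depends on an access token and a cluster name that can come from either the `ske` argument or the `STAROID_SKE` environment variable, a small pre-flight check can report what is missing before `ods.init()` is called. This is a sketch under that assumption; `missing_staroid_settings` is a hypothetical helper, not an ods API, and it does not contact staroid.com.

```python
import os


def missing_staroid_settings(ske=None, environ=None):
    """Return the names of settings that look unset (hypothetical pre-flight check)."""
    env = os.environ if environ is None else environ
    missing = []
    # The access token is read from the STAROID_ACCESS_TOKEN environment variable.
    if not env.get("STAROID_ACCESS_TOKEN"):
        missing.append("STAROID_ACCESS_TOKEN")
    # The cluster name may be passed as ske=... or set via STAROID_SKE.
    if not ske and not env.get("STAROID_SKE"):
        missing.append("STAROID_SKE")
    return missing


if __name__ == "__main__":
    problems = missing_staroid_settings(ske="kube-cluster-1")
    if problems:
        print("Set these before calling ods.init():", ", ".join(problems))
```

The `environ` parameter exists only so the check can be exercised without touching the real environment.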
Spark
Create a Spark session
Create a Spark session with the default configuration. You don't need to install or configure Spark manually.
import ods
spark = ods.spark("spark-1").session()  # 'spark-1' is the name of the spark-serverless instance to create.
df = spark.createDataFrame(....)
Configure the initial number of worker nodes
import ods
spark = ods.spark("spark-1", worker_num=3).session()
df = spark.createDataFrame(....)
Pass delta=True to automatically download and configure Delta Lake
import ods
spark = ods.spark("spark-delta", delta=True).session()
spark.read.format("delta").load(....)
Pass a spark_conf dictionary for additional configuration
import ods
spark = ods.spark(spark_conf={
    "spark.hadoop.fs.s3a.access.key": "...",
    "spark.hadoop.fs.s3a.secret.key": "..."
}).session()
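Rather than hard-coding credentials in `spark_conf`, you might build the dictionary from environment variables. The sketch below assumes the conventional AWS variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; `s3a_conf_from_env` is a hypothetical helper, not part of ods.

```python
import os


def s3a_conf_from_env(environ=None):
    """Build a spark_conf dict with S3A credentials taken from the environment.

    Hypothetical helper; variable names follow the AWS CLI/SDK convention.
    Keys whose variables are unset are simply left out.
    """
    env = os.environ if environ is None else environ
    conf = {}
    if env.get("AWS_ACCESS_KEY_ID"):
        conf["spark.hadoop.fs.s3a.access.key"] = env["AWS_ACCESS_KEY_ID"]
    if env.get("AWS_SECRET_ACCESS_KEY"):
        conf["spark.hadoop.fs.s3a.secret.key"] = env["AWS_SECRET_ACCESS_KEY"]
    return conf
```

The result could then be passed as `ods.spark(spark_conf=s3a_conf_from_env()).session()`, keeping secrets out of source files.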
Configure the Spark version
import ods
spark = ods.spark(spark_version="3.0.1").session()
Currently, Spark 3.0.1 and 3.0.0 are supported.
See tests/test_spark.py for a complete working example.
Dask
Coming soon 🚛
import ods
from dask.distributed import Client
cluster = ods.dask("dask-1", worker_num=10)
client = Client(cluster)
Ray
Coming soon 🚛
import ods
ods.ray(cluster_name="")
Get involved
Open data studio is an open source project. Please give us feedback and feel free to get involved!
- Feedback and questions: ods issue tracker
- Staroid public dev roadmap
Commercial support
Staroid actively contributes to Open data studio and provides commercial support. Please contact us.