
Enscale: An instant distributed computing library based on Ray stack

Project description



The project is currently under heavy development and focuses on PyTorch and recommendation scenarios.

About

Enscale is an instant distributed computing library built on Ray Train and Ray Data that is scalable, efficient, and easy to use. It accelerates the development of any ML/DL training workload, on any cloud or on your local machine, at any parallelism size.

Goals

  • Launch any interactive ML/DL workload instantly, on your laptop or on any cloud
  • Scale your single-machine neural network modules to natively distributed training
  • Apply heterogeneous architectures
  • Offer a data-scientist-friendly API
  • Support sparse and dense feature definitions

Non-Goals

  • Enscale is not a neural network library; only a few benchmark modules are provided.

Getting Started

Prerequisites

  • Python version >= 3.7.1
  • Packages
    • requests >= 2.26.0
    • ray >= 1.9.2
    • torch >= 1.9.1
    • pandas >= 1.3.5
    • pyarrow >= 6.0.1

Installation

Using Pre-compiled Wheels

# CPU version
pip install enscale

From Source

git clone https://github.com/ryantd/enscale
cd enscale
pip install -e .

Runtime environment

The library can launch locally or on any cloud provider with Ray set up.

  • To launch on a cloud, go through this doc to set up your Ray cluster, and then call environ_validate(n_cpus=N, cluster_endpoint="ray://<head_node_host>:<port>") to connect to it.
  • Or just call environ_validate(n_cpus=N) for a local run.

You can pass any additional native ray.init arguments directly into the environ_validate call, for example environ_validate(n_cpus=N, ignore_reinit_error=True) to suppress the error Ray raises when ray.init() is called a second time.
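
For example, a minimal sketch of the three call patterns above (the cluster host and port below are placeholders for your own head node):

from enscale import environ_validate

# local run: start an on-machine Ray instance with 3 CPUs
environ_validate(n_cpus=3)

# cluster run: connect to an existing Ray cluster endpoint
# environ_validate(n_cpus=3, cluster_endpoint="ray://10.0.0.1:10001")

# extra ray.init arguments pass straight through, e.g. to suppress
# the error from calling ray.init() a second time
# environ_validate(n_cpus=3, ignore_reinit_error=True)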

Lightning example

See more hands-on and advanced examples here, such as heterogeneous support and sparsity definitions.

The following example requires scikit-learn to be installed. tqdm is optional; if installed, it enables progress reporting.


import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score
from enscale.util import pprint_results, load_benchmark_dataset
from enscale.model.ctr import DeepFM
from enscale import NeuralNetTrainer, environ_validate

N_WORKERS = 2
N_DATA_PROCESSOR = 1

# ray environment setup
environ_validate(n_cpus=N_DATA_PROCESSOR + N_WORKERS)
# load the pre-defined benchmark dataset and its sparsity definitions
datasets, feature_defs, dataset_options = load_benchmark_dataset(
    # set your own dataset by `data_path="criteo_mini.txt"`
    separate_valid_dataset=False
)
# trainer setup
trainer = NeuralNetTrainer(
    # module and dataset configs
    module=DeepFM, # your own nn.Module or a built-in module
    module_params={
        "dense_feature_defs": feature_defs["dense"],
        "sparse_feature_defs": feature_defs["sparse"],
    },
    dataset=datasets,
    dataset_options=dataset_options,
    # trainer configs
    epochs=5,
    batch_size=512,
    loss_fn=nn.BCELoss(),
    optimizer=torch.optim.Adam,
    metric_fns=[roc_auc_score],
    # logger callbacks
    callbacks=["json"],
    # computation abstract on distributed
    num_workers=N_WORKERS,
)
# run and print results
results = trainer.run()
pprint_results(results)

Architecture

[Architecture diagram]

Roadmap

  • Heterogeneous Strategy on Distributed Training
    • Sync Parameter Server
    • Async Parameter Server
    • Hybrid Phase 1: use sync or async updates for the dense or sparse component as you like, under a homogeneous architecture
    • Hybrid Phase 2: choose an async parameter server for the sparse component and sync Ring AllReduce (like PyTorch's DDP) for the dense component (see the sketch after this list)
  • Framework Support
    • PyTorch: no specific plan to support other frameworks
  • Advanced Parallel Mechanism
  • Accelerator Support
    • GPU: complete inspection required
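
To make the dense/sparse split behind the hybrid roadmap items concrete, here is a plain PyTorch sketch (not part of the Enscale API) that partitions a model's parameters into the embedding tables a parameter server would own and the dense weights an allreduce group would own:

import torch.nn as nn

def split_dense_sparse(module: nn.Module):
    # Illustrative helper, not Enscale API: collect embedding-table
    # parameters as the "sparse" group (parameter-server side) and all
    # remaining parameters as the "dense" group (allreduce side).
    sparse_params, dense_params = [], []
    for submodule in module.modules():
        params = list(submodule.parameters(recurse=False))
        if isinstance(submodule, nn.Embedding):
            sparse_params += params
        else:
            dense_params += params
    return dense_params, sparse_params

# toy CTR-style model: one embedding table feeding a linear head
model = nn.Sequential(nn.Embedding(1000, 16), nn.Flatten(), nn.Linear(16, 1))
dense, sparse = split_dense_sparse(model)
print(f"{len(dense)} dense tensors, {len(sparse)} sparse tensors")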

Reference

  • Ray and Ray Train: Ray Train is a lightweight library for distributed deep learning, allowing you to scale up and speed up training for your deep learning models. Docs here.
  • DeepCTR-Torch: Easy-to-use, modular and extendible package of deep-learning based CTR models.

License

Enscale is MIT licensed, as found in the LICENSE file.

