poormanray
A minimal alternative to Ray for distributed data processing on EC2 instances. Manage clusters, run commands, and distribute jobs without the complexity of a full Ray deployment.
Installation
Requires Python 3.10+.
# Install as a CLI tool (recommended)
uv tool install poormanray
# Or install as a library
uv pip install poormanray
# Or, without uv
pip install poormanray
Quick Start
# Create a cluster of 5 instances
pmr create --name mycluster --number 5 --instance-type i4i.2xlarge
# List instances in the cluster
pmr list --name mycluster
# Run a command on all instances
# (single outer quotes so $(hostname) expands on each instance, not on your machine)
pmr run --name mycluster --command 'echo "Hello from $(hostname)"'
# Terminate the cluster when done
pmr terminate --name mycluster
Prerequisites
- AWS credentials configured via:
  - Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  - AWS CLI (aws configure)
  - Credentials file (~/.aws/credentials)
- SSH key pair in ~/.ssh/ (id_rsa, id_ed25519, etc.)
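Before creating a cluster, it can help to confirm that these prerequisites are in place. The following is a minimal sketch and not part of poormanray: the helper names find_aws_credentials and find_ssh_keys are hypothetical, and the code only looks in the locations listed above (aws configure writes to the same credentials file).

```python
import os
from pathlib import Path


def find_aws_credentials(env=None, home=None):
    """Return the credential sources (from the list above) that appear to be present."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []
    # Both variables must be set for environment-based credentials to work.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment")
    if (home / ".aws" / "credentials").is_file():
        sources.append("credentials file")
    return sources


def find_ssh_keys(home=None):
    """Return private key files in ~/.ssh that have a matching .pub file."""
    home = Path.home() if home is None else Path(home)
    ssh_dir = home / ".ssh"
    if not ssh_dir.is_dir():
        return []
    return sorted(
        pub.with_suffix("")              # id_ed25519.pub -> id_ed25519
        for pub in ssh_dir.glob("id_*.pub")
        if pub.with_suffix("").is_file()
    )
```

Calling find_aws_credentials() and find_ssh_keys() with no arguments inspects the current environment and home directory; an empty list from either suggests a prerequisite is missing.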
Commands
Cluster Management
create - Launch EC2 instances
pmr create --name mycluster --number 5 --instance-type i4i.2xlarge
# Options:
# -n, --name Cluster name (required)
# -N, --number Number of instances (default: 1)
# -t, --instance-type EC2 instance type (default: i4i.xlarge)
# -r, --region AWS region (default: us-east-1)
# -a, --ami-id Custom AMI ID (default: Amazon Linux 2023)
# -d, --detach Don't wait for instances to be ready
# --zone Availability zone
# --storage-type EBS volume type (gp3, gp2, io1, io2, st1, sc1)
# --storage-size Root volume size in GB
# --storage-iops IOPS for the root volume
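The three storage options presumably end up as an EBS block device mapping on the launch request. As an illustration only, the sketch below builds the BlockDeviceMappings structure that the EC2 RunInstances API accepts. The helper name root_volume_mapping and the /dev/xvda root device name are assumptions, not poormanray's actual code.

```python
def root_volume_mapping(storage_type=None, storage_size=None, storage_iops=None,
                        device_name="/dev/xvda"):
    """Build an EC2 BlockDeviceMappings list for the root volume.

    Returns an empty list when no storage option is given, so the AMI's
    own defaults apply.
    """
    ebs = {}
    if storage_type is not None:
        ebs["VolumeType"] = storage_type        # e.g. "gp3"
    if storage_size is not None:
        ebs["VolumeSize"] = int(storage_size)   # size in GiB
    if storage_iops is not None:
        ebs["Iops"] = int(storage_iops)
    if not ebs:
        return []
    return [{"DeviceName": device_name, "Ebs": ebs}]


mapping = root_volume_mapping(storage_type="gp3", storage_size=200, storage_iops=3000)
```

Note that EC2 accepts an explicit Iops value only for gp3, io1, and io2 volumes, so combining --storage-iops with gp2, st1, or sc1 is likely to be rejected by AWS.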
list - Show cluster instances
pmr list --name mycluster
# Output includes: instance ID, name, type, state, IP, status checks
terminate - Destroy instances
pmr terminate --name mycluster
# Terminate specific instances only:
pmr terminate --name mycluster -i i-abc123 -i i-def456
pause / resume - Stop and start instances
pmr pause --name mycluster # Stop instances (preserves EBS)
pmr resume --name mycluster # Start stopped instances
Command Execution
run - Execute commands on instances
# Run a command
pmr run --name mycluster --command "df -h"
# Run a script
pmr run --name mycluster --script ./my-script.sh
# Run in background (detached)
pmr run --name mycluster --command "long-running-job.sh" --detach
# Auto-terminate after command completes
pmr run --name mycluster --command "./job.sh" --spindown
map - Distribute scripts across instances
Distributes a directory of scripts evenly across all instances and runs them in parallel.
# Create scripts directory with executable scripts
ls scripts/
# job_001.sh job_002.sh job_003.sh job_004.sh job_005.sh
# Distribute and run across cluster
pmr map --name mycluster --script scripts/
# Scripts are distributed round-robin and executed in parallel
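To make the round-robin assignment concrete, here is a small sketch of how five scripts could be split across two instances. The function name round_robin is hypothetical, and sorting the scripts by name first is an assumption; poormanray's real ordering may differ.

```python
def round_robin(scripts, num_instances):
    """Assign scripts to instances in round-robin order.

    Returns one list of scripts per instance; list sizes differ by at most one.
    """
    if num_instances < 1:
        raise ValueError("need at least one instance")
    buckets = [[] for _ in range(num_instances)]
    for index, script in enumerate(sorted(scripts)):
        buckets[index % num_instances].append(script)
    return buckets


scripts = [f"job_{i:03d}.sh" for i in range(1, 6)]
assignment = round_robin(scripts, 2)
# assignment[0] == ["job_001.sh", "job_003.sh", "job_005.sh"]
# assignment[1] == ["job_002.sh", "job_004.sh"]
```

With this scheme, scripts of similar cost end up evenly spread, but a few very long jobs can still make one instance finish much later than the others.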
Instance Setup
setup - Configure AWS credentials
Copies your AWS credentials to all instances in the cluster.
pmr setup --name mycluster
setup-d2tk - Install Dolma2 Toolkit
Sets up RAID drives, installs Rust, and builds datamap-rs and minhash-rs.
pmr setup-d2tk --name mycluster --detach
setup-dolma-python - Install Dolma Python
Installs Python 3.12, uv, and the dolma package.
pmr setup-dolma-python --name mycluster --detach
setup-decon - Install DECON toolkit
Sets up the DECON pipeline with Rust toolchain.
pmr setup-decon --name mycluster --github-token ghp_xxx --detach
Common Options
These options are available on most commands:
| Option | Short | Description |
|---|---|---|
| --name | -n | Cluster name (required) |
| --region | -r | AWS region (default: us-east-1) |
| --instance-id | -i | Target specific instance(s), repeatable |
| --ssh-key-path | -k | Path to SSH private key |
| --detach | -d | Run in background |
| --owner | -o | Owner tag for cost tracking |
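As a rough illustration of what "repeatable" means for --instance-id, the sketch below declares the common options with Python's argparse. This is an assumption about the shape of the interface, not poormanray's actual parser (which may use a different CLI library); the program name pmr-sketch is made up.

```python
import argparse


def build_parser():
    """Declare the common options from the table above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="pmr-sketch")
    parser.add_argument("-n", "--name", required=True, help="Cluster name")
    parser.add_argument("-r", "--region", default="us-east-1", help="AWS region")
    # action="append" collects every -i occurrence into a list.
    parser.add_argument("-i", "--instance-id", action="append", default=None,
                        help="Target specific instance(s), repeatable")
    parser.add_argument("-k", "--ssh-key-path", help="Path to SSH private key")
    parser.add_argument("-d", "--detach", action="store_true", help="Run in background")
    parser.add_argument("-o", "--owner", help="Owner tag for cost tracking")
    return parser


args = build_parser().parse_args(
    ["-n", "mycluster", "-i", "i-abc123", "-i", "i-def456"]
)
# args.instance_id == ["i-abc123", "i-def456"]; args.region == "us-east-1"
```

When -i is omitted, instance_id stays None, which a tool like this would naturally treat as "all instances in the cluster".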
How It Works
- Instance Tagging: Instances are tagged with Project (cluster name) and Contact (owner) for easy identification and cost tracking.
- SSH Key Management: Your local SSH key is automatically imported to EC2 when creating instances.
- Remote Execution: Commands are executed over SSH using paramiko. Long-running commands use GNU screen for detached execution.
- Script Distribution: The map command base64-encodes scripts, transfers them to instances, and executes them in parallel.
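The base64 transfer and detached execution described above can be sketched as follows. This is a simplified illustration under stated assumptions, not poormanray's implementation: the function names, the pmr-job screen session name, and the exact remote command line are all hypothetical.

```python
import base64
import shlex


def encode_script(script_text):
    """Base64-encode a script so it survives shell quoting during transfer."""
    return base64.b64encode(script_text.encode("utf-8")).decode("ascii")


def remote_command(script_text, remote_path, detach=False, session="pmr-job"):
    """Build a shell command that recreates the script remotely and runs it.

    remote_path should be an absolute path: it is shell-quoted, so a leading
    "~" would not be expanded on the remote side.
    """
    payload = encode_script(script_text)
    path = shlex.quote(remote_path)
    # Decode the payload into a file, mark it executable, then run it.
    command = f"echo {payload} | base64 -d > {path} && chmod +x {path} && {path}"
    if detach:
        # Run inside a detached GNU screen session so the job outlives the SSH connection.
        command = f"screen -dmS {shlex.quote(session)} bash -c {shlex.quote(command)}"
    return command


script = "#!/bin/bash\necho hi\n"
foreground = remote_command(script, "/tmp/job_001.sh")
background = remote_command(script, "/tmp/job_001.sh", detach=True)
```

The resulting string is what would be passed to an SSH exec call. Encoding the script avoids having to escape quotes, dollar signs, and newlines inside the script body.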
Examples
Data Processing Pipeline
# 1. Create a cluster
pmr create --name dataproc --number 10 --instance-type i4i.4xlarge
# 2. Set up the environment
pmr setup-dolma-python --name dataproc --detach
# 3. Distribute processing scripts
pmr map --name dataproc --script ./processing-jobs/
# 4. Monitor progress
pmr run --name dataproc --command "tail -n 20 ~/*/run_all.log"
# 5. Clean up
pmr terminate --name dataproc
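Step 3 assumes that ./processing-jobs/ already contains one executable script per unit of work. One way to produce such a directory is sketched below. The helper name write_job_scripts and the example commands (process.py, --shard) are hypothetical placeholders for your own workload.

```python
import stat
from pathlib import Path


def write_job_scripts(out_dir, commands):
    """Write one executable job_NNN.sh per command and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, command in enumerate(commands, start=1):
        path = out / f"job_{number:03d}.sh"
        # Fail fast on errors so a broken job is easy to spot in the logs.
        path.write_text(f"#!/usr/bin/env bash\nset -euo pipefail\n{command}\n")
        # Add the executable bit for user, group, and others.
        path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        paths.append(path)
    return paths


# Hypothetical workload: one shard per script.
example_commands = [f"python process.py --shard {shard}" for shard in range(3)]
```

Calling write_job_scripts("processing-jobs", example_commands) creates job_001.sh through job_003.sh, which can then be handed to pmr map with --script ./processing-jobs/.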
Quick One-Off Command
# Create, run, and terminate in one go
pmr create --name quickjob --number 1
pmr run --name quickjob --command "./my-job.sh" --spindown
# Instance auto-terminates after job completes
License
Apache-2.0