JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPU support is planned for the future; PRs welcome).
JetStream Engine Implementation
Two reference engine implementations are currently available: one for JAX models and one for PyTorch models.
JAX
- Git: https://github.com/google/maxtext
- README: https://github.com/google/JetStream/blob/main/docs/online-inference-with-maxtext-engine.md
PyTorch
- Git: https://github.com/google/jetstream-pytorch
- README: https://github.com/google/jetstream-pytorch/blob/main/README.md
Documentation
- Online Inference with MaxText on v5e Cloud TPU VM [README]
- Online Inference with PyTorch on v5e Cloud TPU VM [README]
- Serve Gemma using TPUs on GKE with JetStream
- Observability in JetStream Server
- Profiling in JetStream Server
- JetStream Standalone Local Setup
JetStream Standalone Local Setup
Getting Started
Setup
pip install -r requirements.txt
Run a local server and test it
Use the following commands to run a server locally:
# Start a server
python -m jetstream.core.implementations.mock.server
# Test local mock server
python -m jetstream.tools.requester
# Load test local mock server
python -m jetstream.tools.load_tester
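The commands above can also be chained from one script, so the mock server is always shut down after the client finishes. The following is a minimal sketch, assuming it is run from the repository root so that the `jetstream` package is importable. The helper names (`module_command`, `run_against_mock_server`) and the fixed startup delay are illustrative assumptions, not part of JetStream.

```python
import subprocess
import sys
import time

# Module paths taken from the commands above.
SERVER_MODULE = "jetstream.core.implementations.mock.server"
REQUESTER_MODULE = "jetstream.tools.requester"
LOAD_TESTER_MODULE = "jetstream.tools.load_tester"


def module_command(module):
    """Build the `python -m <module>` command for the current interpreter."""
    return [sys.executable, "-m", module]


def run_against_mock_server(client_module, startup_delay_s=5.0):
    """Start the mock server, run one client module against it, then stop the server.

    Hypothetical helper: the fixed delay is a crude stand-in for a real
    readiness check and may need tuning on a slow machine.
    """
    server = subprocess.Popen(module_command(SERVER_MODULE))
    try:
        time.sleep(startup_delay_s)  # give the server time to start listening
        result = subprocess.run(module_command(client_module), check=False)
        return result.returncode
    finally:
        # Always stop the server, even if the client failed or was interrupted.
        server.terminate()
        try:
            server.wait(timeout=10)
        except subprocess.TimeoutExpired:
            server.kill()
            server.wait()
```

Calling `run_against_mock_server(REQUESTER_MODULE)` corresponds to the first two commands, and `run_against_mock_server(LOAD_TESTER_MODULE)` to the load test. The return value is the client's exit code.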
Test core modules
# Test JetStream core orchestrator
python -m unittest -v jetstream.tests.core.test_orchestrator
# Test JetStream core server library
python -m unittest -v jetstream.tests.core.test_server
# Test mock JetStream engine implementation
python -m unittest -v jetstream.tests.engine.test_mock_engine
# Test mock JetStream token utils
python -m unittest -v jetstream.tests.engine.test_token_utils
# Test JetStream engine utils
python -m unittest -v jetstream.tests.engine.test_utils
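To run all of the test modules listed above in a single pass, the standard `unittest` loader can be given the module names directly. This is a sketch rather than a script shipped with JetStream: `TEST_MODULES` and `run_core_tests` are hypothetical names, and it assumes the `jetstream` package is importable from the current directory.

```python
import unittest

# Test modules copied from the commands above.
TEST_MODULES = [
    "jetstream.tests.core.test_orchestrator",
    "jetstream.tests.core.test_server",
    "jetstream.tests.engine.test_mock_engine",
    "jetstream.tests.engine.test_token_utils",
    "jetstream.tests.engine.test_utils",
]


def run_core_tests(modules=TEST_MODULES):
    """Load the named test modules into one suite and run them verbosely.

    Returns True only if every test passed. verbosity=2 matches the `-v`
    flag used on the command line.
    """
    suite = unittest.defaultTestLoader.loadTestsFromNames(modules)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return result.wasSuccessful()
```

A caller could pass the boolean to `sys.exit(0 if run_core_tests() else 1)` so that a CI job fails when any module fails.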
Download files
File details
Details for the file google_jetstream-0.2.2.tar.gz.
File metadata
- Download URL: google_jetstream-0.2.2.tar.gz
- Upload date:
- Size: 51.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ea3d238cbb2515cd21e2d2753453fdf505e2dc635b81cc159c08161fdad95ef |
| MD5 | d66ddc697be003bab7f825a1bdb422b2 |
| BLAKE2b-256 | e91088c13224cdcabdd7e6a39352f9f345ccb0381aff252bee6341fd71dcc745 |
File details
Details for the file google_jetstream-0.2.2-py3-none-any.whl.
File metadata
- Download URL: google_jetstream-0.2.2-py3-none-any.whl
- Upload date:
- Size: 72.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d4372f6efbc9cfb7d88127d0c42f6efa89bdb754e2a943df2638fe077900606c |
| MD5 | bbb1ee9717cb79e538c40cc848d4fa68 |
| BLAKE2b-256 | 7550b7d5ccf7cb3863718dfebe6641973ffd7720a8a4ed22ae4db45b7a2c2954 |
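The SHA256 digests listed for the two files can be checked against a downloaded copy with Python's standard `hashlib` module. The sketch below assumes the file has already been downloaded under its published name. `EXPECTED_SHA256`, `sha256_of`, and `verify_download` are hypothetical names chosen for illustration.

```python
import hashlib
import os

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "google_jetstream-0.2.2.tar.gz":
        "9ea3d238cbb2515cd21e2d2753453fdf505e2dc635b81cc159c08161fdad95ef",
    "google_jetstream-0.2.2-py3-none-any.whl":
        "d4372f6efbc9cfb7d88127d0c42f6efa89bdb754e2a943df2638fe077900606c",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, filename=None):
    """Compare a local file against the published digest for `filename`.

    `filename` defaults to the base name of `path`. A KeyError is raised
    for names that are not listed in EXPECTED_SHA256.
    """
    name = filename or os.path.basename(path)
    return sha256_of(path) == EXPECTED_SHA256[name]
```

For example, `verify_download("google_jetstream-0.2.2.tar.gz")` returns `True` only when the local file matches the digest listed for the source distribution.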