JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).
JetStream Engine Implementations
Currently, there are two reference engine implementations available: one for JAX models and another for PyTorch models.
JAX
- Git: https://github.com/google/maxtext
- README: https://github.com/google/JetStream/blob/main/docs/online-inference-with-maxtext-engine.md
PyTorch
- Git: https://github.com/google/jetstream-pytorch
- README: https://github.com/google/jetstream-pytorch/blob/main/README.md
Documentation
- Online Inference with MaxText on v5e Cloud TPU VM [README]
- Online Inference with PyTorch on v5e Cloud TPU VM [README]
- Serve Gemma using TPUs on GKE with JetStream
- Observability in JetStream Server
- Profiling in JetStream Server
- JetStream Standalone Local Setup
JetStream Standalone Local Setup
Getting Started
Setup
pip install -r requirements.txt
Run a local server and test it
Use the following commands to run a server locally:
# Start a server
python -m jetstream.core.implementations.mock.server
# Test local mock server
python -m jetstream.tools.requester
# Load test local mock server
python -m jetstream.tools.load_tester
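To run the three steps above from a single script (for example in CI), the following minimal sketch starts the mock server as a subprocess, runs the requester and load tester against it, and then stops the server. It uses only the module paths shown above and Python's standard `subprocess` module. The helper name `run_against_server` and the fixed startup delay are assumptions: this README does not say how long the server takes to start or how to detect when it is ready, and the sketch assumes the requester and load tester target the mock server's default address.

```python
import subprocess
import sys
import time

# Commands from the section above; they assume the setup step has been done
# and that the working directory is the JetStream repository root.
SERVER_CMD = [sys.executable, "-m", "jetstream.core.implementations.mock.server"]
CLIENT_CMDS = [
    [sys.executable, "-m", "jetstream.tools.requester"],
    [sys.executable, "-m", "jetstream.tools.load_tester"],
]


def run_against_server(server_cmd, client_cmds, startup_wait_s=5.0):
    """Start the server, run each client to completion, then stop the server.

    Returns the list of client exit codes. startup_wait_s is a fixed guess
    (hypothetical default) at how long the server needs before it accepts
    connections; it is not a readiness check.
    """
    server = subprocess.Popen(server_cmd)
    try:
        time.sleep(startup_wait_s)
        if server.poll() is not None:
            raise RuntimeError(f"server exited early with code {server.returncode}")
        # Run clients sequentially so their output does not interleave.
        return [subprocess.run(cmd).returncode for cmd in client_cmds]
    finally:
        # Ask the server to stop; force-kill it if it ignores SIGTERM.
        server.terminate()
        try:
            server.wait(timeout=10)
        except subprocess.TimeoutExpired:
            server.kill()
            server.wait()


# Example (from the JetStream repository root):
#   print(run_against_server(SERVER_CMD, CLIENT_CMDS))
```

Running the clients one after another keeps the requester's output readable before the load test starts; if the server needs longer to come up on your machine, raise `startup_wait_s`.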
Test core modules
# Test JetStream core orchestrator
python -m unittest -v jetstream.tests.core.test_orchestrator
# Test JetStream core server library
python -m unittest -v jetstream.tests.core.test_server
# Test mock JetStream engine implementation
python -m unittest -v jetstream.tests.engine.test_mock_engine
# Test JetStream token utils
python -m unittest -v jetstream.tests.engine.test_token_utils
# Test JetStream engine utils
python -m unittest -v jetstream.tests.engine.test_utils
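The test commands above can also be run as one suite in a single process. This minimal sketch uses the standard `unittest` loader and assumes the JetStream source tree is importable (for example, run from the repository root after the setup step). `run_test_modules` is a hypothetical helper name, not part of JetStream.

```python
import sys
import unittest

# Test modules listed above; importing them requires the JetStream source
# tree (and its requirements) to be on the Python path.
JETSTREAM_TEST_MODULES = [
    "jetstream.tests.core.test_orchestrator",
    "jetstream.tests.core.test_server",
    "jetstream.tests.engine.test_mock_engine",
    "jetstream.tests.engine.test_token_utils",
    "jetstream.tests.engine.test_utils",
]


def run_test_modules(module_names, verbosity=2):
    """Load every named test module into one suite, run it, and return the TestResult.

    A module that fails to import shows up as an error in the result
    rather than raising, so one broken module does not hide the others.
    """
    suite = unittest.defaultTestLoader.loadTestsFromNames(module_names)
    runner = unittest.TextTestRunner(stream=sys.stderr, verbosity=verbosity)
    return runner.run(suite)


# Example (from the JetStream repository root):
#   result = run_test_modules(JETSTREAM_TEST_MODULES)
#   print(result.wasSuccessful())
```

For ad hoc runs, `python -m unittest -v` also accepts several module names in one invocation; the helper is mainly useful when you need the `TestResult` object programmatically.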
Download files
Source Distribution
- google_jetstream-0.2.2.tar.gz (51.5 kB)
Built Distribution
- google_jetstream-0.2.2-py3-none-any.whl (72.3 kB)
File details
Details for the file google_jetstream-0.2.2.tar.gz.
File metadata
- Download URL: google_jetstream-0.2.2.tar.gz
- Size: 51.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9ea3d238cbb2515cd21e2d2753453fdf505e2dc635b81cc159c08161fdad95ef
MD5 | d66ddc697be003bab7f825a1bdb422b2
BLAKE2b-256 | e91088c13224cdcabdd7e6a39352f9f345ccb0381aff252bee6341fd71dcc745
File details
Details for the file google_jetstream-0.2.2-py3-none-any.whl.
File metadata
- Download URL: google_jetstream-0.2.2-py3-none-any.whl
- Size: 72.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | d4372f6efbc9cfb7d88127d0c42f6efa89bdb754e2a943df2638fe077900606c
MD5 | bbb1ee9717cb79e538c40cc848d4fa68
BLAKE2b-256 | 7550b7d5ccf7cb3863718dfebe6641973ffd7720a8a4ed22ae4db45b7a2c2954
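The published digests let you check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib`; the script name `verify_dist.py` and the helper names are hypothetical, and `PUBLISHED_SHA256` simply transcribes the SHA256 values from the two tables for version 0.2.2. BLAKE2b-256 corresponds to `hashlib.blake2b(digest_size=32)`.

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables (version 0.2.2).
PUBLISHED_SHA256 = {
    "google_jetstream-0.2.2.tar.gz": "9ea3d238cbb2515cd21e2d2753453fdf505e2dc635b81cc159c08161fdad95ef",
    "google_jetstream-0.2.2-py3-none-any.whl": "d4372f6efbc9cfb7d88127d0c42f6efa89bdb754e2a943df2638fe077900606c",
}

# Factories for the three algorithms listed; BLAKE2b-256 is BLAKE2b with a 32-byte digest.
HASHERS = {
    "sha256": hashlib.sha256,
    "md5": hashlib.md5,
    "blake2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    hasher = HASHERS[algorithm.lower()]()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    """True if the file's digest matches the published one (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Usage: python verify_dist.py google_jetstream-0.2.2.tar.gz [more files...]
    for arg in sys.argv[1:]:
        expected = PUBLISHED_SHA256.get(Path(arg).name)
        if expected is None:
            print(f"{arg}: no published digest recorded")
        else:
            print(f"{arg}: {'OK' if verify(arg, expected) else 'MISMATCH'}")
```

pip's built-in `pip hash <file>` command also prints a file's SHA256 digest; the sketch mainly shows how the three listed algorithms map onto `hashlib`.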