Multimodal AI services & pipelines with cloud-native stack: gRPC, Kubernetes, Docker, OpenTelemetry, Prometheus, Jaeger, etc.
Jina-Serve
Jina-serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. Scale your services from local development to production while focusing on your core logic.
Key Features
- Native support for all major ML frameworks and data types
- High-performance service design with scaling, streaming, and dynamic batching
- LLM serving with streaming output
- Built-in Docker integration and Executor Hub
- One-click deployment to Jina AI Cloud
- Enterprise-ready with Kubernetes and Docker Compose support
Comparison with FastAPI
Key advantages over FastAPI:
- DocArray-based data handling with native gRPC support
- Built-in containerization and service orchestration
- Seamless scaling of microservices
- One-command cloud deployment
Install
pip install jina
See guides for Apple Silicon and Windows.
Core Concepts
Three main layers:
- Data: BaseDoc and DocList for input/output
- Serving: Executors process Documents, Gateway connects services
- Orchestration: Deployments serve Executors, Flows create pipelines
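The data layer's job — typed documents flowing in and out of services — can be sketched framework-free with plain dataclasses (an illustrative stand-in for `BaseDoc`/`DocList`; the real DocArray classes add validation, serialization, and multimodal types):

```python
from dataclasses import dataclass
from typing import List, TypeVar

T = TypeVar('T')


@dataclass
class Prompt:
    # input schema: what the service receives
    text: str


@dataclass
class Generation:
    # output schema: what the service returns
    prompt: str
    text: str


class DocList(List[T]):
    """Minimal stand-in: a typed list of documents."""


def generate(docs: DocList[Prompt]) -> DocList[Generation]:
    # an endpoint maps one typed DocList to another
    out = DocList[Generation]()
    for d in docs:
        out.append(Generation(prompt=d.text, text=d.text.upper()))
    return out
```

An Executor endpoint has exactly this shape: a function from one typed `DocList` to another, which the Gateway and orchestration layers then expose and scale.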
Build AI Services
Let's create a gRPC-based AI service using StableLM:
from jina import Executor, requests
from docarray import DocList, BaseDoc
from transformers import pipeline


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


class StableLM(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        generations = DocList[Generation]()
        prompts = docs.text
        llm_outputs = self.generator(prompts)
        for prompt, output in zip(prompts, llm_outputs):
            # each pipeline result is a list of dicts with a 'generated_text' key
            generations.append(
                Generation(prompt=prompt, text=output[0]['generated_text'])
            )
        return generations
Deploy with Python or YAML:
from jina import Deployment
from executor import StableLM

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
    dep.block()
jtype: Deployment
with:
  uses: StableLM
  py_modules:
    - executor.py
  timeout_ready: -1
  port: 12345
Use the client:
from jina import Client
from docarray import DocList
from executor import Prompt, Generation
prompt = Prompt(text='suggest an interesting image generation prompt')
client = Client(port=12345)
response = client.post('/', inputs=[prompt], return_type=DocList[Generation])
Build Pipelines
Chain services into a Flow:
from jina import Flow

flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)

with flow:
    flow.block()
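The same pipeline can be declared in YAML — a sketch of what a `flow.yml` (as referenced by the export commands below) might look like, assuming `StableLM` and `TextToImage` are importable:

```yaml
jtype: Flow
with:
  port: 12345
executors:
  - uses: StableLM
  - uses: TextToImage
```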
Scaling and Deployment
Local Scaling
Boost throughput with built-in features:
- Replicas for parallel processing
- Shards for data partitioning
- Dynamic batching for efficient model inference
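Dynamic batching trades a little latency for throughput: requests are buffered until either a preferred batch size is reached or the oldest request has waited past a timeout, then the model is called once on the whole batch. A minimal synchronous sketch of the buffering rule (illustrative only, not jina-serve's internals; parameter names mirror the config below):

```python
def dynamic_batcher(incoming, infer, preferred_batch_size=10, timeout=0.2):
    """Group incoming requests into batches for a single model call.

    `incoming` is an iterable of (arrival_time, payload) pairs; `infer`
    is called once per batch instead of once per request.
    """
    batch, deadline = [], None
    for arrived, payload in incoming:
        if deadline is None:
            # the first request in a batch starts the timeout clock
            deadline = arrived + timeout
        batch.append(payload)
        # flush when the batch is full or the oldest request waited too long
        if len(batch) >= preferred_batch_size or arrived >= deadline:
            yield infer(batch)
            batch, deadline = [], None
    if batch:
        yield infer(batch)
```

With `preferred_batch_size: 10`, 25 back-to-back requests become three model calls of sizes 10, 10, and 5.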
Example scaling a Stable Diffusion deployment:
jtype: Deployment
with:
  uses: TextToImage
  timeout_ready: -1
  py_modules:
    - text_to_image.py
  env:
    CUDA_VISIBLE_DEVICES: RR
  replicas: 2
  uses_dynamic_batching:
    /default:
      preferred_batch_size: 10
      timeout: 200
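`CUDA_VISIBLE_DEVICES: RR` spreads replicas over the available GPUs round-robin. The scheduling rule itself is just replica index modulo GPU count (a sketch of the assignment, not jina-serve's implementation):

```python
def rr_gpu_assignment(num_replicas: int, num_gpus: int) -> dict:
    """Round-robin device assignment: replica i runs on GPU i % num_gpus."""
    return {f'replica-{i}': i % num_gpus for i in range(num_replicas)}
```

So four replicas on two GPUs land on devices 0, 1, 0, 1.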
Cloud Deployment
Containerize Services
- Structure your Executor:
TextToImage/
├── executor.py
├── config.yml
└── requirements.txt
- Configure:
# config.yml
jtype: TextToImage
py_modules:
  - executor.py
metas:
  name: TextToImage
  description: Text to Image generation Executor
- Push to Hub:
jina hub push TextToImage
Deploy to Kubernetes
jina export kubernetes flow.yml ./my-k8s
kubectl apply -R -f my-k8s
Use Docker Compose
jina export docker-compose flow.yml docker-compose.yml
docker-compose up
JCloud Deployment
Deploy with a single command:
jina cloud deploy jcloud-flow.yml
LLM Streaming
Enable token-by-token streaming for responsive LLM applications:
- Define schemas:
from docarray import BaseDoc


class PromptDocument(BaseDoc):
    prompt: str
    max_tokens: int


class ModelOutputDocument(BaseDoc):
    token_id: int
    generated_text: str
- Initialize service:
import torch
from jina import Executor, requests
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# module-level tokenizer, shared by the streaming endpoint
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')


class TokenStreamingExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GPT2LMHeadModel.from_pretrained('gpt2')
- Implement streaming:
@requests(on='/stream')
async def task(self, doc: PromptDocument, **kwargs) -> ModelOutputDocument:
    input = tokenizer(doc.prompt, return_tensors='pt')
    input_len = input['input_ids'].shape[1]
    for _ in range(doc.max_tokens):
        output = self.model.generate(**input, max_new_tokens=1)
        if output[0][-1] == tokenizer.eos_token_id:
            break
        yield ModelOutputDocument(
            token_id=output[0][-1],
            generated_text=tokenizer.decode(
                output[0][input_len:], skip_special_tokens=True
            ),
        )
        # feed the generated token back in for the next step
        input = {
            'input_ids': output,
            'attention_mask': torch.ones(1, len(output[0])),
        }
- Serve and use:
# Server
with Deployment(uses=TokenStreamingExecutor, port=12345, protocol='grpc') as dep:
    dep.block()

# Client
import asyncio


async def main():
    client = Client(port=12345, protocol='grpc', asyncio=True)
    async for doc in client.stream_doc(
        on='/stream',
        inputs=PromptDocument(prompt='what is the capital of France ?', max_tokens=10),
        return_type=ModelOutputDocument,
    ):
        print(doc.generated_text)


asyncio.run(main())
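The pattern above — the server yields one document per generated token, and the client iterates as they arrive — can be sketched framework-free with an async generator (the stub `model` callable below is hypothetical, standing in for the real LM):

```python
import asyncio


async def stream_tokens(prompt, max_tokens, model):
    """Yield the partial text one token at a time, like the /stream endpoint."""
    generated = []
    for _ in range(max_tokens):
        token = model(prompt, generated)  # produce the next token
        if token is None:                 # end-of-sequence marker
            break
        generated.append(token)
        yield ' '.join(generated)         # partial text so far


async def main():
    # stub model: emits a fixed answer word by word, then signals end-of-sequence
    words = iter('Paris is the capital'.split())
    model = lambda prompt, generated: next(words, None)
    partials = []
    async for partial in stream_tokens('capital of France?', 10, model):
        partials.append(partial)
    return partials
```

Each yielded value is a growing prefix of the final text, which is why the gRPC client can print usable output long before generation finishes.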
Support
Jina-serve is backed by Jina AI and licensed under Apache-2.0.