
Multimodal AI services & pipelines with cloud-native stack: gRPC, Kubernetes, Docker, OpenTelemetry, Prometheus, Jaeger, etc.


Jina-Serve


Jina-serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. Scale your services from local development to production while focusing on your core logic.

Key Features

  • Native support for all major ML frameworks and data types
  • High-performance service design with scaling, streaming, and dynamic batching
  • LLM serving with streaming output
  • Built-in Docker integration and Executor Hub
  • One-click deployment to Jina AI Cloud
  • Enterprise-ready with Kubernetes and Docker Compose support

Comparison with FastAPI

Key advantages over FastAPI:

  • DocArray-based data handling with native gRPC support
  • Built-in containerization and service orchestration
  • Seamless scaling of microservices
  • One-command cloud deployment

Install

pip install jina

See guides for Apple Silicon and Windows.

Core Concepts

Three main layers:

  • Data: BaseDoc and DocList for input/output
  • Serving: Executors process Documents, Gateway connects services
  • Orchestration: Deployments serve Executors, Flows create pipelines
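As a rough mental model of these layers, documents flow through a chain of processing units, and the orchestration layer just wires them together. The sketch below uses plain dataclasses and functions, not the real jina/docarray API:

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins for BaseDoc / Executor / Flow, illustrating
# the layering only -- not the actual jina classes.
@dataclass
class Doc:
    text: str

def uppercase(docs: List[Doc]) -> List[Doc]:
    # An "Executor": takes a batch of Documents, returns a batch.
    return [Doc(text=d.text.upper()) for d in docs]

def exclaim(docs: List[Doc]) -> List[Doc]:
    return [Doc(text=d.text + '!') for d in docs]

def flow(docs: List[Doc], executors: List[Callable]) -> List[Doc]:
    # A "Flow": pipes each Executor's output into the next.
    for ex in executors:
        docs = ex(docs)
    return docs

result = flow([Doc(text='hello')], [uppercase, exclaim])
print(result[0].text)  # HELLO!
```

In real jina, the Gateway additionally exposes this pipeline over gRPC/HTTP/WebSockets, and each Executor can run in its own process or container.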

Build AI Services

Let's create a gRPC-based AI service using StableLM:

from jina import Executor, requests
from docarray import DocList, BaseDoc
from transformers import pipeline


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


class StableLM(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        generations = DocList[Generation]()
        prompts = docs.text
        llm_outputs = self.generator(prompts)
        for prompt, output in zip(prompts, llm_outputs):
            # pipeline() returns a list of candidates per prompt;
            # take the first candidate's 'generated_text'
            generations.append(
                Generation(prompt=prompt, text=output[0]['generated_text'])
            )
        return generations

Deploy with Python or YAML:

from jina import Deployment
from executor import StableLM

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
    dep.block()
Or equivalently in YAML:

jtype: Deployment
with:
  uses: StableLM
  py_modules:
    - executor.py
  timeout_ready: -1
  port: 12345

Use the client:

from jina import Client
from docarray import DocList
from executor import Prompt, Generation

prompt = Prompt(text='suggest an interesting image generation prompt')
client = Client(port=12345)
response = client.post('/', inputs=[prompt], return_type=DocList[Generation])
print(response[0].text)

Build Pipelines

Chain services into a Flow:

from jina import Flow

# assumes a TextToImage Executor is defined alongside StableLM
flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)

with flow:
    flow.block()

Scaling and Deployment

Local Scaling

Boost throughput with built-in features:

  • Replicas for parallel processing
  • Shards for data partitioning
  • Dynamic batching for efficient model inference
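Jina implements dynamic batching for you; the underlying idea can be sketched in plain Python (all names below are hypothetical): queue incoming requests until either `preferred_batch_size` items have accumulated or the timeout window closes, then run the model once on the whole batch.

```python
import time
from queue import Queue, Empty

def collect_batch(queue: Queue, preferred_batch_size: int = 10, timeout_ms: int = 200):
    """Drain up to preferred_batch_size items, waiting at most timeout_ms overall."""
    batch = []
    deadline = time.monotonic() + timeout_ms / 1000
    while len(batch) < preferred_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout window closed: serve a partial batch
        try:
            batch.append(queue.get(timeout=remaining))
        except Empty:
            break
    return batch

q = Queue()
for prompt in ['a', 'b', 'c']:
    q.put(prompt)

# Only 3 items queued, so the timeout (not the batch size) ends collection.
print(collect_batch(q))  # ['a', 'b', 'c']
```

This is why both `preferred_batch_size` and `timeout` appear together in the YAML below: the timeout bounds the latency a request pays while waiting for the batch to fill.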

Example scaling a Stable Diffusion deployment:

jtype: Deployment
with:
  uses: TextToImage
  timeout_ready: -1
  py_modules:
    - text_to_image.py
  env:
    CUDA_VISIBLE_DEVICES: RR
  replicas: 2
  uses_dynamic_batching:
    /default:
      preferred_batch_size: 10
      timeout: 200
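`CUDA_VISIBLE_DEVICES: RR` spreads replicas over the visible GPUs round-robin. The assignment amounts to a modulo, sketched here with a hypothetical helper (jina does this internally):

```python
def assign_gpus(num_replicas: int, num_gpus: int) -> dict:
    # Round-robin: replica i is pinned to GPU i mod num_gpus.
    return {f'replica-{i}': i % num_gpus for i in range(num_replicas)}

print(assign_gpus(num_replicas=2, num_gpus=2))  # {'replica-0': 0, 'replica-1': 1}
```

With more replicas than GPUs, replicas wrap around and share devices, so `replicas: 2` on a 2-GPU machine gives each replica its own GPU.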

Cloud Deployment

Containerize Services

  1. Structure your Executor:

TextToImage/
├── executor.py
├── config.yml
└── requirements.txt

  2. Configure:

# config.yml
jtype: TextToImage
py_modules:
  - executor.py
metas:
  name: TextToImage
  description: Text to Image generation Executor

  3. Push to Hub:

jina hub push TextToImage

Deploy to Kubernetes

jina export kubernetes flow.yml ./my-k8s
kubectl apply -R -f my-k8s

Use Docker Compose

jina export docker-compose flow.yml docker-compose.yml
docker-compose up

JCloud Deployment

Deploy with a single command:

jina cloud deploy jcloud-flow.yml

LLM Streaming

Enable token-by-token streaming for responsive LLM applications:

  1. Define schemas:
from docarray import BaseDoc


class PromptDocument(BaseDoc):
    prompt: str
    max_tokens: int


class ModelOutputDocument(BaseDoc):
    token_id: int
    generated_text: str
  2. Initialize the service:

from jina import Executor, requests
from transformers import GPT2Tokenizer, GPT2LMHeadModel


class TokenStreamingExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GPT2LMHeadModel.from_pretrained('gpt2')
        self.tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

  3. Implement streaming (a method of TokenStreamingExecutor; note it needs torch):

import torch


@requests(on='/stream')
async def task(self, doc: PromptDocument, **kwargs) -> ModelOutputDocument:
    input = self.tokenizer(doc.prompt, return_tensors='pt')
    input_len = input['input_ids'].shape[1]
    for _ in range(doc.max_tokens):
        # Generate exactly one new token per iteration and yield it immediately
        output = self.model.generate(**input, max_new_tokens=1)
        if output[0][-1] == self.tokenizer.eos_token_id:
            break
        yield ModelOutputDocument(
            token_id=output[0][-1],
            generated_text=self.tokenizer.decode(
                output[0][input_len:], skip_special_tokens=True
            ),
        )
        # Feed the extended sequence back in for the next token
        input = {
            'input_ids': output,
            'attention_mask': torch.ones(1, len(output[0])),
        }
  4. Serve and use:

# Server
from jina import Deployment

with Deployment(uses=TokenStreamingExecutor, port=12345, protocol='grpc') as dep:
    dep.block()


# Client
import asyncio

from jina import Client


async def main():
    client = Client(port=12345, protocol='grpc', asyncio=True)
    async for doc in client.stream_doc(
        on='/stream',
        inputs=PromptDocument(prompt='what is the capital of France?', max_tokens=10),
        return_type=ModelOutputDocument,
    ):
        print(doc.generated_text)


asyncio.run(main())
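Stripped of the jina and model machinery, the token-by-token pattern above is just an async generator consumed incrementally; this stdlib-only sketch (hypothetical names, a word standing in for a token) shows the shape:

```python
import asyncio

async def generate_tokens(prompt: str, max_tokens: int):
    # Stand-in for the model loop: yield one "token" at a time.
    for i, word in enumerate(prompt.split()):
        if i >= max_tokens:
            break
        await asyncio.sleep(0)  # simulate per-token latency
        yield word

async def main():
    tokens = []
    async for tok in generate_tokens('Paris is the capital', max_tokens=3):
        tokens.append(tok)  # a client could render each token as it arrives
    return tokens

print(asyncio.run(main()))  # ['Paris', 'is', 'the']
```

The point of streaming over gRPC is exactly this: the client sees each token as soon as it is generated instead of waiting for the full completion.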

Support

Jina-serve is backed by Jina AI and licensed under Apache-2.0.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • jina-3.34.0.tar.gz (378.9 kB, Source)

Built Distributions

  • jina-3.34.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.9 MB, CPython 3.11, manylinux glibc 2.5+/2.17+ x86-64)
  • jina-3.34.0-cp311-cp311-macosx_11_0_arm64.whl (8.4 MB, CPython 3.11, macOS 11.0+ ARM64)
  • jina-3.34.0-cp311-cp311-macosx_10_9_x86_64.whl (9.0 MB, CPython 3.11, macOS 10.9+ x86-64)
  • jina-3.34.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.9 MB, CPython 3.10, manylinux glibc 2.5+/2.17+ x86-64)
  • jina-3.34.0-cp310-cp310-macosx_11_0_arm64.whl (8.4 MB, CPython 3.10, macOS 11.0+ ARM64)
  • jina-3.34.0-cp310-cp310-macosx_10_9_x86_64.whl (9.0 MB, CPython 3.10, macOS 10.9+ x86-64)
  • jina-3.34.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.9 MB, CPython 3.9, manylinux glibc 2.5+/2.17+ x86-64)
  • jina-3.34.0-cp39-cp39-macosx_11_0_arm64.whl (8.4 MB, CPython 3.9, macOS 11.0+ ARM64)
  • jina-3.34.0-cp39-cp39-macosx_10_9_x86_64.whl (9.0 MB, CPython 3.9, macOS 10.9+ x86-64)

File details

Details for the file jina-3.34.0.tar.gz.

File metadata

  • Download URL: jina-3.34.0.tar.gz
  • Size: 378.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for jina-3.34.0.tar.gz
Algorithm Hash digest
SHA256 2f7a485677ebe82c9592eb3d59e9777ffcbf3049b6b6d1a6f8cfdad1ee96e0a5
MD5 c83f1c655c3de02738abc34cdb6cb42f
BLAKE2b-256 ac40833ec94df33aa7f4725d0769ca076437d4f8992f0341808032ff10626fb5


