Cloud-Native Neural Search Framework for Any Kind of Data
Jina is a neural search framework that empowers anyone to build SOTA and scalable deep learning search applications in minutes.
⏱️ Save time - A design pattern for neural search systems, with native support for PyTorch/Keras/ONNX/Paddle. Build solutions in just minutes.
🌌 All data types - Process, index, query, and understand videos, images, long/short text, audio, source code, PDFs, etc.
🌩️ Local & cloud friendly - Distributed architecture, scalable & cloud-native from day one. Same developer experience on both local and cloud.
🍱 Own your stack - Keep end-to-end stack ownership of your solution. Avoid integration pitfalls you get with fragmented, multi-vendor, generic legacy tools.
Install
```shell
pip install -U jina
```

More install options, including Conda, Docker, and Windows, can be found here.
Documentation
Get Started
We promise you can build a scalable ResNet-powered image search service in 20 minutes or less, from scratch. If not, you can forget about Jina.
Basic Concepts 
Document, Executor, and Flow are three fundamental concepts in Jina.
- Document is the basic data type in Jina;
- Executor is how Jina processes Documents;
- Flow is how Jina streamlines and distributes Executors.
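As a mental model only (plain Python, not Jina's actual API): a Document wraps one piece of data, an Executor transforms batches of Documents, and a Flow pipes Documents through Executors in order. The names below are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:                      # stand-in for jina's Document: wraps one piece of data
    text: str
    tags: dict = field(default_factory=dict)

def upper_exec(docs):           # stand-in for an Executor: transforms a batch of Docs
    for d in docs:
        d.text = d.text.upper()
    return docs

def tag_len_exec(docs):         # another Executor: annotates each Doc
    for d in docs:
        d.tags['len'] = len(d.text)
    return docs

def flow(docs, executors):      # stand-in for a Flow: pipes Docs through Executors in order
    for ex in executors:
        docs = ex(docs)
    return docs

out = flow([Doc('hello'), Doc('neural search')], [upper_exec, tag_len_exec])
print(out[0])   # Doc(text='HELLO', tags={'len': 5})
```

The real Jina classes add serialization, networking, and scaling on top, but the data-processor-pipeline shape is the same.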
Leveraging these three components, let's build an app that finds similar images using ResNet50.
ResNet50 Image Search in 20 Lines 
💡 Preliminaries: download dataset, install PyTorch & Torchvision
```python
from jina import DocumentArray, Document

def preproc(d: Document):
    return (d.load_uri_to_image_blob()  # load
             .set_image_blob_normalization()  # normalize color
             .set_image_blob_channel_axis(-1, 0))  # switch color axis

docs = DocumentArray.from_files('img/*.jpg').apply(preproc)
```
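If you're curious what those two image helpers do, they amount to channel-wise normalization followed by an HWC-to-CHW axis swap. A rough NumPy equivalent, illustrative only — the mean/std constants below are the common ImageNet statistics, assumed rather than taken from Jina's source:

```python
import numpy as np

img = np.random.rand(224, 224, 3).astype(np.float32)   # H x W x C image in [0, 1]

# set_image_blob_normalization: channel-wise standardization
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # ImageNet stats (assumed)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
img = (img - mean) / std

# set_image_blob_channel_axis(-1, 0): move channels first, as PyTorch models expect
img = np.moveaxis(img, -1, 0)
print(img.shape)   # (3, 224, 224)
```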
```python
import torchvision

model = torchvision.models.resnet50(pretrained=True)  # load ResNet50
docs.embed(model, device='cuda')  # embed via GPU to speed up
```
```python
q = (Document(uri='img/00021.jpg')  # build query image & preprocess
     .load_uri_to_image_blob()
     .set_image_blob_normalization()
     .set_image_blob_channel_axis(-1, 0))

q.embed(model)  # embed
q.match(docs)  # find top-20 nearest neighbours, done!
```
Done! Now print `q.matches` and you'll see the URIs of the most similar images. Add three lines of code to visualize them:
```python
for m in q.matches:
    m.set_image_blob_channel_axis(0, -1).set_image_blob_inv_normalization()

q.matches.plot_image_sprites()
```
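Under the hood, `match` boils down to nearest-neighbour search over the embedding vectors, ranked here by cosine similarity. A minimal NumPy sketch of the idea — not Jina's implementation:

```python
import numpy as np

def top_k_cosine(query, corpus, k=3):
    """Return indices of the k corpus rows most cosine-similar to `query`."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q                   # cosine similarity against every corpus row
    return np.argsort(-sims)[:k]   # highest similarity first

rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 8))                # 100 fake 8-dim embeddings
query = corpus[42] + 0.01 * rng.normal(size=8)    # a slightly perturbed corpus row
print(top_k_cosine(query, corpus))                # row 42 should rank first
```

Jina does this over the ResNet feature vectors and attaches the results (with their scores) to `q.matches`.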
Sweet! You can also use a Keras, ONNX, or PaddlePaddle model for the embedding step; Jina supports them all.
As-a-Service in 10 Extra Lines 
With a trivial refactoring and ten extra lines of code, you can turn the local script into a ready-to-serve service:
- Import what we need:

  ```python
  from jina import Document, DocumentArray, Executor, Flow, requests
  ```
- Copy-paste the preprocessing step and wrap it in an `Executor`:

  ```python
  class PreprocImg(Executor):
      @requests
      def foo(self, docs: DocumentArray, **kwargs):
          for d in docs:
              (d.load_uri_to_image_blob()  # load
               .set_image_blob_normalization()  # normalize color
               .set_image_blob_channel_axis(-1, 0))  # switch color axis
  ```
- Copy-paste the embedding step and wrap it in an `Executor`:

  ```python
  class EmbedImg(Executor):
      def __init__(self, **kwargs):
          super().__init__(**kwargs)
          import torchvision
          self.model = torchvision.models.resnet50(pretrained=True)

      @requests
      def foo(self, docs: DocumentArray, **kwargs):
          docs.embed(self.model)
  ```
- Wrap the matching step in an `Executor`:

  ```python
  class MatchImg(Executor):
      _da = DocumentArray()

      @requests(on='/index')
      def index(self, docs: DocumentArray, **kwargs):
          self._da.extend(docs)

      @requests(on='/search')
      def foo(self, docs: DocumentArray, **kwargs):
          docs.match(self._da)
          for d in docs.traverse_flat('r,m'):  # only required for visualization
              d.convert_uri_to_datauri()  # convert to data URI
              d.pop('embedding', 'blob')  # remove unnecessary fields to save bandwidth
  ```
- Connect all `Executor`s in a `Flow` and scale the embedding step to 3 replicas:

  ```python
  f = (Flow(port_expose=12345, protocol='http')
       .add(uses=PreprocImg)
       .add(uses=EmbedImg, replicas=3)
       .add(uses=MatchImg))
  ```
  Plot it via `f.plot('flow.svg')` and you get a visualization of the pipeline.

- Index image data and serve REST queries publicly:

  ```python
  with f:
      f.post('/index', DocumentArray.from_files('img/*.jpg'), show_progress=True, request_size=8)
      f.block()
  ```
Done! Now query it via `curl` and you get the most similar images back.

Or go to http://0.0.0.0:12345/docs and test requests via the Swagger UI.
Or use a Python client to access the service:

```python
from jina import Client, Document
from jina.types.request import Response

def print_matches(resp: Response):  # the callback function invoked when the task is done
    for idx, d in enumerate(resp.docs[0].matches):  # print top matches
        print(f'[{idx}]{d.scores["cosine"].value:.2f}: "{d.uri}"')

c = Client(protocol='http', port=12345)  # connect to localhost:12345
c.post('/search', Document(uri='img/00021.jpg'), on_done=print_matches)
```
At this point you have probably spent about 15 minutes, and here we are: an image search service with rich features:
| ✅ Solution as microservices | ✅ Scale in/out any component | ✅ Query via HTTP/WebSocket/gRPC/Client |
| --- | --- | --- |
| ✅ Distribute/Dockerize components | ✅ Async/non-blocking I/O | ✅ Extendable REST interface |
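"Scale in/out any component" means running several replicas of one Executor (like `replicas=3` on the embedding step) and letting the gateway spread incoming batches across them. The actual scheduling is Jina's concern; round-robin below is just one simple policy to convey the throughput idea, with hypothetical replica names:

```python
import itertools

replicas = ['embed-rep-0', 'embed-rep-1', 'embed-rep-2']   # hypothetical replica names
dispatch = itertools.cycle(replicas)                       # round-robin (one simple policy)

# each incoming request batch goes to the next replica in turn
assignment = {f'batch-{i}': next(dispatch) for i in range(6)}
print(assignment['batch-0'], assignment['batch-3'])        # embed-rep-0 embed-rep-0
```

Because the heavy embedding step runs three-wide while preprocessing and matching stay single, the slowest stage no longer bottlenecks the whole pipeline.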
Deploy to Kubernetes in 7 Minutes 
Have another seven minutes? We'll show you how to bring your service to the next level by deploying it to Kubernetes.
- Create a Kubernetes cluster and get credentials (example on GCP, more K8s providers here):

  ```shell
  gcloud container clusters create test --machine-type e2-highmem-2 --num-nodes 1 --zone europe-west3-a
  gcloud container clusters get-credentials test --zone europe-west3-a --project jina-showcase
  ```
- Move each `Executor` class to a separate folder, with one Python file in each:
  - `PreprocImg` -> 📁 `preproc_img/exec.py`
  - `EmbedImg` -> 📁 `embed_img/exec.py`
  - `MatchImg` -> 📁 `match_img/exec.py`
- Push all Executors to Jina Hub:

  ```shell
  jina hub push preproc_img
  jina hub push embed_img
  jina hub push match_img
  ```
  You will get three Hub Executors that can be used via Docker containers.

- Adjust the `Flow` a bit and open it:

  ```python
  f = (Flow(name='readme-flow', port_expose=12345, infrastructure='k8s')
       .add(uses='jinahub+docker://PreprocImg')
       .add(uses='jinahub+docker://EmbedImg', replicas=3)
       .add(uses='jinahub+docker://MatchImg'))

  with f:
      f.block()
  ```
Intrigued? Find more about Jina from our docs.
Run Quick Demo
- 👗 Fashion image search: `jina hello fashion`
- 🤖 QA chatbot: `pip install "jina[demo]" && jina hello chatbot`
- 📰 Multimodal search: `pip install "jina[demo]" && jina hello multimodal`
- 🍴 Fork the source of a demo to your folder: `jina hello fork fashion ../my-proj/`
Support
- Join our Slack community to chat with our engineers about your use cases, questions, and support queries.
- Join our Engineering All Hands meet-up to discuss your use case and learn about Jina's new features.
  - When? The second Tuesday of every month
  - Where? Zoom (see our public calendar/.ical/Meetup group) and live stream on YouTube
- Subscribe to the latest video tutorials on our YouTube channel
Join Us
Jina is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.
Contributing
We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.