Jina is a cloud-native neural search framework powered by state-of-the-art AI and deep learning.
Cloud-Native Neural Search Framework for Any Kind of Data
Jina is geared towards building search-as-a-service systems for any kind of data in just minutes.
🌌 Search anything - Large-scale indexing and querying of unstructured data: video, image, long/short text, music, source code, PDF, etc.
⏱️ Save time - Follow the design patterns of neural search systems and go from zero to a production-ready system in minutes.
🍱 Own your stack - Keep an end-to-end stack ownership of your solution, avoid the integration pitfalls with fragmented, multi-vendor, generic legacy tools.
🌩️ Fast & cloud-native - Distributed architecture from day one. Scalable & cloud-native by design: enjoy containerizing, distributing, sharding, async, REST/gRPC/WebSocket.
Installation
2.0 is still in pre-release; add --pre to install it. Why 2.0?
```shell
$ pip install --pre jina
$ jina -v
2.0.0rcN
```
via Docker
```shell
$ docker run jinaai/jina:master -v
2.0.0rcN
```
📦 More installation options
| x86/64, arm/v6, v7, v8 (Apple M1) | On Linux/macOS & Python 3.7/3.8/3.9 | Docker Users |
|---|---|---|
| Standard | `pip install --pre jina` | `docker run jinaai/jina:master` |
| Daemon | `pip install --pre "jina[daemon]"` | `docker run --network=host jinaai/jina:master-daemon` |
| With Extras | `pip install --pre "jina[devel]"` | `docker run jinaai/jina:master-devel` |
Version identifiers are explained here. Jina can run on Windows Subsystem for Linux. We welcome the community to help us with native Windows support.
Get Started
Document, Executor, and Flow are the three fundamental concepts in Jina.
- 📄 Document is the basic data type in Jina;
- ⚙️ Executor is how Jina processes Documents;
- 🔀 Flow is how Jina streamlines and distributes Executors.
Copy-paste the minimum example below and run it:
```python
import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests


class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # first printable ASCII char (space)
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim)  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # mean-pooling


class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        q = np.stack(docs.get_attributes('embedding'))  # get all embeddings from query docs
        d = np.stack(self._docs.get_attributes('embedding'))  # get all embeddings from stored docs
        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # pairwise euclidean distance
        for dist, query in zip(euclidean_dist, docs):  # add & sort matches
            query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]
            query.matches.sort(key=lambda m: m.score.value)  # sort matches by distance


f = Flow().add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow with 2 parallel CharEmbeds, though unnecessary here
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))

    def print_matches(req):  # the callback function invoked when the task is done
        for idx, d in enumerate(req.docs[0].matches[:3]):  # print top-3 matches
            print(f'[{idx}]{d.score.value:2f}: "{d.text}"')

    f.post('/search', Document(text='request(on=something)'), on_done=print_matches)
```
It finds the lines most similar to "request(on=something)" in the code snippet above and prints the following:

```text
[0]0.125791: "f.post('/search', Document(text='request(on=something)'), on_done=print_matches)"
[1]0.168526: "@requests(on='/index')"
[2]0.181676: "@requests(on='/search')"
```
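The retrieval logic in the example can be checked independently of Jina. The sketch below re-implements the one-hot mean-pooled character embedding and the broadcasted pairwise Euclidean distance with NumPy alone; the names `OFFSET`, `DIM`, and `embed` are illustrative, not part of the Jina API:

```python
import numpy as np

OFFSET = 32              # first printable ASCII char (space)
DIM = 127 - OFFSET + 1   # 96 slots; the last one is reserved for `UNK`
CHAR_EMBD = np.eye(DIM)  # one-hot vector per character

def embed(text: str) -> np.ndarray:
    # map each char to its one-hot row; out-of-range chars fall back to `UNK`
    rows = [ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1 for c in text]
    return CHAR_EMBD[rows, :].mean(axis=0)  # mean-pooling over characters

docs = ['hello jina', 'hello world', 'neural search']
query = 'hello jina!'

d = np.stack([embed(t) for t in docs])  # shape: (n_docs, DIM)
q = np.stack([embed(query)])            # shape: (1, DIM)

# broadcasting computes all query-doc distances in one shot
dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # shape: (1, n_docs)
print(docs[int(dist[0].argmin())])  # prints the closest stored doc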
Run Quick Demo
- 👗 Fashion image search:
jina hello fashion
- 🤖 QA chatbot:
pip install "jina[chatbot]" && jina hello chatbot
- 📰 Multimodal search:
pip install "jina[multimodal]" && jina hello multimodal
Fork Demo & Build Your Own
Copy the source code of a hello world to your own directory and start from there:
$ jina hello fork fashion ../my-proj/
Read Tutorials
- 📄 Document & DocumentArray: the basic data type in Jina.
- ⚙️ Executor: how Jina processes Documents.
- 🔀 Flow: how Jina streamlines and distributes Executors.
- 🧼 Write clean code in Jina
- 😎 3 Reasons to use Jina 2.0
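To make the Flow concept concrete before diving into the tutorials, here is a toy, Jina-free sketch: `TinyFlow` is a hypothetical stand-in (not the real Jina API) showing how chained processing steps pass data along, in the same spirit as `Flow().add(...).add(...)` wiring Executors together:

```python
class TinyFlow:
    """A toy pipeline, NOT the Jina Flow API: each added step
    receives the previous step's output, mimicking how Executors
    are chained in a Flow."""

    def __init__(self):
        self._steps = []

    def add(self, fn):
        self._steps.append(fn)
        return self  # return self to allow chaining, like Flow().add(...).add(...)

    def post(self, docs):
        for step in self._steps:
            docs = step(docs)  # feed each step's output into the next
        return docs


f = TinyFlow() \
    .add(lambda docs: [d.lower() for d in docs]) \
    .add(lambda docs: [d for d in docs if 'jina' in d])
print(f.post(['Hello Jina', 'Hello World']))  # → ['hello jina']
```

The real Flow adds much more on top of this idea: serialization, networking, sharding, and parallelism across processes and machines.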
Support
- Join our Slack community to chat with our engineers about your use cases, questions, and support queries.
- Join our Engineering All Hands meet-up to discuss your use case and learn about Jina's new features.
- When? The second Tuesday of every month
- Where? Zoom (calendar link/.ics) and live stream on YouTube
- Subscribe to the latest video tutorials on our YouTube channel.
Join Us
Jina is backed by Jina AI. We are actively hiring full-stack developers and solution engineers to build the next neural search ecosystem in open source.
Contributing
We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.
- Contributing guidelines
- Code of conduct - play nicely with the Jina community
- Good first issues
- Release cycles and development stages
- Upcoming features - what's being planned and what we're thinking about.