
Jina is the cloud-native neural search framework for any kind of data


Cloud-Native Neural Search Framework for Any Kind of Data

Supported Python versions: 3.7, 3.8, 3.9

Jina is a neural search framework that empowers anyone to build SOTA and scalable deep learning search applications in minutes.

🌌 All data types - Scalable indexing, querying, and understanding of any data: video, image, long/short text, music, source code, PDF, etc.

⏱️ Save time - Jina provides the design pattern for neural search systems, taking you from zero to a production-ready system in minutes.

🌩️ Fast & cloud-native - Distributed architecture from day one, scalable & cloud-native by design: enjoy containerization, streaming, sharding, replication, async scheduling, HTTP/gRPC/WebSocket protocols.

🍱 Own your stack - Keep end-to-end ownership of your solution's stack and avoid the integration pitfalls that come with fragmented, multi-vendor, generic legacy tools.

Install

  • via PyPI: pip install jina
  • via Conda: conda install jina -c conda-forge
  • via Docker: docker run jinaai/jina:latest
  • More install options

Documentation

Run Quick Demo

Build Your First Jina App

Document, Executor, and Flow are three fundamental concepts in Jina.

Leveraging these three components, let's build an app that finds the lines of a code snippet that are most similar to a query.

💡 Preliminaries: character embedding, pooling, Euclidean distance. 📗 Read our docs for details.
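The three preliminaries can be sketched in a few lines of plain Python, without Jina or NumPy. This is only an illustration under stated assumptions: the names `char_embed` and `euclidean` are hypothetical helpers, and the empty-text guard is an addition that the app in this README does not have. Mean-pooling one-hot character vectors yields a normalized character histogram, and two lines are compared by the Euclidean distance between their histograms.

```python
import math

OFFSET = 32             # ord(' '), the first printable ASCII character
DIM = 127 - OFFSET + 1  # 96 slots; the last one doubles as `UNK`


def char_embed(text):
    """Mean-pool one-hot character vectors into one DIM-sized vector."""
    vec = [0.0] * DIM
    for c in text:
        # characters outside the ASCII range 32..127 fall into the last slot
        idx = ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1
        vec[idx] += 1.0
    n = max(len(text), 1)  # assumption: guard against empty text
    return [v / n for v in vec]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = char_embed("@requests(on=something)")
near = euclidean(query, char_embed("@requests(on='/index')"))
far = euclidean(query, char_embed("import numpy as np"))
print(near < far)  # True: the decorator line is closer to the query
```

Lines that share many characters with the query end up close to it, which is all this toy embedding can capture; it knows nothing about word order or meaning.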

1️⃣ Copy-paste the minimal example below and run it:

The architecture of a simple neural search system powered by Jina
import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests


class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII code of the space character, the first printable char
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling


class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        docs.match(self._docs, metric='euclidean')


f = (
    Flow(port_expose=12345, protocol='http', cors=True)
    .add(uses=CharEmbed, replicas=2)  # 2 replicas of CharEmbed, though not strictly necessary
    .add(uses=Indexer)
)  # build a Flow
with f:
    with open(__file__) as fp:  # index all non-empty lines of _this_ file
        f.post('/index', (Document(text=t.strip()) for t in fp if t.strip()))
    f.block()  # block and listen for requests
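Conceptually, the `/search` endpoint of `Indexer` ranks every stored embedding by its Euclidean distance to the query embedding. The sketch below shows that idea as a brute-force scan in plain Python. It is not how `DocumentArray.match` is implemented; `rank_matches` and the toy two-dimensional vectors are hypothetical and exist only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_matches(query_vec, stored, top_k=3):
    """Return (distance, key) pairs for the top_k stored vectors nearest to query_vec."""
    scored = [(euclidean(query_vec, vec), key) for key, vec in stored.items()]
    return sorted(scored)[:top_k]  # smallest distance first


# toy "index": document key -> embedding
stored = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [3.0, 4.0]}
matches = rank_matches([0.9, 0.0], stored, top_k=2)
for dist, key in matches:
    print(f"{key}: {dist:.3f}")
```

A full scan costs one distance computation per stored document, which is fine for the few dozen lines indexed here.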

2️⃣ Open http://localhost:12345/docs (an extended Swagger UI) in your browser, click the /search tab, and enter:

{
  "data": [
    {
      "text": "@requests(on=something)"
    }
  ]
}

This means we want to find the lines from the above code snippet that are most similar to @requests(on=something). Now click the Execute button!

Jina Swagger UI extension on visualizing neural search results
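The same JSON body can also be sent without the browser. The sketch below builds the request with Python's standard library. It assumes that the HTTP gateway accepts a POST at `/search`, as the Swagger UI tab suggests; `build_search_request` and `search` are hypothetical helpers, and the response is returned as raw JSON because its schema is not shown here.

```python
import json
import urllib.request


def build_search_request(text, host="localhost", port=12345):
    """Build a POST request carrying the same JSON body as the Swagger UI example."""
    payload = json.dumps({"data": [{"text": text}]}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/search",  # assumption: /search is exposed as an HTTP route
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(text):
    """Send the request; the Flow from step 1 must be running."""
    with urllib.request.urlopen(build_search_request(text)) as resp:
        return json.load(resp)


req = build_search_request("@requests(on=something)")
print(req.full_url, req.get_method())
```

Calling `search("@requests(on=something)")` while the server is up should return the matches as JSON.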

3️⃣ Not a GUI fan? Let's do it in Python then! Keep the above server running and start a simple client:

from jina import Client, Document
from jina.types.request import Response


def print_matches(resp: Response):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.scores["euclidean"].value:2f}: "{d.text}"')


c = Client(protocol='http', port=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)

This prints the following results:

         Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.218218: "from jina import Document, DocumentArray, Executor, Flow, requests"

😔 Doesn't work? Our bad! Please report it here.

Support

Join Us

Jina is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.

Contributing

We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.


Release: jina-2.4.6.dev13 (source distribution, 346.9 kB)