Jina is geared towards building search-as-a-service systems for any kind of data in just minutes.

Project description

Cloud-Native Neural Search Framework for Any Kind of Data

Jina allows you to build deep learning-powered search-as-a-service in just minutes.

🌌 All data types - Index and query any kind of unstructured data at scale: video, image, long/short text, music, source code, PDF, etc.

🌩️ Fast & cloud-native - Distributed architecture from day one. Scalable and cloud-native by design: containerizing, distributing, streaming, parallelizing, sharding, and async scheduling, with REST/gRPC/WebSocket support.

⏱️ Save time - A design pattern for neural search systems that takes you from zero to a production-ready system in minutes.

🍱 Own your stack - Keep end-to-end ownership of your solution and avoid the integration pitfalls of fragmented, multi-vendor, generic legacy tools.

Run Quick Demo

  • 👗 Fashion image search: pip install --pre jina && jina hello fashion
  • 🤖 QA chatbot: pip install --pre "jina[chatbot]" && jina hello chatbot
  • 📰 Multimodal search: pip install --pre "jina[multimodal]" && jina hello multimodal
  • 🍴 Fork the source of a demo to your folder: jina hello fork fashion ../my-proj/

Install

2.0 is in pre-release; add --pre to install it. Why 2.0?

$ pip install --pre jina
$ jina -v
2.0.0rcN
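
By default pip skips pre-release versions, which is why the --pre flag is needed here. The sketch below is a simplified illustration of how a version string such as the one printed by jina -v can be recognized as a pre-release. The helper name is_prerelease and the sample version strings are hypothetical, and the regular expression covers only the common aN, bN, rcN and devN suffixes rather than the full versioning rules that pip applies.

```python
import re

# Simplified pre-release detector (an assumption for illustration only):
# matches alpha (aN), beta (bN), release-candidate (rcN) and devN suffixes.
_PRE_RELEASE = re.compile(r"(a|b|rc)\d+|dev\d+")


def is_prerelease(version: str) -> bool:
    """Return True if the version string carries a pre-release suffix."""
    return _PRE_RELEASE.search(version) is not None


# Hypothetical version strings, not taken from any actual release list.
for v in ["2.0.0rc1", "2.0.0", "1.4.2", "2.0.0.dev3"]:
    print(v, is_prerelease(v))
```

For anything beyond a quick check, the packaging library's Version class exposes an is_prerelease property that follows the complete specification.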

via Docker

$ docker run jinaai/jina:master -v
2.0.0rcN
📦 More installation options

Supported platforms: x86/64, arm64, v6, v7, Apple M1

  • Standard
      On Linux/macOS & Python 3.7/3.8/3.9: pip install --pre jina
      Docker users: docker run jinaai/jina:master
  • Daemon
      On Linux/macOS & Python 3.7/3.8/3.9: pip install --pre "jina[daemon]"
      Docker users: docker run --network=host jinaai/jina:master-daemon
  • With Extras
      On Linux/macOS & Python 3.7/3.8/3.9: pip install --pre "jina[devel]"
      Docker users: docker run jinaai/jina:master-devel

Version identifiers are explained here. Jina can run on Windows Subsystem for Linux. We welcome the community to help us with native Windows support.

Get Started

Document, Executor, and Flow are the three fundamental concepts in Jina.

Copy-paste the minimum example below and run it:

💡 Preliminaries: character embedding, pooling, Euclidean distance
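
For readers new to these preliminaries, here is a minimal NumPy-only sketch of character embedding with mean pooling. It does not use Jina at all. The names OFFSET, DIM, ONE_HOT and embed are chosen for this illustration; the arithmetic follows the same idea of one one-hot vector per printable ASCII character, averaged over the text.

```python
import numpy as np

OFFSET = 32                # ord(' '), the first printable ASCII character
DIM = 127 - OFFSET + 1     # 96 slots; the last one is also used for unknown chars
ONE_HOT = np.eye(DIM)      # row i is the one-hot vector for character i + OFFSET


def embed(text: str) -> np.ndarray:
    """Mean-pool the one-hot vectors of all characters into one fixed-size vector."""
    idx = [ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1 for c in text]
    return ONE_HOT[idx, :].mean(axis=0)


vec = embed("abca")
print(vec.shape)                  # (96,)
print(vec[ord("a") - OFFSET])     # 0.5, since 'a' is 2 of the 4 characters
```

Each entry of the result is the relative frequency of one character, so texts of any length map to vectors of the same size. Note that this sketch does not handle empty strings.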

import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests

class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII code of the space character, the first printable one
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling

class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        q = np.stack(docs.get_attributes('embedding'))  # get all embeddings from query docs
        d = np.stack(self._docs.get_attributes('embedding'))  # get all embeddings from stored docs
        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # pairwise euclidean distance
        for dist, query in zip(euclidean_dist, docs):  # add & sort match
            query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]
            query.matches.sort(key=lambda m: m.score.value)  # sort matches by their values

f = Flow(port_expose=12345).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow with 2 parallel CharEmbed, though that is unnecessary here
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index all lines of this file
    f.block()  # block for listening request

Keep the above running and start a simple client:

from jina import Client, Document

def print_matches(req):  # the callback function invoked when task is done
    for idx, d in enumerate(req.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.score.value:2f}: "{d.text}"')
        
c = Client(host='localhost', port_expose=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)

It finds the lines most similar to "request(on=something)" from the server code snippet and prints the following:

         Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.192049: "query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]"
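
The ranking comes from pairwise Euclidean distance, where a smaller score means a closer match. The NumPy-only sketch below reproduces that step on toy 2-dimensional vectors. The arrays q and d are made-up values for illustration, not embeddings produced by the example above.

```python
import numpy as np

# Toy data (assumed for illustration): 2 query vectors and 3 stored vectors.
q = np.array([[0.0, 0.0], [1.0, 1.0]])
d = np.array([[0.0, 1.0], [3.0, 4.0], [0.0, 0.0]])

# Broadcasting: (2, 1, 2) - (1, 3, 2) -> (2, 3, 2); the norm over the last
# axis yields a (2, 3) matrix of query-to-stored distances.
dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)

# For each query, indices of stored vectors ordered from nearest to farthest.
order = np.argsort(dist, axis=1)

print(dist.shape)        # (2, 3)
print(order.tolist())    # [[2, 0, 1], [0, 2, 1]]
```

Row i of order lists the stored items for query i from best to worst, which is the same ordering that sorting matches by score produces.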

😔 Doesn't work? Our bad! Please report it here.

Read Tutorials

Support

Join Us

Jina is backed by Jina AI. We are actively hiring full-stack developers and solution engineers to build the next neural search ecosystem in open source.

Contributing

We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.

Download files

Source Distribution

tensorflow-search-0.0.1.tar.gz (272.4 kB view details)

Uploaded Source

File details

Details for the file tensorflow-search-0.0.1.tar.gz.

File metadata

  • Download URL: tensorflow-search-0.0.1.tar.gz
  • Upload date:
  • Size: 272.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.7.9

File hashes

Hashes for tensorflow-search-0.0.1.tar.gz
  • SHA256: 9e512785fb9ad1b4ca9231f8e4c6a229c130353aca4e3e7aa854dce7299452f1
  • MD5: def2334e02b5d5aacab94cf31249d427
  • BLAKE2b-256: fad90f361af1e7544bceaea70623d3b39bce4f75ea715dbc8c612553db877ad1
