Skip to main content

Client library for the NNext vector search engine

Project description

Apollo Client

About

This repository houses the source code for pynnext, the python client associated with NNext server.

NNext is a

  • ⚡ blazingly fast
  • 📖 source-available [Elastic License 2.0]
  • 🔍 nearest-neighbors vector search engine

Installation | Quick Start | Documentation | Contributing

Installation

You will need to setup and run an NNext server to utilize this client, or outsource server management to someone else. For detailed server installation instructions, please see the nnext repo.

To install the pynnext client, activate a virtual environment, and install via pip:

  1. In editable mode from source:
git clone https://github.com/nnext-ai/pynnext
cd pynnext
pip install -e .

or

  1. From PyPI:
pip install nnext

Quick Start

Here's a quick example showcasing how you can create an index, insert vectors/documents and search among them via NNext.

Let's begin by installing the NNext server.

NNEXT_PKG=nnext-0.0.12-amd64.deb
NNEXT_URL=https://trove.nnext.io/downloads
wget $NNEXT_URL/$NNEXT_PKG
wget $NNEXT_URL/$NNEXT_PKG.sha512
shasum -a 512 -c $NNEXT_PKG.sha512
sudo dpkg -i $NNEXT_PKG

Run nnext

sudo nnext

You should see something like this:

...
...
[2022-04-27 13:02:10.029] [info] 🏁 Started NNext at  127.0.0.1:6040

Install the Python client for NNext:

pip install nnext

Finally, to follow the examples below, install numpy:

  1. From the souce here
pip install -r requirements.txt

or

  1. Via PyPi
pip install numpy==1.22.3

We can now initialize the client and create a index:

import numpy as np
import nnext
from nnext import _and, _eq, _gte, _in

# Create and initialize the vector client.
nnclient = nnext.Client(
    nodes=[
    {'host': 'localhost', 'port': '6040'}
  ])

Broadly speaking, you can create two types of indices

1. Simple indices

n_dim = 768

# Create an vector index.
nnindex = nnclient.index.create(
    d=n_dim,
    name='test_index')

# Insert vectors into the index.
n_vecs = 1000
vectors = np.random.rand(n_vecs, n_dim)
nnindex.add(vectors)

# Create a query vector set.
n_queries = 10
q_vectors = np.random.rand(n_queries, n_dim)

# Search for the nearest neighbors of the
# query set among the indexed vectors.
k = 5
_idx, _res = nnindex.search(q_vectors, k, return_vector=True)

# The search operation returns a 2d list of the indices of the nearest neighbors
# for each vector in the query set (i.e. a nested list with shape (n_queries, k)),
# and optionally the data associated with the neighbor vectors themselves 
# (i.e. a nested list of shape (n_queries, k, n_dim))
assert len(_idx) == n_queries
assert len(_idx[0]) == k
assert len(_res) == n_queries
assert len(_res[0]) == k
assert len(_res[0][0]) == n_dim

2. Compound indices

🚧 WIP 🚧.

Not implemented.

NNext is capable of storing additional metadata related to your vectors in a rich format. In this example we will use the movie plots dataset from Kaggle.

nnindex = client.index.create({
  "name": "movies",
  "schema": {
      "id" : "string", #⬅ inferred primary key
      "title" : "string",
      "released_year" : "int32",
      "genre" :  "float",
      "wikipage" : "string",
      "plot" : "string",
      "rating" :  "float"
  },
  "index_type": "approximated", #⬅ indexes assumed to be approximated by default.
  "dims": n_dim
})

Now, let's add a vector to the collection we just created:

vector = {
 "id": "124",
 "company_name": "Stark Industries",
 "num_employees": 5215,
 "country": "USA",
}

nnindex.documents.create(document)

Finally, let's search for the document we just indexed:

q_filter = {
    _and: [
        { "Release Year": { _gte: 2015 } },
        { "Genre": { _eq: "comedy" } },
        { "actors": { _in: ["Russell Crowe"] } }
    ]
}

client.collections['companies'].documents.search(search_parameters)

Documentation

All NNext Server and Client documentation, including pynnext integration articles and helpful recipes, can be found at:

🚧 WIP 🚧
https://nnext.ai/docs/

FAQs

How does this differ from Faiss, ScaNN and Annoy?

First of all, NNext uses Faiss under the hood. All of these libraries have python packages installable via PIP or Conda, and those are very easy to use, from install to the API. However, while they allow you to quickly get started, they don't allow for persistence, index growth or high availability. If your application goes down for whatever reason, so do your search indices and data.

How does this differ from Milvus?

Milvus is a large piece of software, that takes a non-trivial amount of effort to setup, administer, scale and fine-tune. It offers you a few thousand configuration parameters that may need to be tuned to get to your ideal configuration. As a result, it's better suited for large teams who have the bandwidth to get it production-ready, and regularly monitor it and scale it, especially when they have a need to store billions of documents and petabytes of data (eg: logs).

NNext is built specifically for decreasing the "time to market" for a delightful nearest-neighbor search experience. It is a light-weight yet powerful & scaleable alternative that focuses on Developer Happiness and Experience with a clean well-documented API, clear semantics and smart defaults so it just works well out-of-the-box, without you having to turn many knobs.

See a side-by-side feature comparison here.

How does this differ from other fully managed solutions like Pinecone?

In brief - **no vendor lock-in**. Tired of using NNext cloud? Pack up your vectors and go. Obviously we don't want you to go, but if you have to, NNext Cloud allows you to download a compressed zip file containing the latest backup of your vectors to your machine. These vectors can then be used with another installation of NNext on premise or on another cloud provider.

Pinecone is a proprietary, hosted, nearest-neighbour search-as-a-service product that works well, when cost is not an issue. However, fast growing applications will quickly run into search & indexing limits, accompanied by expensive plan upgrades as they scale.

NNext on the other hand is an open-source product that you can run on your own infrastructure or use our managed SaaS offering - NNext Cloud. The open source version is free to use (besides of course your own infra costs). With NNext Cloud we do not charge by records or search operations. Instead, you get a dedicated cluster and you can throw as much data and traffic at it as it can handle. You only pay a fixed hourly cost & bandwidth charges for it, depending on the configuration your choose, similar to most modern cloud platforms.

From a product perspective, NNext is closer in spirit to Jina.ai than Pinecone.

See a side-by-side feature comparison here.

Why the Elastic License 2.0?

NNext Server is **source available**, **server software** and we expect users to typically run it as a separate daemon, and not integrate it with their own code. Elastic Licence 2.0 (EL2) covers and allows for this use case **generously**. We aim to set the minimum limitations necessary to strike a fair balance between freedom to use, share and change the software, and preventing actions that will harm the community.

If you have specifics that prevent you from using NNext due to a licensing issue, we're happy to explore this topic further with you. Please reach out to us legal@nnext.ai.

I heard Elasticsearch and OpenSearch were planning on implementing ANN Search?

Fundamentally, Elasticsearch and it's variants, run on the JVM, which by itself can be quite an effort to tune to run optimally. NNext, on the other hand, is a single light-weight self-contained native binary, so it's simple to setup and operate. Furthermore, ANN search on Elasticseach runs as a secondary process, a sidecar, which is not natively supported by the main indexing engine.

Who is NNext?

NNext builds open-source ML-Ops software to help make development and deployment of machine learning applications painless.

Contributing

Introduction

First off, 🙏🏾 thank you for considering contributing to nnext. We value community contributions!

How can you help?

You may already know what you want to contribute -- a fix for a bug you encountered, or a new feature your team wants to use.

If you don't know what to contribute, keep an open mind! Here's some examples of helpful contributions that mean less work for you

  • Improving documentation
  • bug triaging
  • writing tutorials

Checkout the guide to contributing to learn more.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nnext-0.0.37.tar.gz (56.0 kB view hashes)

Uploaded Source

Built Distribution

nnext-0.0.37-py3-none-any.whl (63.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page