Python client for NautilusDB

nautilusdb-python-client

Python client for NautilusDB, a fully-managed, cloud-native vector search service.

NautilusDB is currently in public alpha. We're actively improving the product and releasing new features, and we'd love to hear your feedback! Please take a moment to fill out this feedback form to help us understand your use-case better.

By default, all collections are subject to permanent deletion after 2 weeks. Please let us know via the feedback form if you need to keep a collection for longer.

The NautilusDB Python client supports both high-level APIs, where you can directly upload files and ask questions, and a set of low-level APIs that let you use it as a vector database and manipulate vectors directly.

Continue reading for the high-level API guide, or skip to "Using NautilusDB as a vector database" for the vector database API guide.

Quickstart

You can try out NautilusDB in just a few lines of code. We have prepared a special public collection, openai-web, that can answer questions about the contents of www.openai.com.

import nautilusdb as ndb

answer, _ = ndb.collection('openai-web').ask('what is red team?')
print(answer)
"""
Sample answer:

Red team refers to the group of external experts who work with OpenAI to
identify and evaluate potential risks and harmful capabilities in new systems.
The red team's role is to help develop taxonomies of risk and provide input
throughout the model and product development lifecycle.
"""

You can also create your own collections, upload files, then get answers specific to your data assets. The following example walks you through the process of creating a collection and indexing the original transformer paper into that collection.

import nautilusdb as ndb

# Create an API key
my_api_key = ndb.create_api_key()

# Configure ndb to use the newly minted API key
ndb.init(api_key=my_api_key)

# Create a new collection with preconfigured dimension
llm_research = ndb.CollectionBuilder.question_answer(name="llm_research").build()
ndb.create_collection(llm_research)

# Index the original Transformer paper into this collection.
llm_research.upload_document("https://arxiv.org/pdf/1706.03762.pdf")

# Get answers from this paper
llm_research.ask("what is a transformer?")

Installation

Install the released NautilusDB Python client with pip. Python 3.10 or later is required.

pip3 install nautilusdb-client

Creating an API key

You can create an API key and use it to create or access your own collections.
During the public alpha, all collections are created inside a shared, public account. Private accounts and related functionality will be released soon.

import nautilusdb as ndb

# Create a new API key
my_api_key = ndb.create_api_key()

# Please record this API key and keep it secret
#
# Collections created with this key can only be accessed
# through this key!
print(my_api_key)

# Use this API key in all subsequent calls
ndb.init(api_key=my_api_key)
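
Because collections created with a key can only be reached through that key, it helps to keep the key out of source code. The following is a minimal sketch that reads it from an environment variable. The variable name NAUTILUSDB_API_KEY and the helper api_key_from_env are illustrative assumptions and are not part of the client. The helper returns None when the variable is unset, which matches the None convention this guide uses for public collections.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; pick whatever fits your deployment.
API_KEY_ENV_VAR = "NAUTILUSDB_API_KEY"


def api_key_from_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the API key stored in the environment, or None if unset or blank."""
    key = environ.get(API_KEY_ENV_VAR, "").strip()
    return key or None


# Intended usage (requires the nautilusdb package, so it is left as a comment):
#   import nautilusdb as ndb
#   ndb.init(api_key=api_key_from_env())
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.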

Creating a Collection

See this page for a brief overview of the NautilusDB data model.

You can create a public collection that is accessible to everyone, or a private collection only accessible with a specific API key.

import nautilusdb as ndb

# Set an API key to create a private collection
# Set API key to None to create a public collection
ndb.init(api_key="<my_api_key>")

# Create a collection called llm_research. It is configured to be compatible
# with the Q/A APIs: it has a vector embedding dimension of 1536 and three
# metadata columns: text (string), tokens (int), filename (string).
collection = ndb.CollectionBuilder.question_answer('llm_research').build()
ndb.create_collection(collection)

Listing collections

You can list the collections you have access to. The list includes all public collections, as well as all collections created with the currently configured API key.

import nautilusdb as ndb

collections = ndb.list_collections()

Uploading a document

You can upload a local file or a file from a web URL and index it into a collection.

Supported file formats

  • .pdf PDF files
  • .txt Plain-text files
  • .md Markdown files
  • .docx Microsoft Word documents

import nautilusdb as ndb

# Optional API key to access private collections
ndb.init(api_key="<my_api_key>")

collection = ndb.collection('llm_research')

# Local file and URLs are both supported.
# URL must contain the full scheme prefix (http:// or https://)
collection.upload_document('/path/to/file.pdf')
collection.upload_document('https://path/to/file.pdf')
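
A mistyped path or a URL without its scheme is easier to catch before the upload call than after it. The sketch below checks a source string against the rules stated above: the four supported extensions, and an http:// or https:// prefix for URLs. The helper check_upload_source is hypothetical and not part of the client, and it assumes the file extension is visible at the end of the path or URL, which the service may not actually require.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the supported-format list above.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx"}


def check_upload_source(source: str) -> str:
    """Return 'url' or 'file' for an acceptable source, else raise ValueError."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        kind, path = "url", parsed.path
    elif "://" in source:
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
    else:
        # No scheme prefix, so the string is treated as a local path.
        kind, path = "file", source

    # Assumption: the path itself ends in one of the supported extensions.
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return kind


print(check_upload_source("/path/to/file.pdf"))                     # file
print(check_upload_source("https://arxiv.org/pdf/1706.03762.pdf"))  # url
```

Note that a string such as www.example.com/file.pdf has no scheme, so this check classifies it as a local file rather than a URL.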

Asking a question

You can ask questions within a collection. An API key is required only for private collections. The ask() method returns a plain-text answer to your question, along with a list of the most relevant references used to derive the answer.

Available public collections

  • openai-web: Contains contents of www.openai.com

import nautilusdb as ndb

# Optional API key to access private collections
ndb.init(api_key="<my_api_key>")

# Get a plain text answer, as well as a list of references from the collection
# that are the most relevant to the question.
answer, refs = ndb.collection('openai-web').ask('what is red team?')


# The same call works against a collection you created yourself
answer, refs = ndb.collection('llm_research').ask('what is a transformer?')

Deleting a collection

You can delete a public collection or a private collection using the same API key that was used to create it.

import nautilusdb as ndb

# Optional API key to access private collections
ndb.init(api_key="<my_api_key>")

ndb.delete_collection('llm_research')

Using NautilusDB as a vector database

NautilusDB is a vector database at its core. You can directly manipulate vectors in the database.

Creating a custom collection

Create a collection whose vectors have an embedding dimension of 2, with two metadata columns: int_col of type Int and str_col of type String. Currently, L2 is the only vector distance metric. Support for other distance metrics will be available soon.

import nautilusdb as ndb

ndb.init(api_key='<my_api_key>')

# Create a collection with two metadata columns
col = (ndb.CollectionBuilder()
       .set_name('custom_collection')
       .set_dimension(2)
       .add_metadata_column('int_col', ndb.ColumnType.Int)
       .add_metadata_column('str_col', ndb.ColumnType.String)
       .build())

ndb.create_collection(col)
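
For reference, L2 is the ordinary Euclidean distance. The sketch below computes it locally in plain Python. It only illustrates the metric and does not reproduce the service's implementation; in particular, whether the service reports the distance or its square is not stated here.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(l2_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

Smaller values mean closer vectors, so a nearest-neighbor search under L2 returns the vectors with the smallest distance to the query.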

Upserting vectors into the collection

You can now upsert vectors into the collection. Metadata columns default to null. You can override this default by setting the metadata field of the vector.

import nautilusdb as ndb

ndb.init(api_key='<my_api_key>')

# Upsert 6 vectors. Some with one metadata column, others with two
col = ndb.collection('custom_collection')
col.upsert_vector([
    ndb.Vector(vid='1', embedding=[0.1, 0.1], metadata={'int_col': 1, 'str_col': 'vector at 0.1, 0.1'}),
    ndb.Vector(vid='2', embedding=[0.2, 0.2], metadata={'int_col': 2, 'str_col': 'vector at 0.2, 0.2'}),
    ndb.Vector(vid='3', embedding=[0.3, 0.3], metadata={'int_col': 3, 'str_col': 'vector at 0.3, 0.3'}),
    ndb.Vector(vid='100', embedding=[0.4, 0.4], metadata={'int_col': 100}),
    ndb.Vector(vid='200', embedding=[0.5, 0.5], metadata={'int_col': 200}),
    ndb.Vector(vid='300', embedding=[0.6, 0.6], metadata={'int_col': 300}),
])
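
The null default can be pictured with a small in-memory model. This is a sketch only: the upsert function and store dictionary below are hypothetical stand-ins, not the client API, and the rule that a second upsert with the same vid replaces the first is the usual meaning of "upsert" and is assumed here rather than confirmed.

```python
from typing import Dict, List, Optional

# Metadata columns of the example collection.
COLUMNS = ("int_col", "str_col")


def upsert(store: Dict[str, dict], vid: str, embedding: List[float],
           metadata: Optional[dict] = None) -> None:
    """Insert the vector, or replace the one already stored under `vid`."""
    row = {column: None for column in COLUMNS}  # unset columns default to null
    row.update(metadata or {})
    store[vid] = {"embedding": list(embedding), "metadata": row}


store: Dict[str, dict] = {}
upsert(store, "100", [0.4, 0.4], {"int_col": 100})
print(store["100"]["metadata"])  # {'int_col': 100, 'str_col': None}

# Upserting the same vid again replaces the earlier entry (assumed semantics).
upsert(store, "100", [0.4, 0.4], {"int_col": 100, "str_col": "now set"})
print(len(store), store["100"]["metadata"]["str_col"])  # 1 now set
```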

Searching a collection

You can search a collection with a set of vectors, each with an optional metadata column filter. Metadata filters are SQL-compatible and support a range of operators, including:

  • Comparison Operators: =, <, >, <=, >=, !=
  • Boolean Operators: and, or, not
  • Grouping Operators: ()
  • Null Check: is null, is not null

import nautilusdb as ndb

ndb.init(api_key='<my_api_key>')

col = ndb.collection('custom_collection')

# Run four searches with the same query embedding
col.search(
    [
        # Closest vectors are 1, 2, 3
        ndb.SearchRequest(embedding=[0.1, 0.1]),

        # Closest vectors are 2, 3, 100 (1 is filtered out)
        ndb.SearchRequest(embedding=[0.1, 0.1], metadata_filter='int_col != 1'),

        # Closest vector is 1 (2, 3, etc. are filtered out)
        ndb.SearchRequest(embedding=[0.1, 0.1], metadata_filter='int_col = 1'),

        # Closest vectors are 100, 200, 300
        ndb.SearchRequest(
            embedding=[0.1, 0.1], metadata_filter='str_col is null'),
    ])
    ])
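
To see why each request returns those vectors, the same four searches can be replayed locally with a brute-force scan. This is a sketch under stated assumptions, not the service's algorithm: local_search is a hypothetical helper, the SQL filters are rewritten as Python predicates, and the result size of 3 is inferred from the comments in the example rather than from a documented default.

```python
import math
from typing import Callable, Dict, List, Optional

# Local copy of the six example vectors: vid -> (embedding, metadata).
VECTORS: Dict[str, tuple] = {
    "1": ([0.1, 0.1], {"int_col": 1, "str_col": "vector at 0.1, 0.1"}),
    "2": ([0.2, 0.2], {"int_col": 2, "str_col": "vector at 0.2, 0.2"}),
    "3": ([0.3, 0.3], {"int_col": 3, "str_col": "vector at 0.3, 0.3"}),
    "100": ([0.4, 0.4], {"int_col": 100, "str_col": None}),
    "200": ([0.5, 0.5], {"int_col": 200, "str_col": None}),
    "300": ([0.6, 0.6], {"int_col": 300, "str_col": None}),
}


def local_search(query: List[float],
                 keep: Optional[Callable[[dict], bool]] = None,
                 k: int = 3) -> List[str]:
    """Return the vids of the k nearest vectors (L2) that pass the filter."""
    candidates = [
        (math.dist(query, embedding), vid)
        for vid, (embedding, metadata) in VECTORS.items()
        if keep is None or keep(metadata)
    ]
    return [vid for _, vid in sorted(candidates)[:k]]


q = [0.1, 0.1]
print(local_search(q))                                  # ['1', '2', '3']
print(local_search(q, lambda m: m["int_col"] != 1))     # ['2', '3', '100']
print(local_search(q, lambda m: m["int_col"] == 1))     # ['1']
print(local_search(q, lambda m: m["str_col"] is None))  # ['100', '200', '300']
```

The filter is applied before ranking, which is why vector 100 enters the second result once vector 1 is excluded.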
