Python client for NautilusDB

nautilusdb-python-client

Python client for NautilusDB, a fully-managed, cloud-native vector search service.

NautilusDB is currently in public alpha. We're actively improving the product and releasing new features, and we'd love to hear your feedback! Please take a moment to fill out this feedback form to help us understand your use-case better.

By default, all collections are subject to permanent deletion after 2 weeks. Please let us know via the feedback form if you need to keep them for longer.

The NautilusDB Python client offers two sets of APIs: high-level APIs that let you upload files and ask questions directly, and low-level APIs that let you use NautilusDB as a vector database and manipulate vectors directly.

Continue reading, or jump to the high-level API guide or the vector database API guide.

Quickstart

You can try out NautilusDB in just a few lines of code. We have prepared a special public collection, openai-web, that can answer questions about the contents of www.openai.com:

import nautilusdb as ndb

answer, _ = ndb.collection('openai-web').ask('what is red team?')
print(answer)
"""
Sample answer:

Red team refers to the group of external experts who work with OpenAI to
identify and evaluate potential risks and harmful capabilities in new systems.
The red team's role is to help develop taxonomies of risk and provide input
throughout the model and product development lifecycle.
"""

You can also create your own collections, upload files, then get answers specific to your data assets. The following example walks you through the process of creating a collection and indexing the original transformer paper into that collection.

import nautilusdb as ndb

# Create an API key
my_api_key = ndb.create_api_key()

# Configure ndb to use the newly minted API key
ndb.init(api_key=my_api_key)

# Create a new collection with preconfigured dimension
llm_research = ndb.CollectionBuilder.question_answer(name="llm_research").build()
ndb.create_collection(llm_research)

# Index the original Transformer paper into this collection.
llm_research.upload_document("https://arxiv.org/pdf/1706.03762.pdf")

# Get answers from this paper
llm_research.ask("what is a transformer?")

Installation

Install a released version of the NautilusDB Python client with pip.

Python 3.10 or later is required.

pip3 install nautilusdb-client
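
Because the client requires Python 3.10 or later, a script can check the interpreter version up front and report a clear message instead of failing later. The following is a minimal sketch; the helper name meets_minimum_python is an illustrative assumption and is not part of the NautilusDB client.

```python
import sys

# Minimum interpreter version stated in the installation notes above.
MINIMUM_PYTHON = (3, 10)


def meets_minimum_python(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if (major, minor) of version_info is at least `minimum`.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum_python():
    print("nautilusdb-client requires Python 3.10 or newer", file=sys.stderr)
```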

Creating an API key

You need an API key to create, update, or delete your own collections. A collection can only be accessed with the API key that created it.

Account management and related functionalities will be released soon.

import nautilusdb as ndb

# Create a new API key
my_api_key = ndb.create_api_key()

# Please record this API key and keep it secret
#
# Collections created with this key can only be accessed
# through this key!
print(my_api_key)

# Use this API key in all subsequent calls
ndb.init(api_key=my_api_key)
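
Since a collection can only be reached with the key that created it, one common way to keep the key out of source code is to read it from an environment variable. This is a minimal sketch of that pattern, not something the client provides: the variable name NAUTILUSDB_API_KEY and the helper load_api_key are assumptions made for illustration.

```python
import os

# Hypothetical variable name; pick whatever fits your deployment.
API_KEY_ENV_VAR = "NAUTILUSDB_API_KEY"


def load_api_key(env=None):
    """Return the API key stored in the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"Set {API_KEY_ENV_VAR} to the key returned by create_api_key()"
        )
    return key


# Usage, assuming the variable is set in your shell:
#   import nautilusdb as ndb
#   ndb.init(api_key=load_api_key())
```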

Creating a Collection

See this page for a brief overview of the NautilusDB data model.

You can create a collection that is only accessible with a specific API key.

import nautilusdb as ndb

ndb.init(api_key="<my_api_key>")

# Create a collection called llm_research. It is configured to be compatible
# with the Q/A APIs: it has a vector embedding dimension of 1536 and three
# metadata columns: text (string), tokens (int), filename (string).
collection = ndb.CollectionBuilder.question_answer('llm_research').build()
ndb.create_collection(collection)

Listing collections

You can list the collections you have access to. For example, this list will include all collections that were created using the currently configured API key.

import nautilusdb as ndb

ndb.init(api_key="<my_api_key>")

collections = ndb.list_collections()

Uploading a document

You can upload a local file or a file from a web URL and index it into a collection.

Supported file formats

  • .pdf PDF files
  • .txt Plain-text files
  • .md Markdown files
  • .docx Microsoft Word documents

import nautilusdb as ndb

ndb.init(api_key="<my_api_key>")

# llm_research collection was created in the previous step
collection = ndb.collection('llm_research')

# Local file and URLs are both supported.
# URL must contain the full scheme prefix (http:// or https://)
collection.upload_document('/path/to/file.pdf')
collection.upload_document('https://path/to/file.pdf')
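
Given the supported formats and the URL scheme requirement above, you may want to check a source string locally before calling upload_document. The sketch below is an illustration under assumptions: the helper is_supported_document is hypothetical, extensions are matched case-insensitively (the service's actual behavior for upper-case extensions is not documented here), and any string without "://" is treated as a local path.

```python
import os
from urllib.parse import urlparse

# File extensions listed under "Supported file formats" above.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx"}


def is_supported_document(source: str) -> bool:
    """Return True if `source` looks like an uploadable local path or URL.

    Hypothetical pre-flight check; the service remains the final authority.
    """
    if "://" in source:
        parsed = urlparse(source)
        # URLs must carry a full http:// or https:// scheme prefix.
        if parsed.scheme not in ("http", "https"):
            return False
        path = parsed.path  # excludes any query string or fragment
    else:
        path = source  # no scheme: treat as a local file path
    extension = os.path.splitext(path)[1].lower()
    return extension in SUPPORTED_EXTENSIONS
```

For example, is_supported_document('https://arxiv.org/pdf/1706.03762.pdf') returns True, while a string such as 'ftp://host/file.pdf' is rejected because of its scheme.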

Asking a question

You can ask questions within a collection. An API key is required only for private collections. The ask() method returns a plain-text answer to your question, as well as a list of the most relevant references used to derive the answer.

Available public collections that do not require an API key:

  • openai-web: Contains contents of www.openai.com

import nautilusdb as ndb

# Get a plain text answer, as well as a list of references from the collection
# that are the most relevant to the question.
answer, refs = ndb.collection('openai-web').ask('what is red team?')

# Private collections require the API key that created them
ndb.init(api_key="<my_api_key>")
answer, refs = ndb.collection('llm_research').ask('what is a transformer?')

Deleting a collection

You can delete a collection using the same API key that was used to create it.

import nautilusdb as ndb

ndb.init(api_key="<my_api_key>")

ndb.delete_collection('llm_research')

Using NautilusDB as a vector database

NautilusDB is a vector database at its core. You can directly manipulate vectors in the database.

Creating a custom collection

Create a collection whose vectors have an embedding dimension of 2 and two metadata columns: int_col of type Int and str_col of type String. Currently, L2 (Euclidean) distance is used as the vector distance metric. Support for other distance metrics will be available soon.

import nautilusdb as ndb

ndb.init(api_key='<my_api_key>')

# Create a collection with two metadata columns
col = (ndb.CollectionBuilder()
       .set_name('custom_collection')
       .set_dimension(2)
       .add_metadata_column('int_col', ndb.ColumnType.Int)
       .add_metadata_column('str_col', ndb.ColumnType.String)
       .build())

ndb.create_collection(col)

Upserting vectors into the collection

You can now upsert vectors into the collection. Each metadata column has a default value of null. You can override this default by setting the metadata field of the vector.

import nautilusdb as ndb

ndb.init(api_key='<my_api_key>')

# Upsert 6 vectors. Some with one metadata column, others with two
col = ndb.collection('custom_collection')
col.upsert_vector([
    ndb.Vector(vid='1', embedding=[0.1, 0.1], metadata={'int_col': 1, 'str_col': 'vector at 0.1, 0.1'}),
    ndb.Vector(vid='2', embedding=[0.2, 0.2], metadata={'int_col': 2, 'str_col': 'vector at 0.2, 0.2'}),
    ndb.Vector(vid='3', embedding=[0.3, 0.3], metadata={'int_col': 3, 'str_col': 'vector at 0.3, 0.3'}),
    ndb.Vector(vid='100', embedding=[0.4, 0.4], metadata={'int_col': 100}),
    ndb.Vector(vid='200', embedding=[0.5, 0.5], metadata={'int_col': 200}),
    ndb.Vector(vid='300', embedding=[0.6, 0.6], metadata={'int_col': 300}),
])
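
To make the null default concrete, the following sketch models locally what the metadata of an upserted vector looks like once unset columns are filled in. It is purely illustrative: the helper effective_metadata is hypothetical, and Python's None stands in for the database's null.

```python
# Metadata columns of custom_collection, as created earlier.
COLUMNS = ("int_col", "str_col")


def effective_metadata(metadata, columns=COLUMNS):
    """Return a dict with every column present; unset columns become None.

    Local model for illustration only; None stands in for null.
    """
    return {name: metadata.get(name) for name in columns}


# Vector '100' was upserted with only int_col set, so str_col stays null.
print(effective_metadata({'int_col': 100}))
```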

Describing a collection with stats

You can retrieve a collection's configuration, as well as simple statistics about the collection, via the describe API.

import nautilusdb as ndb

ndb.init(api_key='<my_api_key>')

# Retrieve collection config and stats
col = ndb.describe_collection('custom_collection')
print(f"Collection {col.name} has {col.stats.vector_count} vectors!")

Deleting vectors from a collection

You can delete vectors from the collection. Three deletion conditions are supported:

  • Delete by ID
    • You can delete a set of vectors by their IDs
  • Delete by metadata filter
    • You can delete all vectors that satisfy a metadata filter
  • Delete all
    • You can delete all vectors from a collection

Exactly one condition can be specified in each API call.

import nautilusdb as ndb

ndb.init(api_key='<my_api_key>')

col = ndb.collection('custom_collection')

# Delete 3 vectors, 'foo', 'bar', and 'baz'.
col.delete_vectors(vector_ids=['foo', 'bar', 'baz'])

# Delete all vectors where 'int_col' is specified, and the value is less than 3.
col.delete_vectors(metadata_filter="int_col < 3")

# Delete all vectors in the collection
col.delete_vectors(delete_all=True)
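
Because exactly one condition may be specified per call, wrapper code that builds delete arguments dynamically can validate them before sending a request. This is a minimal sketch under assumptions: check_delete_args is a hypothetical helper that reuses the parameter names shown above, and it does not describe how the client itself validates its arguments.

```python
def check_delete_args(vector_ids=None, metadata_filter=None, delete_all=False):
    """Return the name of the single deletion condition that was specified.

    Raises ValueError when zero or more than one condition is set.
    Hypothetical pre-flight check, not part of the NautilusDB client.
    """
    specified = [
        name
        for name, is_set in (
            ("vector_ids", bool(vector_ids)),
            ("metadata_filter", bool(metadata_filter)),
            ("delete_all", bool(delete_all)),
        )
        if is_set
    ]
    if len(specified) != 1:
        raise ValueError(
            f"Specify exactly one deletion condition, got: {specified or 'none'}"
        )
    return specified[0]
```

For example, check_delete_args(metadata_filter="int_col < 3") returns 'metadata_filter', while passing both vector_ids and delete_all=True raises ValueError.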

Searching a collection

You can search a collection with a set of vectors, as well as a set of optional metadata column filters. The metadata filter is SQL-compatible and supports a wide range of operators, including:

  • Comparison Operators: =, <, >, <=, >=, !=
  • Boolean Operators: and, or, not
  • Grouping Operators: ()
  • Null Check: is null, is not null

import nautilusdb as ndb

ndb.init(api_key='<my_api_key>')

col = ndb.collection('custom_collection')

# Search
col.search(
    [
        # Closest vectors are 1, 2, 3
        ndb.SearchRequest(embedding=[0.1, 0.1]),

        # Closest vectors are 2, 3, 100 (1 is filtered out)
        ndb.SearchRequest(embedding=[0.1, 0.1], metadata_filter='int_col != 1'),

        # Closest vector is 1 (2, 3, etc. are filtered out)
        ndb.SearchRequest(embedding=[0.1, 0.1], metadata_filter='int_col = 1'),

        # Closest vectors are 100, 200, 300
        ndb.SearchRequest(
            embedding=[0.1, 0.1], metadata_filter='str_col is null'),
    ])
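
To see why the comments above list those particular vectors, the results can be reproduced locally with a brute-force L2 search over the six vectors upserted earlier. This is an illustrative sketch only, not how NautilusDB executes searches: the function local_search is hypothetical, Python predicates stand in for the SQL-style filter strings, None stands in for null, and the result limit of 3 is an assumption chosen to match the comments.

```python
import math

# The six vectors upserted earlier: id -> (embedding, metadata).
VECTORS = {
    '1': ([0.1, 0.1], {'int_col': 1, 'str_col': 'vector at 0.1, 0.1'}),
    '2': ([0.2, 0.2], {'int_col': 2, 'str_col': 'vector at 0.2, 0.2'}),
    '3': ([0.3, 0.3], {'int_col': 3, 'str_col': 'vector at 0.3, 0.3'}),
    '100': ([0.4, 0.4], {'int_col': 100}),
    '200': ([0.5, 0.5], {'int_col': 200}),
    '300': ([0.6, 0.6], {'int_col': 300}),
}


def local_search(embedding, keep=lambda metadata: True, limit=3):
    """Return ids of the `limit` nearest vectors by L2 distance.

    `keep` is a predicate on the metadata dict that plays the role of
    the metadata filter; vectors it rejects are skipped before ranking.
    """
    candidates = [
        (math.dist(embedding, stored), vid)  # Euclidean (L2) distance
        for vid, (stored, metadata) in VECTORS.items()
        if keep(metadata)
    ]
    return [vid for _, vid in sorted(candidates)[:limit]]


print(local_search([0.1, 0.1]))
print(local_search([0.1, 0.1], keep=lambda m: m.get('int_col') != 1))
print(local_search([0.1, 0.1], keep=lambda m: m.get('int_col') == 1))
print(local_search([0.1, 0.1], keep=lambda m: m.get('str_col') is None))
```

The four calls mirror the four search requests above: the filter is applied first, and the remaining vectors are ranked by distance to the query embedding.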

Querying a collection

You can also perform a pure metadata query against the vectors in a collection using a metadata filter (see the search documentation above for the filter syntax).

Currently, returned vectors are not ranked. We're working on supporting text search with relevance scoring. Stay tuned!

import nautilusdb as ndb

ndb.init(api_key='<my_api_key>')

col = ndb.collection('custom_collection')

# Metadata query
col.query(
    [
        ndb.QueryRequest(metadata_filter='int_col != 1'),
        ndb.QueryRequest(metadata_filter='int_col = 1'),
        ndb.QueryRequest(metadata_filter='str_col is null'),
    ])

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nautilusdb_client-0.7.0.tar.gz (25.4 kB view details)

Uploaded Source

Built Distribution

nautilusdb_client-0.7.0-py3-none-any.whl (31.0 kB view details)

Uploaded Python 3

