AlayaLite Python extension module
Project description
AlayaLite Python Interfaces
This page introduces the Python interfaces in AlayaLite, which include two layers:
Collection: Manages raw data and its relationship with indexes. Users can build their knowledge base by integrating our Collection into other tools like Open WebUI and LangChain, or building their own RAG application following our tutorial.
Index: Handles search, insert, and delete operations for indexes. Researchers and developers can modify it to enhance algorithms or mechanisms. We support ANN-Benchmark evaluation in here.
In particular, collections and indexes are managed by Client. Next, we will introduce Client, Collection and Index in detail.
Client
The Client provides a convenient interface for creating, retrieving, saving, and deleting collections and indices from disk.
Initialize a client
You can initialize a client in following ways:
# Initialize the client without a URL, you can process data in memory
client = Client()
# Initialize the client with a URL to load data from disk
client = Client(url='/path/to/your/directory')
list_collections(): Returns a list of all collection names currently managed by the client.list_indices(): Returns a list of all index names currently managed by the client.
Get a collection or index
Get a collection object or an index object
# Get a collection by name
collection = client.get_collection(name='your_collection_name')
# Get an index by name
index = client.get_index(name='your_index_name')
get_collection(name: str): Retrieves a collection by name.get_index(name: str): Retrieves an index by name.
Create a collection or index
Creating a collection or an index:
Create a new collection
new_collection = client.create_collection(name='new_collection_name')
# Create a new index
new_index = client.create_index(name='new_index_name', index_type='your_index_type', data_type='your_data_type')
create_collection(name: str, **kwargs): Creates a new collection with the given name and parameters. Raises a RuntimeError if a collection or index with the same name already exists.create_index(name: str, **kwargs): Creates a new index with the given name and parameters. Raises a RuntimeError if a collection or index with the same name already exists.
Get or create a collection or index
We also provide interfaces for "getting or creating" interface:
# Get or create a collection
collection = client.get_or_create_collection(name='your_collection_name')
# Get or create an index
index = client.get_or_create_index(name='your_index_name')
Delete a collection
In addition, you can deleting a collection or an index by following ways:
# Delete a collection
client.delete_collection(collection_name='your_collection_name', delete_on_disk=True)
# Delete an index
client.delete_index(index_name='your_index_name', delete_on_disk=True)
delete_collection(collection_name: str, delete_on_disk: bool): Deletes a collection by name you maintained in memory. Optionally, it can also delete the collection from disk. Raises a RuntimeError if the collection does not exist.delete_index(index_name: str, delete_on_disk: bool): Deletes an index by name you maintained in memory. Optionally, it can also delete the index from disk. Raises a RuntimeError if the index does not exist.
Reset the client
Or you may want to delete everything
client.reset()
- The
reset()method clears all collections and indices from the client, removing all loaded data from memory. Then you can use the name to create a new one.
Save the collection or index
When you want to save all collections and indices for persistence, you can use the following interface:
# Save a collection
client.save_collection(collection_name='your_collection_name')
# Save an index
client.save_index(index_name='your_index_name')
save_collection(collection_name: str): Saves a collection to disk. The collection schema will be stored in a JSON file. Raises a RuntimeError if the client is not initialized with a URL or the collection does not exist.save_index(index_name: str): Saves an index to disk. The index schema will be stored in a JSON file. Raises a RuntimeError if the client is not initialized with a URL or the index does not exist.
Collection
The Collection class is a fundamental component for managing a set of documents along with their associated metadata and embeddings.
Initialize a collection
You can initialize a Collection instance like this:
from alayalite import Collection
collection = Collection(name='your_collection_name')
The __init__ method creates an empty structure to hold the collection's data, including raw document and embeddings.
Insert to a collection
To add new documents to the collection, you can use the insert method:
items = [(1, 'document1', [1.0, 2.0, 3.0], {'category': 'A'}), (2, 'document2', [4.0, 5.0, 6.0], {'category': 'B'})]
collection.insert(items)
insert(items: List[tuple]): Inserts a list of tuples, where each tuple contains an id, a document, an embedding, and metadata.
Upsert to a collection
If you want to insert new items or update existing ones if ids already exist, the upsert method is available:
items = [(1, 'updated_document1', [1.1, 2.1, 3.1], {'category': 'A'}), (3, 'document3', [7.0, 8.0, 9.0], {'category': 'C'})]
collection.upsert(items)
upsert(items: List[tuple]): Checks if an item with a given id already exists. If it does, the existing item is updated. If not, a new item is inserted.
Query a batch of vectors to a collection
For retrieving data based on vector similarity, the batch_query method can be used:
vectors = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
limit = 5
ef_search = 100
num_threads = 1
results = collection.batch_query(vectors, limit, ef_search, num_threads)
batch_query(vectors: list[list[float | int]], limit: int, ef_search: int = 100, num_threads: int = 1): Queries the index using a batch of vectors.ef_searchandnum_threadwill be used forwarded as the parameters of search process.
Filter query in a collection
If you want to filter data based on metadata, you can use the filter_query method:
filter = {'category': ['A', 'B'], 'status': 1}
filtered_results = collection.filter_query(filter, 10)
filter_query(filter: dict, limit: Optional[int] = None): Filters the collection based on the given metadata conditions.
Delete by id
To remove documents from the collection by their id, you can use the delete_by_id method:
ids = [1, 2]
collection.delete_by_id(ids)
delete_by_id(ids: List[int]): Removes the documents with the specified ids.
Delete by filtering metadata
If you want to delete items based on metadata filters, the delete_by_filter method is available:
filter = {'category': 'A'}
collection.delete_by_filter(filter)
delete_by_filter(filter: dict): Deletes items from the collection that match the given metadata filter.
Save a collection
To save the collection to disk for future use, you can use the save method:
url = '/path/to/save/collection'
collection.save(url)
save(url: str): Saves the collection's data, including the DataFrame and mapping structures, to the specified directory. It also saves the index schema in a JSON file. The collection data is pickled and saved in a collection.pkl file.
Load a collection
To load a previously saved collection from disk, you can use the load class method:
url = '/path/where/collection/is/stored'
name = 'your_collection_name'
loaded_collection = Collection.load(url, name)
load(url: str, name: str): Loads a collection from the specified directory.
Index
The Index class offers a python interface for handling vectors at the lowest level.
Initialize an index
You can initialize a new Index instance in the following way:
from your_module import Index, IndexParams
# Initialize with default name and default parameters
index = Index()
# Initialize with a custom name and custom parameters
params = IndexParams()
index = Index(name='your_index_name', params=params)
- The
__init__method takes an optional name (default is "default") and params (an instance of IndexParams). The index is not fully initialized at this point; it's a late - initialization process. In particular, the dimension, data type, et al. is not known until it is fitted or the first vector is inserted. In addition, you can set the parameter to set the metric type or restrict the index type, data type, et al.
Build an index
To build the index with a set of vectors, use the fit method:
import numpy as np
vectors = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
ef_construction = 100
num_threads = 1
index.fit(vectors, ef_construction, num_threads)
fit(vectors: VectorLikeBatch, ef_construction: int = 100, num_threads: int = 1): Constructs the index using the provided 2D array of vectors. Theef_constructionparameter controls the accuracy of index construction, andnum_threadsspecifies the number of threads to use during construction. The index can only be fitted once.
Insert a vector into the index
To insert a new vector into the index, use the insert method:
vector = np.array([7.0, 8.0, 9.0])
ef = 100
inserted_id = index.insert(vector, ef)
insert(vectors: VectorLike, ef: int = 100): Inserts a 1D vector into the index. Theefparameter controls the retrieval accuracy. It returns the assigned ID of the inserted vector.
Remove a vector from the index
To remove a vector from the index by its ID, use the remove method:
id_to_remove = 0
index.remove(id_to_remove)
remove(id: int): Removes the vector with the specified ID from the index.
ANN Search of a single query
For single - query searches, use the search method:
query = np.array([1.0, 2.0, 3.0])
topk = 2
ef_search = 100
neighbors = index.search(query, topk, ef_search)
search(query: VectorLike, topk: int, ef_search: int = 100): Performs an ANN search for the given 1D query vector. It returns the top-k nearest neighbors.
ANN Search of a batch of queries
For batch searches, use the batch_search method:
queries = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
topk = 2
ef_search = 100
num_threads = 1
batch_neighbors = index.batch_search(queries, topk, ef_search, num_threads)
batch_search(queries: VectorLikeBatch, topk: int, ef_search: int = 100, num_threads: int = 1): Performs a batch search for multiple query vectors. It returns the top-k nearest neighbors for each query.
Get the vector data by ID
To get the vector data associated with a given ID, use the get_data_by_id method:
id = 0
vector_data = index.get_data_by_id(id)
get_data_by_id(id: int): Retrieves the vector data corresponding to the given ID.
Get the dimensionality of the vectors
To get the dimensionality of the vectors stored in the index, use the get_dim method:
dim = index.get_dim()
get_dim(): Returns the dimension of the indexed vectors.
Save an index
To save the index to disk, use the save method:
url = '/path/to/save/index'
index_metadata = index.save(url)
save(url: str): Saves the index to the specified directory. It returns a dictionary containing metadata about the saved index.
Load an index
To load an existing index from disk, use the load class method:
url = '/path/where/index/is/stored'
name = 'your_index_name'
loaded_index = Index.load(url, name)
load(url: str, name: str): Loads an index from the specified directory. It returns an instance of the Index class with the loaded index data.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file alayalite-0.1.0a1-cp311-cp311-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: alayalite-0.1.0a1-cp311-cp311-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 1.5 MB
- Tags: CPython 3.11, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8cea1e73e847557e6febbd69cdaa7e24066dc73c2b5ba14d3695bf2f478f2103
|
|
| MD5 |
90f53bf116558a655cf66446496bb4ed
|
|
| BLAKE2b-256 |
66501b4abccbd37a993eb1ea94707e299907c3d557da71b56cb87b5b7ceec950
|