FastVS


FastVS (Fast Vector Search) is a Python library for exact vector search in dataframes or tables. It works with both PyArrow Tables and Pandas DataFrames and lets users perform nearest neighbor searches using several distance metrics. It performs best on PyArrow Tables, because it uses the Rust Arrow library under the hood for zero-copy, vectorized computation.

Installation

FastVS can be installed using pip:

pip install fastvs

Functions

FastVS offers the following main functions:

search_arrow

Searches a PyArrow Table for the k nearest neighbors of a query point.

Parameters

  • table (pyarrow.Table): The table to search.
  • column_name (str): The column name to search. This column should be a list or np array type column, where each element is a vector of floats.
  • query_point (list or np array): The query point.
  • k (int): The number of nearest neighbors to return.
  • metric (str): The metric to use for the search (e.g., "euclidean", "manhattan", "cosine_similarity", "inner_product").

Returns

  • Tuple[List[int], List[float]]: The indices and distances of the k nearest neighbors.

Usage

import pyarrow as pa
from fastvs import search_arrow

indices, distances = search_arrow(your_pyarrow_table, "your_column", [1.0, 2.0], 5, "cosine_similarity")
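To make the returned tuple concrete, here is a brute-force reference sketch of exact k-nearest-neighbor search in plain Python. It is not how FastVS is implemented (the library runs in Rust over Arrow data); the helper name `knn_reference` and the sample vectors are hypothetical, and only the Euclidean metric is covered.

```python
import heapq
import math

def knn_reference(vectors, query_point, k):
    """Brute-force exact k-NN with Euclidean distance (illustrative only)."""
    # Pair the distance from the query to every row with that row's index.
    scored = [(math.dist(v, query_point), i) for i, v in enumerate(vectors)]
    # Keep the k smallest distances, closest first.
    nearest = heapq.nsmallest(k, scored)
    indices = [i for _, i in nearest]
    distances = [d for d, _ in nearest]
    return indices, distances

# Hypothetical data: four 2-dimensional vectors.
vectors = [[1.0, 2.0], [4.0, 6.0], [1.0, 3.0], [0.0, 0.0]]
indices, distances = knn_reference(vectors, [1.0, 2.0], 2)
print(indices, distances)  # [0, 2] [0.0, 1.0]
```

The sketch scans every row, which is what "exact" search means here: no approximate index is built, so the result is always the true set of nearest rows.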

search_pandas

Searches a Pandas DataFrame for the k nearest neighbors of a query point. This function uses search_arrow under the hood. Note that it is slower than search_arrow because the data must first be copied from the DataFrame into the Arrow table format.

Parameters

  • df (pandas.DataFrame): The DataFrame to search.
  • column_name (str): The column name to search. This column should be a list or np array type column, where each element is a vector of floats.
  • query_point (list or np array): The query point.
  • k (int): The number of nearest neighbors to return.
  • metric (str): The metric to use for the search.

Returns

  • Tuple[List[int], List[float]]: The indices and distances of the k nearest neighbors.

Usage

import pandas as pd
from fastvs import search_pandas

# Note: a column read from CSV holds strings; convert it to lists of floats before searching.
df = pd.read_csv("your_dataset.csv")
indices, distances = search_pandas(df, "your_column", [1.0, 2.0], 5, "cosine_similarity")
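A column loaded from CSV holds strings rather than lists, so the vectors generally need parsing before they match the "list of floats" requirement on column_name. The sketch below shows one way to do that with the standard library only, assuming each vector was written as a JSON-style list; the column name `embedding` and the sample rows are hypothetical.

```python
import csv
import io
import json

# Hypothetical CSV content: the "embedding" column stores each vector as a JSON list.
raw = 'id,embedding\n1,"[1.0, 2.0]"\n2,"[3.5, 4.0]"\n'

rows = list(csv.DictReader(io.StringIO(raw)))
# Each cell arrives as a string such as "[1.0, 2.0]"; parse it into a list of floats.
vectors = [[float(x) for x in json.loads(row["embedding"])] for row in rows]
print(vectors)  # [[1.0, 2.0], [3.5, 4.0]]
```

With a DataFrame, the same parsing could be applied to the column (for example with `df["your_column"].apply(json.loads)`) before calling search_pandas.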

apply_distance_arrow

Applies a distance function to a PyArrow table and returns an array of distances.

Parameters

  • table (pyarrow.Table): The table to search.
  • column_name (str): The column name to search. This column should be a list or np array type column, where each element is a vector of floats.
  • query_point (list or np array): The query point.
  • metric (str): The metric to use for the search.

Returns

  • pyarrow.Array: The distances in the order of the table.

Usage

import pyarrow as pa
from fastvs import apply_distance_arrow

table = pa.Table.from_pandas(your_dataframe)
distances = apply_distance_arrow(table, "your_column", [1.0, 2.0], "euclidean")

apply_distance_pandas

Applies a distance function to a Pandas DataFrame and returns a Series of distances. Uses apply_distance_arrow under the hood.

Parameters

  • df (pandas.DataFrame): The DataFrame to search.
  • column_name (str): The column name to search. This column should be a list or np array type column, where each element is a vector of floats.
  • query_point (list or np array): The query point.
  • metric (str): The metric to use for the search.

Returns

  • pandas.Series: The distances as a pandas Series.

Usage

import pandas as pd
from fastvs import apply_distance_pandas

# Note: the vector column must hold lists or arrays of floats, not the raw strings a CSV provides.
df = pd.read_csv("your_dataset.csv")
distances = apply_distance_pandas(df, "your_column", [1.0, 2.0], "euclidean")
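Because the distances come back in the same order as the rows, they can drive selections other than top-k, such as keeping every row within a fixed radius of the query. A minimal sketch, using a plain list as a stand-in for the returned distances; the helper `rows_within` and the values are hypothetical.

```python
def rows_within(distances, radius):
    """Return the positions of rows whose distance is at most `radius`."""
    return [i for i, d in enumerate(distances) if d <= radius]

# Hypothetical distances, in the same order as the table rows.
distances = [0.0, 5.0, 1.0, 2.2]
close_rows = rows_within(distances, 2.5)
print(close_rows)  # [0, 2, 3]
```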

Supported Metrics

FastVS supports various distance metrics, including:

  • Euclidean ("euclidean")
  • Manhattan ("manhattan")
  • Inner Product ("inner_product")
  • Cosine Similarity ("cosine_similarity")

Euclidean Distance

The Euclidean distance between two points $P$ and $Q$ in $N$-dimensional space, with $P = (p_1, p_2, ..., p_N)$ and $Q = (q_1, q_2, ..., q_N)$, is defined as:

$$d(P, Q) = \sqrt{\sum_{i=1}^{N} (p_i - q_i)^2}$$
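As a worked instance of the formula, this plain-Python sketch computes the Euclidean distance between two hypothetical 2-dimensional points. It only illustrates the definition and says nothing about how FastVS computes it internally.

```python
import math

def euclidean(p, q):
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# (1 - 4)^2 + (2 - 6)^2 = 9 + 16 = 25, and sqrt(25) = 5.
d_euclidean = euclidean([1.0, 2.0], [4.0, 6.0])
print(d_euclidean)  # 5.0
```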

Manhattan Distance

The Manhattan distance (also known as the L1 distance) between two points $P$ and $Q$ in $N$-dimensional space is the sum of the absolute differences of their Cartesian coordinates:

$$d(P, Q) = \sum_{i=1}^{N} |p_i - q_i|$$
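The same pair of hypothetical points gives a different value under the Manhattan distance, as this short sketch of the definition shows:

```python
def manhattan(p, q):
    # Sum of the absolute coordinate differences.
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# |1 - 4| + |2 - 6| = 3 + 4 = 7.
d_manhattan = manhattan([1.0, 2.0], [4.0, 6.0])
print(d_manhattan)  # 7.0
```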

Cosine Similarity

Cosine similarity measures the cosine of the angle between two vectors $P$ and $Q$ in an $N$-dimensional space. It is defined as:

$$\text{Cosine Similarity}(P, Q) = \frac{P \cdot Q}{\|P\| \|Q\|}$$

where $P \cdot Q$ is the dot product of vectors $P$ and $Q$, and $\|P\|$ and $\|Q\|$ are the magnitudes (Euclidean norms) of vectors $P$ and $Q$, respectively.
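The definition can be checked on two easy cases: vectors pointing the same way score close to 1, and orthogonal vectors score 0. The sketch below is a plain-Python illustration with hypothetical inputs; raising an error for a zero vector is a choice made here (the quotient is undefined when a norm is 0), not documented FastVS behavior.

```python
import math

def cosine_similarity(p, q):
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    if norm_p == 0.0 or norm_q == 0.0:
        # Assumption of this sketch: reject zero vectors instead of dividing by 0.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_p * norm_q)

# [2, 4] is [1, 2] scaled by 2, so the angle between them is 0.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# The two axis vectors are perpendicular, so the dot product is 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(math.isclose(same_direction, 1.0), orthogonal)
```

Unlike the two distances above, a larger cosine similarity means the vectors are more alike.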

Inner Product

The inner product (or dot product) between two vectors $P$ and $Q$ in an $N$-dimensional space is defined as the sum of the products of their corresponding components:

$$\text{Inner Product}(P, Q) = P \cdot Q = \sum_{i=1}^{N} p_i q_i$$
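A short plain-Python sketch of this sum, with hypothetical inputs. It also shows that the inner product of a vector with itself is its squared Euclidean norm, which is why the inner product, unlike cosine similarity, depends on vector length.

```python
def inner_product(p, q):
    # Sum of the products of corresponding components.
    return sum(pi * qi for pi, qi in zip(p, q))

# 1*3 + 2*4 = 11.
ip = inner_product([1.0, 2.0], [3.0, 4.0])
# 3*3 + 4*4 = 25, the squared length of [3, 4].
squared_norm = inner_product([3.0, 4.0], [3.0, 4.0])
print(ip, squared_norm)  # 11.0 25.0
```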

Contribution

Contributions to FastVS are welcome! Please submit your pull requests to the repository or open an issue for any bugs or feature requests.

To Dos:

  • Clean up Rust code
  • Support f32

License

FastVS is released under the MIT License. See the LICENSE file in the repository for more details.
