Easy Faiss

Easy Faiss is a small package for creating vector indexes that can serve as a knowledge base for LLM chatbots. It provides an easy-to-use alternative to vector databases, which can be costly.

Example script:

from easyfaiss.vectorembeddings import bert_setup, create_embeddings
from easyfaiss.vectorspace import FlatIndex, HNSWFlatIndex
import time

# Set up BERT model and tokenizer
model, tokenizer = bert_setup()

# Sample data
data = [
    "Hello, how are you?",
    "What's the weather today?",
    "How does photosynthesis work?",
    "What's the capital of France?",
    "Tell me about the history of the Roman Empire.",
    "How can I improve my coding skills?",
    "What's the recipe for a classic margarita?",
    "Explain the theory of relativity.",
    "What's the population of Tokyo?",
    "How can I learn a new language quickly?"
]

# Create embeddings for the sample data
embeddings = create_embeddings(model=model, tokenizer=tokenizer, dataset=data)

# Define a user query
user_query = "What's the population in Tokyo, Japan?"

# Create embeddings for the user query
query_vector = create_embeddings(model=model, tokenizer=tokenizer, dataset=[user_query])

# Create a FlatIndex and update it with embeddings
flat_index = FlatIndex(name='flat_index')
flat_index.create_index(dimensions=768)
flat_index.update_index(embeddings=embeddings, dataset=data)

# Print details of the index
print("FlatIndex Details:")
print(flat_index.details())

# Perform similarity search on FlatIndex
print("\nFlatIndex Similarity Search:")
start_time = time.time()
indices, distances = flat_index.similarity_search(query_vector=query_vector, k=3)
end_time = time.time()
print("Time taken for kNN search in FlatIndex: {:.4f} seconds".format(end_time - start_time))

# Create an HNSWFlatIndex and update it with embeddings
hnsw_index = HNSWFlatIndex(name='hnsw_index')
hnsw_index.create_index(dimensions=768)
hnsw_index.update_index(embeddings=embeddings, dataset=data)

# Print details of the HNSWFlatIndex
print("\nHNSWFlatIndex Details:")
print(hnsw_index.details())

# Perform similarity search on HNSWFlatIndex
print("\nHNSWFlatIndex Similarity Search (Approximate Nearest Neighbors - ANN):")
start_time = time.time()
indices, distances = hnsw_index.similarity_search(query_vector=query_vector, k=3)
end_time = time.time()
print("Time taken for ANN search in HNSWFlatIndex: {:.4f} seconds".format(end_time - start_time))
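
A single time.time() measurement over ten vectors is dominated by noise, so the two timings in the script are hard to compare. The sketch below shows one way to get steadier numbers by repeating the call and keeping the best run. The helper name time_call and the stand-in workload are assumptions made for illustration; they are not part of easyfaiss.

```python
import time


def time_call(fn, repeats=5):
    """Run fn() `repeats` times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result


# Stand-in workload (hypothetical). In the script above, this would be
# something like:
#   lambda: flat_index.similarity_search(query_vector=query_vector, k=3)
best, result = time_call(lambda: sum(range(10_000)), repeats=5)
print("Best of 5 runs: {:.6f} seconds".format(best))
```

Taking the minimum rather than the mean reduces the influence of one-off delays such as cache warm-up on the first call.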

Easy Faiss currently provides two types of Faiss indexes:

faiss.IndexFlatL2:

  • Speed: Search is a brute-force scan over every stored vector, so query time grows linearly with the number of vectors.
  • Scale: Suitable for smaller datasets with low to moderate memory requirements.
  • Accuracy: Provides exact nearest neighbor search, making it the most accurate option.
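
To make "exact nearest neighbor search" concrete, here is a minimal pure-Python sketch of the idea behind a flat L2 index: compare the query against every stored vector and keep the k smallest squared L2 distances. It is an illustration only, not the Faiss implementation (which is optimized native code), and the function names are hypothetical.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(vectors, query, k):
    """Return (indices, distances) of the k stored vectors closest to query."""
    # Score every vector: this full scan is why the search is exact,
    # and also why its cost grows linearly with the number of vectors.
    scored = ((squared_l2(v, query), i) for i, v in enumerate(vectors))
    nearest = heapq.nsmallest(k, scored)
    indices = [i for _, i in nearest]
    distances = [d for d, _ in nearest]
    return indices, distances


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
indices, distances = exact_knn(vectors, query=[0.9, 0.1], k=2)
print(indices)  # positions of the two closest vectors, nearest first
```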

faiss.IndexHNSWFlat:

  • Speed: Hierarchical Navigable Small World (HNSW) allows for fast approximate nearest neighbor search, especially on large datasets.
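  • Scale: Well-suited for large datasets, because the graph-based approximate search visits only a small fraction of the vectors per query. It stores the full vectors plus the graph links, so it needs somewhat more memory than faiss.IndexFlatL2.
  • Accuracy: Provides approximate results, so it may occasionally miss a true nearest neighbor that faiss.IndexFlatL2 would return.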
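
One common way to quantify the accuracy trade-off is recall@k: the fraction of the exact top-k neighbors that the approximate search also returns. The sketch below assumes you already have the result positions from both searches as plain lists of integers; the function name and the sample IDs are made up for illustration.

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k neighbors also found by the approximate search."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


# Hypothetical results for k=3: the approximate search found two of the
# three true neighbors (8 and 5) and returned 1 instead of 3.
exact_ids = [8, 3, 5]
approx_ids = [8, 5, 1]
recall = recall_at_k(exact_ids, approx_ids)
print("recall@3 = {:.2f}".format(recall))
```

A recall of 1.0 means the approximate index returned the same set of neighbors as the exact one for that query; averaging over many queries gives a more reliable picture.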

The choice between these two index types depends on your specific use case:

  • Use faiss.IndexFlatL2 when you have a relatively small dataset and need precise, exact nearest neighbor search.
  • Use faiss.IndexHNSWFlat when you have a larger dataset and can tolerate a small reduction in accuracy, plus some extra memory for the graph, in exchange for faster search.

For faiss.IndexFlatL2, the upper limit on the number of vectors depends on the available memory and on how much search latency you can accept. It typically works well for a few thousand vectors (e.g., up to 10,000 or more, depending on the dimension of the vectors and available RAM). Beyond that, the brute-force scan becomes progressively slower, and very large collections can run into memory limits. faiss.IndexHNSWFlat is a good choice for larger datasets, including millions of vectors, provided you can accept the trade-offs in accuracy and the extra memory used by the graph.
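
As a rough back-of-the-envelope check, the memory needed for the raw vectors can be estimated from the vector count and dimension. The sketch below assumes 4-byte float32 components, and for the HNSW case an approximate overhead of about 2 * M 4-byte links per vector. The value M = 32 is an assumption for illustration; the value easyfaiss actually uses is not stated here. Under these assumptions the HNSW index comes out somewhat larger than the flat one, since it keeps the same full vectors and adds graph links.

```python
BYTES_PER_FLOAT32 = 4
BYTES_PER_LINK = 4  # assumed size of one stored neighbor id


def flat_index_bytes(n_vectors, dim):
    """Approximate storage for n_vectors float32 vectors of size dim."""
    return n_vectors * dim * BYTES_PER_FLOAT32


def hnsw_flat_index_bytes(n_vectors, dim, m=32):
    """Rough estimate: full vectors plus about 2 * m graph links per vector.

    m=32 is an assumed value for illustration, not easyfaiss's setting.
    """
    per_vector = dim * BYTES_PER_FLOAT32 + m * 2 * BYTES_PER_LINK
    return n_vectors * per_vector


for n in (10_000, 1_000_000):
    flat_mb = flat_index_bytes(n, 768) / 1e6
    hnsw_mb = hnsw_flat_index_bytes(n, 768) / 1e6
    print("{:>9,} vectors: flat ~{:.1f} MB, HNSW ~{:.1f} MB".format(n, flat_mb, hnsw_mb))
```

These figures ignore allocator and Python-side overhead, and any copy of the original text kept alongside the index, so treat them as lower bounds.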
