Package for building vector indexes

Project description

Easy Faiss

Easy Faiss is a package for creating vector indexes (knowledge bases) for LLM chatbots. It provides an easy-to-use alternative to vector databases such as Pinecone or Weaviate.

Example script:

import easyfaiss as ef

model, tokenizer = ef.bert_setup()

data = [
    "Hello, how are you?",
    "What's the weather today?",
    "How does photosynthesis work?",
    "What's the capital of France?",
    "Tell me about the history of the Roman Empire.",
    "How can I improve my coding skills?",
    "What's the recipe for a classic margarita?",
    "Explain the theory of relativity.",
    "What's the population of Tokyo?",
    "How can I learn a new language quickly?"
]

# create embeddings to store in index
embeddings = ef.create_embeddings(model=model, tokenizer=tokenizer, dataset=data)

# create a query vector to perform similarity search on the index
user_query = "What's the population in Tokyo, Japan?"
query_vector = ef.create_embeddings(model=model, tokenizer=tokenizer, dataset=[user_query])

# create the index
flat_index = ef.FlatIndex(name='demo')
flat_index.create_index(dimensions=768)
flat_index.update_index(embeddings=embeddings, dataset=data)

print(flat_index.details())

# perform similarity search
indices, distances = flat_index.similarity_search(query_vector=query_vector, k=3)

Easy Faiss currently provides two types of Faiss indexes:

FlatIndex:

  • Speed: Nearest neighbor search is relatively slow, especially as the number of vectors increases, because every query is compared against every stored vector.
  • Scale: Suitable for smaller datasets with low to moderate memory requirements.
  • Accuracy: Provides exact nearest neighbor search, making it the most accurate option.
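To make "exact nearest neighbor search" concrete, here is a minimal pure-Python sketch of the brute-force idea behind a flat index. It is not Easy Faiss or Faiss code; the function name `exact_search` and the toy two-dimensional vectors are assumptions made for illustration, and it uses plain Euclidean distance.

```python
import heapq
import math


def exact_search(vectors, query, k):
    """Return (indices, distances) of the k stored vectors closest to query."""
    # A flat index compares the query against every stored vector,
    # so the work grows linearly with the number of vectors.
    scored = [(math.dist(vec, query), idx) for idx, vec in enumerate(vectors)]
    nearest = heapq.nsmallest(k, scored)
    indices = [idx for _, idx in nearest]
    distances = [dist for dist, _ in nearest]
    return indices, distances


# Hypothetical toy data: four 2-dimensional vectors.
vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
indices, distances = exact_search(vectors, query=(1.0, 1.0), k=2)
print(indices)  # -> [1, 3]
```

Because every vector is scored, the result is always the true set of nearest neighbors, which is why this approach is accurate but becomes slow as the dataset grows.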

HNSWFlatIndex:

  • Speed: Hierarchical Navigable Small World (HNSW) allows for fast approximate nearest neighbor search, especially on large datasets.
  • Scale: Well-suited for large datasets, because approximate search avoids comparing the query against every stored vector. The graph links are stored alongside the full vectors, so memory use is somewhat higher than for FlatIndex.
  • Accuracy: Provides approximate results, with a small loss of accuracy compared to FlatIndex.
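The approximate search described above can be illustrated with a greatly simplified sketch: a single-layer neighbor graph with greedy routing. This is not the full HNSW algorithm (which adds a layer hierarchy and a candidate list) and is not how Easy Faiss or Faiss implement it; the function names, the parameter `m`, and the random data are all assumptions chosen for the example.

```python
import math
import random


def build_neighbor_graph(vectors, m):
    """Link every vector to its m nearest neighbors (brute force, for illustration)."""
    graph = {}
    for i, vec in enumerate(vectors):
        others = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: math.dist(vec, vectors[j]),
        )
        graph[i] = others[:m]
    return graph


def greedy_search(vectors, graph, query, entry=0):
    """Walk the graph, always moving to the neighbor closest to the query."""
    current = entry
    current_dist = math.dist(vectors[current], query)
    visited = 1  # number of distance computations performed
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            visited += 1
            d = math.dist(vectors[neighbor], query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:
            # No neighbor is closer: a local minimum, which may or may not
            # be the true nearest neighbor.
            return current, current_dist, visited
        current, current_dist = best, best_dist


random.seed(0)
vectors = [(random.random(), random.random()) for _ in range(200)]
graph = build_neighbor_graph(vectors, m=8)
query = (0.5, 0.5)

found, found_dist, visited = greedy_search(vectors, graph, query)
exact_dist = min(math.dist(v, query) for v in vectors)
print(found, visited, found_dist >= exact_dist)
```

The search stops at the first vector none of whose neighbors is closer to the query. It typically examines only part of the dataset, but the vector it returns is not guaranteed to be the exact nearest one, which is the speed-versus-accuracy trade-off in miniature.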

The choice between these two index types depends on your specific use case:

  • Use FlatIndex when you have a relatively small dataset and need precise, exact nearest neighbor search.
  • Use HNSWFlatIndex when you have a larger dataset and can tolerate a small reduction in accuracy in exchange for faster search.

For FlatIndex, the practical upper limit depends on the available memory and on how much query latency you can accept. It is typically suitable for thousands to tens of thousands of vectors (e.g., 10,000 or more, depending on the dimension of the vectors and available RAM). Beyond that, search time keeps growing linearly with the number of vectors, and very large collections may also run into memory limitations. HNSWFlatIndex is a good choice for larger datasets, including millions of vectors, provided you can accept the trade-offs in accuracy.
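A rough back-of-the-envelope estimate can help when sizing an index. The sketch below assumes vectors are stored as 32-bit floats (4 bytes per value) and, for the HNSW variant, that each vector also stores about `2 * links_per_node` 4-byte neighbor ids, following the approximate guideline in the Faiss documentation. The function names and the default `links_per_node=32` are assumptions for illustration; actual usage depends on how the index is configured.

```python
def flat_index_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Approximate memory for storing the raw vectors only."""
    return num_vectors * dimensions * bytes_per_value


def hnsw_flat_index_bytes(num_vectors, dimensions, links_per_node=32,
                          bytes_per_value=4):
    """Approximate memory for raw vectors plus graph links (assumed overhead)."""
    # Assumption: roughly 2 * links_per_node neighbor ids of 4 bytes per vector.
    link_bytes = num_vectors * links_per_node * 2 * 4
    return flat_index_bytes(num_vectors, dimensions, bytes_per_value) + link_bytes


for n in (10_000, 1_000_000):
    flat_mb = flat_index_bytes(n, 768) / 1e6
    hnsw_mb = hnsw_flat_index_bytes(n, 768) / 1e6
    print(f"{n:>9,} vectors: flat ~{flat_mb:,.1f} MB, HNSW flat ~{hnsw_mb:,.1f} MB")
```

With 768-dimensional embeddings, as in the example script, 10,000 vectors come to roughly 31 MB of raw vector data and one million vectors to roughly 3 GB under these assumptions.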

Download files

Source Distribution

easyfaiss-0.1.5.tar.gz (3.8 kB)

Uploaded Source

Built Distribution

easyfaiss-0.1.5-py3-none-any.whl (4.4 kB)

Uploaded Python 3

File details

Details for the file easyfaiss-0.1.5.tar.gz.

File metadata

  • Download URL: easyfaiss-0.1.5.tar.gz
  • Size: 3.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.12 Linux/6.2.0-35-generic

File hashes

Hashes for easyfaiss-0.1.5.tar.gz
Algorithm Hash digest
SHA256 afe1578310091db22fd52a98fde8c61e51eb01e93d639bf303187e75b782f806
MD5 d1240a8fcf71bbce73060d26911ea24b
BLAKE2b-256 8cb24855c1fb7c41c2d5b6504d30d8b4e0bafb1d8ebe829d3b8c90ee686bce3c

File details

Details for the file easyfaiss-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: easyfaiss-0.1.5-py3-none-any.whl
  • Size: 4.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.12 Linux/6.2.0-35-generic

File hashes

Hashes for easyfaiss-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 f74fcb99d74ac597754198093a005bb70630fc1e97440a8697c2ac9ff3e2a4f0
MD5 ec886f820397fe75431dcc31f5d565d6
BLAKE2b-256 c6bb0f51010e0a56bfa2843e46c50caf284afa5faedd3a120e9d860f464ae3b2
