A package for text similarity and embeddings

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Vector Nest 🪺

Installation

pip install vector-nest

⚡ Project Details

Vector Nest is a lightweight database manager for storing text together with its metadata and vector embeddings. Its main functions are creating databases and collections, adding records (text, metadata, and embeddings), and retrieving the records most similar to a query by cosine similarity. It is aimed at AI applications that need to search, filter, and organize vector data.
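For reference, this is the standard definition of cosine similarity between a query embedding q and a stored embedding d, with a small worked example. The README does not say whether Vector Nest normalizes embeddings before comparing them, so treat this as the textbook formula rather than a description of the package internals. Scores range from -1 to 1, and higher means more similar.

```latex
\cos(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\,\sqrt{\sum_{i=1}^{n} d_i^2}}

% Worked example: q = (1, 0), d = (1, 1)
\cos(\mathbf{q}, \mathbf{d})
  = \frac{1 \cdot 1 + 0 \cdot 1}{\sqrt{1}\,\sqrt{2}}
  = \frac{1}{\sqrt{2}} \approx 0.707
```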

Key Features:

Create a database: You can create a new database with either overwrite or append mode.
Create a collection: Define a collection to store documents (such as research papers) and their embeddings.
Add data: Add synthetic or real data to the collection, including associated metadata and vector embeddings.
Search and retrieve: Use cosine similarity to retrieve the documents most relevant to a query. Metadata filters (for example, author or category) can narrow the results.
Advanced queries: Support for setting a similarity threshold to filter out low-relevance results.
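The search features above (metadata filters, a similarity threshold, and a top-N cutoff) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Vector Nest's implementation: the `retrieve` helper and the record layout (`text`, `metadata`, `embedding` keys) are hypothetical, filters are assumed to be exact matches on metadata values, and the threshold is assumed to be a minimum score. The result dictionaries mirror the `text`, `metadata`, and `similarity` keys printed in the examples below.

```python
from __future__ import annotations

import math
from typing import Any


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(records: list[dict[str, Any]], query_embedding: list[float],
             filters: dict[str, str] | None = None, top_n: int = 5,
             threshold: float | None = None) -> list[dict[str, Any]]:
    """Hypothetical helper: exact-match metadata filter, score, drop low scores, keep the best top_n."""
    filters = filters or {}
    results = []
    for record in records:
        # Skip records whose metadata does not match every filter key exactly
        if any(record["metadata"].get(key) != value for key, value in filters.items()):
            continue
        score = cosine_similarity(query_embedding, record["embedding"])
        # Drop results below the optional minimum-similarity threshold
        if threshold is not None and score < threshold:
            continue
        results.append({"text": record["text"], "metadata": record["metadata"], "similarity": score})
    # Highest similarity first, truncated to top_n
    results.sort(key=lambda r: r["similarity"], reverse=True)
    return results[:top_n]


# Toy records with hand-written 2-D embeddings (illustrative data only)
records = [
    {"text": "AI paper", "metadata": {"category": "AI"}, "embedding": [1.0, 0.0]},
    {"text": "Blockchain paper", "metadata": {"category": "Blockchain"}, "embedding": [0.0, 1.0]},
    {"text": "Applied AI paper", "metadata": {"category": "AI"}, "embedding": [1.0, 1.0]},
]

for r in retrieve(records, [1.0, 0.0], filters={"category": "AI"}, top_n=5, threshold=0.5):
    print(r["text"], round(r["similarity"], 3))
# AI paper 1.0
# Applied AI paper 0.707
```

Because the threshold is applied before the top-N cut, a query can return fewer than `top_n` results when few records clear the threshold.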

Example Usage

⚡ Creating and adding data to a collection:

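import random

from vector_nest import VectorNest  # assumed import path; the original README does not show the import

# Initialize the VectorNest manager
manager = VectorNest()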

# Step 1: Create a database named 'research_database'
db_name = 'research_database'
manager.create_database(db_name, mode='overwrite')  # 'overwrite' starts fresh; 'append' keeps existing data

# Select the database for the operations that follow
manager.use_database(db_name)

# Step 2: Create a collection for storing research papers, with mode='overwrite' or 'append'
collection_name = 'research_papers'
manager.create_collection(collection_name, mode='overwrite')  # 'overwrite' replaces existing collection, 'append' keeps it if it exists

# Step 3: Generate synthetic data for research papers
authors = ["Alice Johnson", "Bob Smith", "Carol Lee", "David Wu", "Eve Brown"]
categories = ["AI", "Data Science", "Quantum Computing", "Cybersecurity", "Blockchain"]
publication_years = [2019, 2020, 2021, 2022, 2023]

def generate_fake_abstract(category):
    return f"This paper discusses advancements in {category}. It covers recent trends, methodologies, and potential future applications."

# Step 4: Add synthetic research papers to the collection
for i in range(50):  # Adding 50 synthetic papers
    title = f"Research Paper {i+1}"
    category = random.choice(categories)
    author = random.choice(authors)
    year = random.choice(publication_years)
    abstract = generate_fake_abstract(category)
    
    metadata = {
        "title": title,
        "author": author,
        "year": str(year),
        "category": category
    }
    manager.add_to_collection(collection_name, text=abstract, metadata=metadata)
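The `mode` argument used in Steps 1 and 2 is described only by the comments in the example: 'overwrite' replaces an existing collection, while 'append' keeps it if it already exists. The sketch below illustrates that behaviour with a hypothetical in-memory `InMemoryStore` class; it is not Vector Nest's storage code, and it leaves out embeddings to keep the focus on the two modes.

```python
from __future__ import annotations


class InMemoryStore:
    """Hypothetical stand-in illustrating 'overwrite' vs 'append'; not Vector Nest's implementation."""

    def __init__(self) -> None:
        self.collections: dict[str, list[dict]] = {}

    def create_collection(self, name: str, mode: str) -> None:
        if mode not in ("overwrite", "append"):
            raise ValueError("mode must be 'overwrite' or 'append'")
        # 'overwrite' always starts empty; 'append' only creates the collection if it is missing
        if mode == "overwrite" or name not in self.collections:
            self.collections[name] = []

    def add_to_collection(self, name: str, text: str, metadata: dict) -> None:
        self.collections[name].append({"text": text, "metadata": metadata})


store = InMemoryStore()
store.create_collection("research_papers", mode="overwrite")
store.add_to_collection("research_papers", text="Paper A", metadata={"category": "AI"})

store.create_collection("research_papers", mode="append")     # existing documents are kept
print(len(store.collections["research_papers"]))             # 1

store.create_collection("research_papers", mode="overwrite")  # existing documents are discarded
print(len(store.collections["research_papers"]))             # 0
```

In practice, 'append' is the safer choice when rerunning a script against data you want to keep.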

⚡ Retrieving from a collection:

Example 1: Retrieve the top 5 research papers most similar to a topic, filtered by category

query_text = "advancements in AI"
filters = {"category": "AI"}
top_n = 5
retrieved_texts = manager.retrieve_from_collection(collection_name, query_text, filters=filters, top_n=top_n)

print("\nTop 5 research papers similar to the query in the 'AI' category:")
for result in retrieved_texts:
    print(f"Text: {result['text']}\nMetadata: {result['metadata']}\nSimilarity: {result['similarity']}\n")

⚡ Close the database connection:

manager.close_connection()
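To make sure `close_connection()` runs even when a query raises an error, the manager can be wrapped in a small context manager. This is a sketch under assumptions: `open_manager` is a hypothetical helper built on the standard-library `contextlib.contextmanager`, and `FakeManager` is a stand-in so the snippet runs without the package installed. The only Vector Nest method it relies on is `close_connection()`, as used above.

```python
from contextlib import contextmanager


@contextmanager
def open_manager(factory):
    """Hypothetical helper: create a manager via factory() and always call close_connection() on exit."""
    manager = factory()
    try:
        yield manager
    finally:
        # Runs on normal exit and when an exception escapes the with-block
        manager.close_connection()


class FakeManager:
    """Stand-in for VectorNest so the sketch is runnable on its own."""

    def __init__(self) -> None:
        self.closed = False

    def close_connection(self) -> None:
        self.closed = True


with open_manager(FakeManager) as m:
    print(m.closed)  # False while in use
print(m.closed)      # True after the block exits
```

With the real package this might look like `with open_manager(VectorNest) as manager: ...`, assuming `VectorNest()` takes no required arguments, as in the examples.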

⚡ Querying an existing database:

Example 2: Retrieve papers by a specific author across all categories, keeping only results above a similarity threshold

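# The previous connection was closed, so start a new manager session (assumed necessary after close_connection())
manager = VectorNest()

db_name = 'research_database'
collection_name = 'research_papers'

manager.use_database(db_name)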

query_text = "applications of blockchain in security"
filters = {'author': 'Carol Lee'}
top_n = 5
threshold = 0.01  # minimum similarity a result must reach; raise it for stricter matching
retrieved_texts = manager.retrieve_from_collection(collection_name, query_text, filters=filters, top_n=top_n, threshold=threshold)

print("\nUp to 5 research papers by Carol Lee related to 'blockchain', above the similarity threshold:")
for result in retrieved_texts:
    print(f"Text: {result['text']}\nMetadata: {result['metadata']}\nSimilarity: {result['similarity']}\n")

More Information

For a detailed explanation and walkthrough of this project, check out the blog post on my website:

Link to Blog Post

You can also watch the YouTube video on this project for further understanding:

YouTube Video Link


Release history

0.1.1 (Nov 10, 2024)

0.1.0 (Nov 10, 2024), this version

Download files

Download the file for your platform.

Source Distribution

vector_nest-0.1.0.tar.gz (6.3 kB)

Uploaded Nov 10, 2024 Source

Built Distribution

vector_nest-0.1.0-py3-none-any.whl (6.5 kB)

Uploaded Nov 10, 2024 Python 3

File details

Details for the file vector_nest-0.1.0.tar.gz.

File metadata

Download URL: vector_nest-0.1.0.tar.gz
Upload date: Nov 10, 2024
Size: 6.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for vector_nest-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`557ed6e6c239beaca82575664ba724196d91abdb175b2472a0e58882037592a1`
MD5	`ec18a1cf536fd49c80e7a8c4f59c5d50`
BLAKE2b-256	`a632b1614f6e7b4b6986541987ba4a38e9f59b150054ded5f9f7b27f1cd222f0`

File details

Details for the file vector_nest-0.1.0-py3-none-any.whl.

File metadata

Download URL: vector_nest-0.1.0-py3-none-any.whl
Upload date: Nov 10, 2024
Size: 6.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for vector_nest-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`50d4ee821c2264c9780094fe9e59c78b32bbdd2240712ea5262714be3a22f816`
MD5	`4f66a8ddf3ab74954551366ee126b270`
BLAKE2b-256	`12d92338488ad817ba7b4e4705c199a4d3f1f8ece5c817f2413c2ef368542bd8`

vector-nest 0.1.0