Generated from aind-library-template


GAMER: Generative Analysis for Metadata Retrieval


Installation

Create a virtual environment with Python 3.11 (install a build of Python 3.11 that is compatible with your operating system).

py -3.11 -m venv .venv

On Windows, activate the environment with

.venv\Scripts\Activate.ps1

You will need access to the AWS Bedrock service to use the model. Once you have configured the AWS CLI and have been granted access to Anthropic's Claude 3 Sonnet and Claude 3.5 Sonnet, proceed to the following steps.

Install the chatbot package with the virtual environment activated.

pip install metadata-chatbot

Usage

To call the model:

from metadata_chatbot.agents.GAMER import GAMER

query = "What was the refractive index of the chamber immersion medium used in this experiment SmartSPIM_675387_2023-05-23_23-05-56"
model = GAMER()
result = model.invoke(query)

print(result)

To call the model asynchronously, which reduces call time by roughly 50%, run the following from within a coroutine or a notebook:

result = await model.ainvoke(query)
print(result)
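
Outside a notebook, `await` is only valid inside a coroutine. The following is a minimal sketch of running the asynchronous call from a plain script, assuming `ainvoke` is a coroutine as the snippet above implies. `StubModel`, `ask` and `run_query` are hypothetical names used only for illustration; the stub stands in for `GAMER()` so the example runs without AWS access.

```python
import asyncio


class StubModel:
    """Hypothetical stand-in for GAMER so the sketch runs offline."""

    async def ainvoke(self, query: str) -> str:
        await asyncio.sleep(0)  # placeholder for the real network call
        return f"answer to: {query}"


async def ask(model, query: str) -> str:
    # `await` must live inside a coroutine when not running in a notebook.
    return await model.ainvoke(query)


def run_query(model, query: str) -> str:
    # asyncio.run creates an event loop, runs the coroutine, then closes the loop.
    return asyncio.run(ask(model, query))


if __name__ == "__main__":
    print(run_query(StubModel(), "Which assets were acquired with SmartSPIM?"))
```

In practice, `StubModel()` would be replaced by the `GAMER()` instance created above.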

High Level Overview

The project's main goal is to develop a chatbot that can ingest, analyze and query metadata. Metadata is accumulated alongside experiments and consists of information about the data description, subject, equipment and session. To maintain reproducibility standards, metadata must be well documented. GAMER is designed to streamline the querying process for neuroscientists and other users.

Model Overview

The current chatbot model uses Anthropic's Claude 3 Sonnet and Claude 3.5 Sonnet, hosted on AWS Bedrock. Since the primary goal is to query the database in natural language, users are expected to ask questions about the metadata specifically. The framework is built on LangChain. Claude's system prompt has been configured to understand the metadata schema format and to craft MongoDB queries based on the prompt. Given a natural language query about the metadata, the model produces a MongoDB query, its reasoning and an answer. This method of answering follows chain-of-thought reasoning, in which a complex task is broken into manageable steps so that the problem can be worked through logically.

The main technique used by the model is Retrieval Augmented Generation (RAG), a process in which the model consults an external database to find information relevant to the user's query. This process does not involve retraining the model. Instead, it allows the model to query unseen data using few-shot learning (examples of queries and answers) and tools (e.g. API access) for examining these databases.

Multi-Agent graph framework

A multi-agent workflow is created using LangGraph, allowing for parallel execution of tasks, such as document retrieval from the vector index, and increased developer control over the RAG process. Decision nodes and their roles are further explained in the GAMER_workbook.

Workflow
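
The routing and parallel retrieval described above can be sketched in plain Python. This is an illustration of the idea only: it does not use LangGraph's API, and the node names (`route`, `retrieve`, `run_graph`), the keyword rule and the source names are all hypothetical rather than taken from the GAMER code.

```python
import asyncio


def route(query: str) -> str:
    """Decision node: send count-style questions to the database path."""
    count_words = ("how many", "count", "number of")
    is_count = any(word in query.lower() for word in count_words)
    return "database" if is_count else "vector_index"


async def retrieve(source: str, query: str) -> str:
    await asyncio.sleep(0)  # stands in for an I/O-bound retrieval call
    return f"{source}: documents for '{query}'"


async def run_graph(query: str) -> dict:
    # Shared state passed between nodes, as in a graph-based workflow.
    state = {"query": query, "route": route(query), "documents": []}
    if state["route"] == "vector_index":
        # Independent retrievals run concurrently; gather preserves input order.
        sources = ("subject", "session", "instrument")
        state["documents"] = await asyncio.gather(
            *(retrieve(source, query) for source in sources)
        )
    else:
        state["documents"] = [await retrieve("aggregation", query)]
    return state


if __name__ == "__main__":
    print(asyncio.run(run_graph("What immersion medium was used?")))
```

Keeping the routing decision in its own function is what gives the developer control over which retrieval path a query takes.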

Data Retrieval

Vector Embeddings

To improve retrieval accuracy and decrease hallucinations, we use vector embeddings to access relevant chunks of information found across the database. This process starts with accessing assets and splitting each JSON file into chunks of around 8000 tokens (10 chunks per file); each chunk preserves the hierarchy found in the JSON file. These chunks are converted to vectors of size 1024 by an embedding model (Amazon's Titan 2.0 Embedding). The user's query is converted to a vector in the same embedding space. The chunks that contain the most relevant information are then retrieved through a cosine similarity search.
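
The chunking and similarity search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: chunk size is measured in characters rather than tokens, the vectors are toy 2-dimensional values instead of 1024-dimensional Titan embeddings, and `chunk_json`, `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import json
import math


def chunk_json(node, max_chars=200, path="$"):
    """Split nested JSON into chunks, recording each chunk's key path."""
    text = json.dumps(node)
    # Small enough, or a non-dict value that is not split further: one chunk.
    if len(text) <= max_chars or not isinstance(node, dict):
        return [{"path": path, "text": text}]
    chunks = []
    for key, value in node.items():
        chunks.extend(chunk_json(value, max_chars, f"{path}.{key}"))
    return chunks


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    doc = {"subject": {"species": "mouse"}, "session": {"notes": "x" * 300}}
    for chunk in chunk_json(doc):
        print(chunk["path"], len(chunk["text"]))
    print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]))
```

Storing the key path with each chunk is one way to preserve the JSON hierarchy, so a retrieved chunk can still be traced back to its place in the original document.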

AIND-data-schema-access REST API

For queries that require access to the entire database, such as count-based questions, information is retrieved through the API connection using an aggregation pipeline provided by one of the LLM agents.
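
As an illustration of what such a pipeline might look like, the sketch below builds two MongoDB aggregation pipelines as plain Python data: one that counts matching documents and one that lists the unique values of a field. The stage operators (`$match`, `$count`, `$group`, `$sort`) are standard MongoDB; the helper names and the field names in the example are assumptions, and the call that sends the pipeline over the API is deliberately omitted.

```python
import json


def count_pipeline(match_filter: dict) -> list[dict]:
    """Count the documents that satisfy a filter."""
    return [{"$match": match_filter}, {"$count": "total"}]


def distinct_pipeline(field: str) -> list[dict]:
    """List the unique values of a (possibly nested) field."""
    # Grouping on "$<field>" yields one output document per distinct value.
    return [{"$group": {"_id": f"${field}"}}, {"$sort": {"_id": 1}}]


if __name__ == "__main__":
    # Field names below are hypothetical examples, not the confirmed schema.
    print(json.dumps(count_pipeline({"subject.species.name": "Mus musculus"})))
    print(json.dumps(distinct_pipeline("data_description.platform.name")))
```

Because the pipeline runs on the database side, only the aggregated result needs to be returned to the model rather than every matching document.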

Current specifications

The model can query the fields for a specified asset.
The model can query metadata documents from the document database.
The model can return a list of unique values for a given field.
The model can answer count-based questions.
