Skip to main content

A text memory meant to be used with conversational language models.

Project description

GoodAI-LTM

Long-term memory (LTM) is increasingly recognized as an essential component in applications powered by large language models

(LLMs).

Among the benefits of LTM is the possibility of continual learning. This is the ability to accumulate knowledge over time,

possibly over the agent's entire lifetime. As their knowledge builds incrementally, agents can leverage learned skills

to acquire increasingly complex abilities. Continual learning helps make agents robust against distributional drift, more capable

of continuous adaptation, and more human-like.

GoodAI-LTM brings together all the components necessary for equipping agents with text-based long term memory.

This includes text embedding models, reranking, vector databases, chunking, metadata such as time stamps and

document information, memory and query rewriting (expansion and disambiguation), storage and retrieval.

The package is especially adapted to provide a dialog-centric memory stream for social agents.

  • Embedding models: Use OpenAI, Hugging Face Sentence Transformers, or our own locally trainable embeddings.

The trainable embeddings allow multiple embeddings for a query or passage, which can capture different aspects of the text for more accurate retrieval.

  • Query-passage match ranking: In addition to similarity-based retrieval, we support models for estimating

query-passage matching after retrieval.

  • Vector databases: We currently provide a light-weight local vector database as well as support for FAISS.

The present emphasis on dialog is also a limitation: The memory is not currently optimized for other uses, such as

retrieving source code. See the Future plans section for features that are on our todo list.

Installation

pip install goodai-ltm

Short example

The following code snippet creates an instance of LTM, loads in some text and then retrieves the most relevant text chunks given a query:

from goodai.ltm.mem.auto import AutoTextMemory

mem = AutoTextMemory.create()

mem.add_text("Lorem ipsum dolor sit amet, consectetur adipiscing elit\n")

mem.add_text("Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore\n",

             metadata={'timestamp': time.time(), 'type': 'generic'})

r_memories = mem.retrieve(query='dolorem eum fugiat quo voluptas nulla pariatur?', k=3)

Loading a text memory instance

A default memory instance can be created as follows:

from goodai.ltm.mem.auto import AutoTextMemory



mem = AutoTextMemory.create()

You can also configure the memory by passing parameters to the create method.

In the following example, the memory uses a "gpt2" tokenizer

for chunking, a T5 model for embeddings, a

FAISS index for embedding storage instead of a simple vector

database, and a custom chunking configuration.

import torch

from transformers import AutoTokenizer

from goodai.ltm.embeddings.auto import AutoTextEmbeddingModel

from goodai.ltm.mem.auto import AutoTextMemory

from goodai.ltm.mem.config import TextMemoryConfig

from goodai.ltm.mem.mem_foundation import VectorDbType



embedding_model = AutoTextEmbeddingModel.from_pretrained('st:sentence-transformers/sentence-t5-base')

tokenizer = AutoTokenizer.from_pretrained('gpt2')

config = TextMemoryConfig()

config.chunk_capacity = 30  # tokens

config.queue_capacity = 10000  # chunks

mem = AutoTextMemory.create(emb_model=embedding_model,

                            matching_model=None, 

                            tokenizer=tokenizer,

                            vector_db_type=VectorDbType.FAISS_FLAT_L2, 

                            config=config,

                            device=torch.device('cuda:0'))

Text memory usage

Call the add_text method to add text to the memory.

Text may consist of phrases, sentences or documents.

mem.add_text("Lorem ipsum dolor sit amet, consectetur adipiscing elit\n")

Internally, the memory will chunk and index the text

automatically.

Text can be associated with an arbitrary metadata dictionary, such as:

mem.add_text("Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore\n",

             metadata={'timestamp': time.time(), 'type': 'generic'})

Internally, the memory concatenates text stored using add_text with any text previously sent to the memory.

To retrieve a list of passages associated with a query,

call the retrieve method:

r_memories = mem.retrieve(query='dolorem eum fugiat quo voluptas nulla pariatur?', k=3)

The retrieve method returns a list of objects of type RetrievedMemory, containing

the following properties:

  • passage: The text of the memory. This corresponds to text found in a matching chunk, but it may be expanded using text from adjacent chunks.

  • distance: Calculated distance between the query and the chunk passage.

  • confidence: If a query-passage matching model is available, this is the probability assigned by the model.

  • metadata: Metadata associated with the retrieved text, if any.

Detailed example

For a slightly more detailed view of how the memory works, let us revisit the storage and retrieval of text passages.

text_example = """\

Jake Morales: Hey Archie, what do you think about teaming up with me and Isaac Coax? 

We could come up with a plan that would distract Lucas Fern.

Archie: That would be great. Thanks for helping me out."""

To store this text, we create an instance of the default memory class and add the text to it.

mem = AutoTextMemory.create()

mem.add_text(text_example)

The text is encoded by the tokenizer as token ids.

[39568, 17989, 35, 11468, 25261, 6, 99, 109, 47, 206, 59, 165, 154, 62, 19, 162, 8, 

 12370, 944, 3631, 116, 166, 115, 283, 62, 19, 10, 563, 14, 74, 21943, 7895, 21572, 4, 50118, 8138, 

 17309, 35, 280, 74, 28, 372, 4, 4557, 13, 1903, 162, 66, 4]

The tokenized text is split into overlapping chunks that are recorded in a chunk queue.

| Id | Metadata | Content |

| ----- | -------- | ------- |

| 0 | {'foo': 'bar'} | " Jake Morales: Hey Archie, what do you think about teaming up with me and Isaac Coax?" |

| 1 | {'foo': 'bar'} | " think about teaming up with me and Isaac Coax? \n We could come up with a plan" |

| 2 | {'foo': 'bar'} | " \n We could come up with a plan that would distract Lucas Fern.\n Archie:" |

| 3 | {'foo': 'bar'} | " that would distract Lucas Fern.\n Archie: That would be great. Thanks for helping me out." |

| 4 | {'foo': 'bar'} | " That would be great. Thanks for helping me out." |

The embedding model converts each chunk into a high-dimensional vector, e.g., a unit vector of dimension 768.

The embeddings, and the corresponding chunk indexes, are added to the vector database.

The passages are now represented in memory as pairs of vectors and chunk indexes in the vector database and as

sequences of tokens in the chunk queue. From the token sequences, the text can be recovered.

During retrieval, the stored embeddings closest to the query embedding are found and the corresponding texts

decoded.

In addition to the steps above, it is also possible to rewrite queries and memories and to perform passage

reranking after retrieval.

The diagrams below illustrate what happens during storage and retrieval (sans optional query and memory rewriting).

Storage

Loading an embedding model

An embedding model is loaded as follows:

from goodai.ltm.embeddings.auto import AutoTextEmbeddingModel



em = AutoTextEmbeddingModel.from_pretrained(model_name)

The model_name can be one of the following:

  • A SentenceTransformer (Huggingface), starting with "st:", for example, "st:sentence-transformers/multi-qa-mpnet-base-cos-v1".

  • An OpenAI embedding model name, starting with "openai:", for example, "openai:text-embedding-ada-002".

  • One of our fine-tuned models:

Name | Base model | # parameters | # storage emb

---- |--------------------------------------------------|--------------| -----

em-MiniLM-p3-01 | multi-qa-MiniLM-L6-cos-v1 | 22.7m | 3

em-distilroberta-p1-01 | sentence-transformers/all-distrilroberta-v1 | 82.1m | 1

em-distilroberta-p3-01 | sentence-transformers/all-distrilroberta-v1 | 82.1m | 3

Embedding model usage

To get embeddings for a list of queries, call

the encode_queries method, as follows:

r_emb = em.encode_queries(['hello?'])

This returns a numpy array. To get a Pytorch tensor,

add the convert_to_tensor parameter:

r_emb = em.encode_queries(['hello?'], convert_to_tensor=True)

To get embeddings for a list of passages, call

the encode_corpus method, as follows:

s_emb = em.encode_corpus(['it was...', 'the best of...'])

A peculiarity of our embedding model is that queries

and passages can have more than one embedding.

Embedding tensors have 3 axes: The batch size, the number of

embeddings, and the number embedding dimensions. Typically,

the number of embeddings per query/passage will be 1, except for the

passage embeddings in some of our fine-tuned models.

Loading a query-passage matching model

A query-passage matching/reranking model can be loaded as follows:

from goodai.ltm.reranking.auto import AutoTextMatchingModel



model = AutoTextMatchingModel.from_pretrained(model_name)

The model_name can be one of the following:

  • A "st:" prefix followed by the name of a Huggingface cross-encoder compatible with the SentenceTransformers library, like "st:cross-encoder/stsb-distilroberta-base"

  • One of our fine-tuned models:

| Name | Base model | # parameters |

| ---- | ---------- |--------------|

| qpm-distilroberta-01 | sentence-transformers/all-distilroberta-v1 | 82.1m |

Note that a limitation of our fine-tuned query-passage matching models is that they are optimized

for short passages (24 to 36 tokens) and the passages are truncated by the model during inference.

Memory instances, by default, do not use a query-passage matching model. To enable one, it should be configured

as follows:

qpmm = AutoTextMatchingModel.from_pretrained('qpm-distilroberta-01')

config = TextMemoryConfig()

config.reranking_k_factor = 3

mem = AutoTextMemory.create(matching_model=qpmm, config=config)

The reranking_k_factor setting tells the memory how many candidates it should consider

for reranking. The user requests k memories. The reranking algorithm considers

k * reranking_k_factor chunks.

Query-passage matching model usage

The predict method of the model takes a list of

query-passage tuples and returns a list of floats

representing match probabilities. Example:

model = AutoTextMatchingModel.from_pretrained('qpm-distilroberta-01')

sentences = [

    ('Mike: What is your favorite color?', 'Steve: My favorite color is purple.'),

    ('Name the inner planets.', 'It was the best of times, it was the worst of times.'),

]

prob = model.predict(sentences)

print(prob)

Use in GoodAI's AI game

An early application of GoodAI-LTM is in GoodAI's forthcoming AI Game.

LLMs are used to shape NPC behavior and to generate dialog. Long-term memory is used to provide characters

with backstories and allows them to accumulate experience during game play.

AI-game

Evaluation of embedding models

We're interested in retrieval of relatively short

passages (one to a few sentences) using conversational

queries that may be found in a chat. To this end we've developed

an evaluation based on datasets QReCC,

StrategyQA, and

MS MARCO.

Results in the following table show top-3 and top-10

retrieval accuracy for each dataset.

Model | qrecc @3 | qrecc @10 | strategyqa @3 | strategyqa @10 | msmarco @3 | msmarco @10

----- | ------- | -------- | ------------ | ------------- | --------- | ----------

openai:text-embedding-ada-002 | 67.09 | 76.80 | 68.00 | 82.40 | 73.10 | 80.14 |

st:sentence-transformers/multi-qa-MiniLM-L6-cos-v1 | 69.98 | 77.57 | 73.90 | 87.75 | 70.31 | 77.71 |

st:sentence-transformers/all-distilroberta-v1 | 65.01 | 76.15 | 66.35 | 82.50 | 68.59 | 78.34 |

st:sentence-transformers/sentence-t5-large | 68.40 | 78.28 | 72.55 | 86.60 | 71.30 | 80.51 |

st:sentence-transformers/all-mpnet-base-v2 | 70.69 | 80.19 | 74.50 | 87.65 | 75.00 | 81.77 |

st:sentence-transformers/multi-qa-mpnet-base-cos-v1 | 74.95 | 82.42 | 79.75 | 91.25 | 75.00 | 82.85 |

em-MiniLM-p3-01 (ours) | 72.87 | 80.02 | 78.00 | 89.75 | 73.38 | 79.96 |

em-distilroberta-p1-01 (ours) | 77.67 | 83.84 | 83.25 | 94.15 | 79.78 | 84.39 |

em-distilroberta-p3-01 (ours) | 78.33 | 84.66 | 86.55 | 95.40 | 79.51 | 85.29 |

Model em-distilroberta-p1-01 is the default model used

by this library. While em-distilroberta-p3-01 has better

retrieval accuracy, note that it requires storing 3 embeddings

per chunk. The distilroberta models have 82 million parameters

and their embedding size is 768.

Model em-MiniLM-p3-01 is a good choice if you need

a lightweight model. It has only 22.7 million parameters.

It produces 3 storage embeddings per chunk, but these

are embeddings of size 384.

Evaluation of query-passage matching models

At this time, we have yet to find a reranking cross-encoder that can consistently boost the retrieval accuracy of our best

embedding models, but our default fine-tuned matching model is able to improve memory accuracy relative

to weak/low-resource embedding models. The following tests were performed on a memory configured

to use qpm-distilroberta-01 as the query-passage matching model, and two different embedding models

with different values assigned to the reranking_k_factor setting.

reranking_k_factor w/ emb model | qrecc @3 | qrecc @10 | strategyqa @3 | strategyqa @10 | msmarco @3 | msmarco @10

----- |----------|-----------|---------------|----------------|------------| ----------

x2 w/ st:sentence-transformers/multi-qa-MiniLM-L6-cos-v1 | 75.11 | 80.19 | 86.00 | 92.25 | 75.09 | 79.96 |

x3 w/ st:sentence-transformers/multi-qa-MiniLM-L6-cos-v1 | 76.04 | 81.39 | 86.30 | 93.50 | 74.91 | 80.96 |

x4 w/ st:sentence-transformers/multi-qa-MiniLM-L6-cos-v1 | 76.04 | 81.77 | 86.50 | 94.05 | 75.63 | 81.50 |

x5 w/ st:sentence-transformers/multi-qa-MiniLM-L6-cos-v1 | 76.31 | 82.31 | 86.80 | 94.50 | 75.81 | 81.32 |

x2 w/ st:sentence-transformers/multi-qa-mpnet-base-cos-v1 | 77.24 | 83.79 | 87.40 | 93.90 | 77.98 | 83.75 |

x3 w/ st:sentence-transformers/multi-qa-mpnet-base-cos-v1 | 76.97 | 83.90 | 87.75 | 94.95 | 76.71 | 83.03 |

x4 w/ st:sentence-transformers/multi-qa-mpnet-base-cos-v1 | 77.24 | 84.01 | 87.40 | 95.10 | 76.08 | 82.85 |

x5 w/ st:sentence-transformers/multi-qa-mpnet-base-cos-v1 | 76.97 | 83.84 | 87.65 | 95.50 | 76.35 | 82.85 |

These embedding models, with the boosting provided by qpm-distilroberta-01, nearly reach the memory

retrieval accuracy of our best embedding models. Reranking in this manner could be useful if you need

to have a compact model with small embeddings in production. Note, however, that the query-passage

matching model does add a performance overhead when executing queries.

The matching model can also be useful if you need the confidence scores of retrieved memories.

Future plans

We will continue to improve GoodAI-LTM. Possible next steps include

  • Retrieval weighted by recency and importance

  • Flag for preventing internal text concatenation in mem.add_text()

  • Embeddings for source code retrieval

  • Storage and retrieval methods without embeddings

  • Improvements to the currently experimental query and memory rewriting feature and its default prompts

  • Configurable chunk overlapping and expansion

  • Iterating on improvements to our datasets and models

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

goodai-ltm-0.0.20.tar.gz (120.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

goodai_ltm-0.0.20-py3-none-any.whl (137.6 kB view details)

Uploaded Python 3

File details

Details for the file goodai-ltm-0.0.20.tar.gz.

File metadata

  • Download URL: goodai-ltm-0.0.20.tar.gz
  • Upload date:
  • Size: 120.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for goodai-ltm-0.0.20.tar.gz
Algorithm Hash digest
SHA256 87053e4c3ea9d18a1d911971b7989a53c3a01e53b2585872346c13eccec32ca1
MD5 6dee97cdcb32ca907f4d36e22dc472b7
BLAKE2b-256 a37dfda2756e71bc76df1166029d9dae6033b682a75f89af592b5dc23947f606

See more details on using hashes here.

File details

Details for the file goodai_ltm-0.0.20-py3-none-any.whl.

File metadata

  • Download URL: goodai_ltm-0.0.20-py3-none-any.whl
  • Upload date:
  • Size: 137.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for goodai_ltm-0.0.20-py3-none-any.whl
Algorithm Hash digest
SHA256 3c5db4293f4b055c8b87e75a5fc9cb2188d66186e9a18edd2d9fd8f099f1de68
MD5 ac1ce8d9d80c0ae6845c0b5e382c1173
BLAKE2b-256 d291c4b3c92dda1af8273f95222e8e31c1537e73ef857868af1121575add181b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page