HAEMA memory framework built on ChromaDB

HAEMA

HAEMA is an agent memory framework built on ChromaDB.

It provides three memory modes through a single write API:

  • core memory: durable high-impact identity/policy/user facts (get_core)
  • latest memory: recency slice by timestamp (get_latest)
  • long-term memory: semantic retrieval (search)

All writes go through add(contents); HAEMA updates all three layers automatically.

Key Changes (Current)

  • add(contents) runs a single N:M reconstruction pass per call.
  • Embedding is split into query/document interfaces:
    • embed_query(...)
    • embed_document(...)
  • The special path for inputs with no related memories is removed; a single reconstruction schema is used in every case.
  • reconstruction schema:
    • memories: list[str]
    • coverage: "complete" | "incomplete"

Installation

pip install haema

Development:

pip install -e ".[dev]"

Quick Start

from haema import Memory

m = Memory(
    path="./haema_store",
    output_dimensionality=1536,
    embedding_client=...,   # your EmbeddingClient implementation
    llm_client=...,         # your LLMClient implementation
    merge_top_k=3,
    merge_distance_cutoff=0.25,
)

m.add([
    "The user prefers concise and actionable responses.",
    "The user is building HAEMA on top of ChromaDB.",
])

print(m.get_core())                     # str
print(m.get_latest(begin=1, count=5))   # list[str]
print(m.search("user preference", 3))   # list[str]

Real provider example:

  • examples/google_genai_example.py

Public API

Constructor

Memory(path, output_dimensionality, embedding_client, llm_client, merge_top_k=3, merge_distance_cutoff=0.25)

  • path: storage root directory
  • output_dimensionality: embedding output dimension
  • embedding_client: user embedding adapter
  • llm_client: user structured-output LLM adapter
  • merge_top_k: number of nearest existing memories retrieved per new content item (default 3)
  • merge_distance_cutoff: maximum distance at which a retrieved memory counts as related (default 0.25)

Methods

  • get_core() -> str
  • get_latest(begin: int, count: int) -> list[str]
  • search(content: str, n: int) -> list[str]
  • add(contents: str | list[str]) -> None

Client Interfaces

EmbeddingClient

  • embed_query(texts, output_dimensionality) -> np.ndarray
  • embed_document(texts, output_dimensionality) -> np.ndarray

Both must return:

  • 2D numpy.ndarray
  • dtype float32
  • shape (len(texts), output_dimensionality)
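For local testing without a provider, a toy client can satisfy this contract. This is a minimal sketch, not a real embedding model: `HashEmbeddingClient` is a hypothetical name, and it derives deterministic pseudo-random unit vectors from a hash of each text, so the vectors carry no semantic meaning and search results will be arbitrary.

```python
import hashlib

import numpy as np


class HashEmbeddingClient:
    """Hypothetical toy EmbeddingClient: deterministic, NOT semantic."""

    def _embed(self, texts: list[str], output_dimensionality: int) -> np.ndarray:
        rows = []
        for text in texts:
            # Seed a generator from a stable hash so the same text always yields the same vector.
            seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "little")
            vec = np.random.default_rng(seed).standard_normal(output_dimensionality)
            rows.append(vec / np.linalg.norm(vec))  # unit-normalize each row
        # Contract: 2D float32 array of shape (len(texts), output_dimensionality).
        return np.asarray(rows, dtype=np.float32).reshape(len(texts), output_dimensionality)

    def embed_query(self, texts: list[str], output_dimensionality: int) -> np.ndarray:
        return self._embed(texts, output_dimensionality)

    def embed_document(self, texts: list[str], output_dimensionality: int) -> np.ndarray:
        return self._embed(texts, output_dimensionality)
```

Real providers often use different task types for queries and documents, which is why the interface keeps the two methods separate even though this toy routes both to one function.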

LLMClient

  • generate_structured(system_prompt, user_prompt, response_model) -> dict[str, Any]

The returned dict must validate against the Pydantic model passed as response_model.
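A stub LLMClient is handy in unit tests. The sketch below is illustrative only: `CannedLLMClient` is hypothetical, it ignores both prompts, and it returns pre-registered dicts keyed by the response model's class name. It checks each dict with Pydantic v2's `model_validate` before returning it, mirroring the requirement above. The schema is redeclared here only so the example is self-contained.

```python
from typing import Any, Literal

from pydantic import BaseModel


class MemoryReconstructionResponse(BaseModel):
    memories: list[str]
    coverage: Literal["complete", "incomplete"]


class CannedLLMClient:
    """Hypothetical test double: returns canned dicts instead of calling a model."""

    def __init__(self, responses: dict[str, dict[str, Any]]):
        self.responses = responses  # keyed by response_model.__name__

    def generate_structured(self, system_prompt: str, user_prompt: str,
                            response_model: type[BaseModel]) -> dict[str, Any]:
        payload = self.responses[response_model.__name__]
        # Fail early if the canned dict would not parse under the requested schema.
        response_model.model_validate(payload)
        return payload
```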

Reconstruction Schema

HAEMA uses structured reconstruction output for long-term memory updates:

class MemoryReconstructionResponse(BaseModel):
    memories: list[str]
    coverage: Literal["complete", "incomplete"]

If the output is empty or coverage == "incomplete", HAEMA runs one refinement pass. If that pass also fails, HAEMA falls back to using the normalized input contents as the new memories.
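The retry rule reduces to a small piece of control flow. This is a sketch of the documented behavior, not HAEMA's code: `reconstruct` and `refine` are hypothetical callables standing in for the two LLM calls, each assumed to return a dict shaped like MemoryReconstructionResponse. Passing the first attempt into `refine` as context is also an assumption.

```python
from typing import Any, Callable

Response = dict[str, Any]


def _usable(resp: Response) -> bool:
    # Usable only if there is at least one memory AND coverage is complete.
    return bool(resp.get("memories")) and resp.get("coverage") == "complete"


def reconstruct_with_fallback(
    contents: list[str],
    reconstruct: Callable[[list[str]], Response],
    refine: Callable[[list[str], Response], Response],
) -> list[str]:
    first = reconstruct(contents)
    if _usable(first):
        return first["memories"]
    # Exactly one refinement pass, given the failed attempt as context.
    second = refine(contents, first)
    if _usable(second):
        return second["memories"]
    # Fall back to the normalized input contents unchanged.
    return list(contents)
```

Bounding the retry at one pass keeps the cost of add() predictable, and the fallback means new information is never silently dropped.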

Prompt Contracts (Layer Responsibility)

HAEMA uses three independent prompt stages with separate outputs:

  • pre-memory split:
    • input: one raw add string
    • output schema: PreMemorySplitResponse(contents)
    • responsibility: split factual units only (no core policy decision)
  • reconstruction:
    • input: related memories + new contents
    • output schema: MemoryReconstructionResponse(memories, coverage)
    • responsibility: generate long-term memories only
  • core update:
    • input: current core + reconstructed new memories
    • output schema: CoreUpdateResponse(should_update, core_markdown)
    • responsibility: conservative core update only

Prompt user blocks are boundary-labeled with tags such as:

  • <raw_input> ... </raw_input>
  • <related_memories> ... </related_memories>
  • <new_contents> ... </new_contents>
  • <current_core_markdown> ... </current_core_markdown>
  • <candidate_new_memories> ... </candidate_new_memories>

These tags are prompt-boundary markers for model clarity, not parser/runtime control logic.
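Assembling a tagged user block is plain string formatting. The helpers below are hypothetical and only illustrate the boundary-label idea; HAEMA's actual prompt text, block ordering, and item formatting (here, one "- " bullet per item and "(none)" for an empty block) are assumptions.

```python
def tagged_block(tag: str, items: list[str]) -> str:
    """Wrap items in <tag> ... </tag> boundary markers, one bullet per item."""
    body = "\n".join(f"- {item}" for item in items) if items else "(none)"
    return f"<{tag}>\n{body}\n</{tag}>"


def build_reconstruction_prompt(related: list[str], new_contents: list[str]) -> str:
    # Related memories may be empty; the block is still emitted so the boundary stays visible.
    return "\n\n".join([
        tagged_block("related_memories", related),
        tagged_block("new_contents", new_contents),
    ])
```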

Core Memory Policy

Core memory should keep only durable, high-impact, high-confidence information. Under the prompt policy, a candidate item should satisfy all three criteria:

  1. durability across sessions
  2. material impact on future agent behavior
  3. high confidence grounded in evidence

Core prompt policy also enforces:

  • strict section routing to one of SOUL/TOOLS/RULE/USER
  • exclusion of temporary/session-only/transient logs and noise
  • compact high-signal output with a soft target budget around 8 bullets total
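If you want to audit core.md against these rules yourself, a check could look like the following. This is a hypothetical validator, not part of HAEMA: it assumes core.md uses "## SOUL"-style headings and "- " bullets, which this README does not specify, and it reports the soft budget as a warning rather than an error.

```python
ALLOWED_SECTIONS = ("SOUL", "TOOLS", "RULE", "USER")
SOFT_BULLET_BUDGET = 8


def check_core_markdown(text: str) -> list[str]:
    """Return human-readable warnings; an empty list means no issues found."""
    warnings: list[str] = []
    section = None
    bullets = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            section = stripped[3:].strip()
            if section not in ALLOWED_SECTIONS:
                warnings.append(f"unknown section: {section}")
        elif stripped.startswith("- "):
            bullets += 1
            if section is None:
                warnings.append(f"bullet outside any section: {stripped}")
    if bullets > SOFT_BULLET_BUDGET:
        warnings.append(f"{bullets} bullets exceeds soft budget of {SOFT_BULLET_BUDGET}")
    return warnings
```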

Storage Layout

Given path="./haema_store":

  • long-term vector DB: ./haema_store/db
  • core markdown: ./haema_store/core.md
  • latest index DB: ./haema_store/latest.sqlite3
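All three locations are fixed relative to path, so external tooling (backups, inspection scripts) can derive them with pathlib. `store_paths` is a hypothetical helper; HAEMA manages these files itself.

```python
from pathlib import Path


def store_paths(root: str) -> dict[str, Path]:
    base = Path(root)
    return {
        "vector_db": base / "db",           # ChromaDB persistent directory (long-term memory)
        "core": base / "core.md",           # core memory markdown
        "latest": base / "latest.sqlite3",  # recency index used by get_latest
    }
```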

Long-term metadata fields:

  • timestamp (UTC ISO8601)
  • timestamp_ms (Unix epoch milliseconds)
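Deriving both fields from a single clock reading keeps them consistent. This sketch assumes the ISO8601 string comes from `datetime.isoformat()` at millisecond precision; HAEMA's exact string format (for example "+00:00" versus "Z") is not stated here. Integer timedelta division avoids float rounding in the millisecond value.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Union

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_metadata(now: Optional[datetime] = None) -> dict[str, Union[str, int]]:
    """Build both timestamp fields from one instant; pass an aware datetime if supplying one."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "timestamp": now.isoformat(timespec="milliseconds"),         # UTC ISO8601
        "timestamp_ms": (now - EPOCH) // timedelta(milliseconds=1),  # exact integer ms
    }
```

Storing the integer alongside the string lets the latest-memory index sort and filter numerically without parsing dates.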

How add() Works

  1. Normalize input strings.
    • If contents is a single str, HAEMA first expands it into multiple pre-memory items via structured LLM output.
  2. Batch-embed all contents with embed_query.
  3. For each query embedding, fetch the top merge_top_k existing memories and keep only matches within merge_distance_cutoff.
  4. Union the related memories across all queries, deduplicating by ID.
  5. Run one reconstruction call with:
    • related memory documents (may be empty)
    • all new contents
  6. Upsert the reconstructed memories with embeddings from embed_document.
  7. Delete replaced related IDs only after upsert succeeds.
  8. Update core once per add() call.
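Steps 3, 4, 6, and 7 can be sketched against ChromaDB's result and write shapes. `Collection.query` returns `ids`, `distances`, and `documents` as one inner list per query embedding, and `upsert`/`delete` take keyword arguments. Both function names are hypothetical, treating "within cutoff" as `distance <= cutoff` is an assumption, and skipping deletion of IDs that were just re-upserted is a defensive choice of this sketch.

```python
from typing import Any


def collect_related(result: dict[str, Any], cutoff: float) -> dict[str, str]:
    """Steps 3-4: keep matches within cutoff and union them by ID across queries."""
    related: dict[str, str] = {}
    for ids, dists, docs in zip(result["ids"], result["distances"], result["documents"]):
        for mem_id, dist, doc in zip(ids, dists, docs):
            if dist <= cutoff and mem_id not in related:
                related[mem_id] = doc  # same ID means same stored record, so first hit suffices
    return related


def commit(collection: Any, new_ids: list[str], documents: list[str], embeddings: Any,
           metadatas: list[dict[str, Any]], replaced_ids: list[str]) -> None:
    """Steps 6-7: write first, delete second."""
    collection.upsert(ids=new_ids, documents=documents, embeddings=embeddings, metadatas=metadatas)
    # Only reached if upsert did not raise, so a failed write never loses the old memories.
    keep = set(new_ids)
    stale = [i for i in replaced_ids if i not in keep]
    if stale:
        collection.delete(ids=stale)
```

With a real collection, the input to `collect_related` would come from something like `collection.query(query_embeddings=q, n_results=merge_top_k, include=["documents", "distances"])`.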

Breaking Changes

Compared to older builds:

  1. EmbeddingClient.embed(...) is removed.
  2. NoRelatedMemoryResponse is removed.
  3. MemorySynthesisResponse(update: list[str]) is replaced by MemoryReconstructionResponse.
  4. merge_top_k default changed from 5 to 3.
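Code that still implements only the removed embed(...) method can be bridged with a thin adapter during migration. This sketch rests on an assumption: the legacy method is taken to have the signature `embed(texts, output_dimensionality)`, which this README does not confirm. Routing query and document calls to one method is also a compromise; providers with task-specific embeddings should implement the two methods separately.

```python
from typing import Protocol

import numpy as np


class LegacyEmbedder(Protocol):
    def embed(self, texts: list[str], output_dimensionality: int) -> np.ndarray: ...


class LegacyEmbeddingAdapter:
    """Hypothetical shim exposing embed_query/embed_document over an old single-method client."""

    def __init__(self, legacy: LegacyEmbedder):
        self.legacy = legacy

    def _call(self, texts: list[str], output_dimensionality: int) -> np.ndarray:
        out = np.asarray(self.legacy.embed(texts, output_dimensionality), dtype=np.float32)
        # Enforce the new contract: 2D float32 with shape (len(texts), output_dimensionality).
        if out.shape != (len(texts), output_dimensionality):
            raise ValueError(f"unexpected embedding shape {out.shape}")
        return out

    def embed_query(self, texts: list[str], output_dimensionality: int) -> np.ndarray:
        return self._call(texts, output_dimensionality)

    def embed_document(self, texts: list[str], output_dimensionality: int) -> np.ndarray:
        return self._call(texts, output_dimensionality)
```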

Documentation

  • docs/index.md
  • docs/usage.md
  • docs/api.md
  • docs/architecture.md
  • docs/release.md

License

MIT

Download files

Source Distribution

haema-0.3.0.tar.gz (22.5 kB)

Built Distribution

haema-0.3.0-py3-none-any.whl (13.0 kB)

File details

Details for the file haema-0.3.0.tar.gz.

File metadata

  • Download URL: haema-0.3.0.tar.gz
  • Upload date:
  • Size: 22.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for haema-0.3.0.tar.gz
Algorithm Hash digest
SHA256 91a1eb6f4b40800103b5bb9a30b587da530178a3ef0874a1c0102c3d0af213ef
MD5 5f100b1647666113ac87148fd95d6d9e
BLAKE2b-256 6d0f255d2f73d52a06254078a548c694ffc5719f37e5ffe9a2d14231f902415a

Provenance

The following attestation bundles were made for haema-0.3.0.tar.gz:

Publisher: publish-pypi.yml on smturtle2/haema

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file haema-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: haema-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 13.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for haema-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c855770a8fbe4cda0835c2428baf80a980857da7faf4cfc2d0547b97eaa177b8
MD5 a99f2c2e60fd913fc9cbb1b628cf89d8
BLAKE2b-256 38088d1499aa0f6577a19687138a1e71bb0255130d6437858b07954b54f8a7c5

Provenance

The following attestation bundles were made for haema-0.3.0-py3-none-any.whl:

Publisher: publish-pypi.yml on smturtle2/haema

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
