Unity Catalog-native episodic, semantic, and working memory for AI agents on Databricks

lakehouse-memory

Status: Pre-release (0.1.0b3). Public from day one. The core library, LangChain adapters, and Databricks Asset Bundle (DAB) starter (M3) are workspace-validated; the docs site (M4) has not shipped yet. See the spec for design intent.

The pitch

Memory is the missing Databricks layer. The standard workaround is a sidecar vector DB with its own governance, access control, and lineage: a second system to secure and audit, and in practice one you can't ship. Memory belongs in Unity Catalog, where your data already lives.

lakehouse-memory gives AI agents on Databricks three first-class memory primitives — episodic, semantic, and working — backed by Unity Catalog tables and Databricks Vector Search.

Install

pip install --pre lakehouse-memory

The --pre flag is required while the package is in pre-release. Once 0.1.0 ships (alongside the M3 DAB starter and M4 docs), pip install lakehouse-memory will work without the flag.
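
By default pip prefers final releases over pre-releases, which is why a version string such as 0.1.0b3 is treated differently from 0.1.0. The sketch below is a simplified illustration of that distinction, not pip's actual resolver logic: it covers only the common PEP 440 spellings (aN, bN, rcN, .devN) and ignores epochs, local versions, and alternate spellings such as "beta". The helper name is_prerelease is hypothetical.

```python
import re

# Simplified PEP 440 shape: release digits, optional pre/post/dev segments.
# Assumption: only the normalized spellings (a, b, rc, .post, .dev) are handled.
_VERSION = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"
    r"(?P<pre>(?:a|b|rc)\d+)?"
    r"(?P<post>\.post\d+)?"
    r"(?P<dev>\.dev\d+)?$"
)


def is_prerelease(version: str) -> bool:
    """Return True if the version carries a pre-release or dev segment."""
    match = _VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return match.group("pre") is not None or match.group("dev") is not None
```

With this helper, "0.1.0b3" is classified as a pre-release and "0.1.0" is not, which mirrors the difference between the two install commands above.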

Quickstart with the DAB starter (recommended)

Bootstrap the whole reference architecture — UC tables, Vector Search indexes, and a working chat agent — in your Databricks workspace:

databricks bundle init https://github.com/travis-burmaster/lakehouse-memory \
  --template-dir templates/lakehouse-memory-bundle
cd <project-name>
databricks bundle deploy
databricks bundle run setup_job

You'll be prompted for your catalog, schema, Vector Search endpoint, SQL warehouse HTTP path, and LLM serving endpoint. After setup_job finishes, open notebooks/02_chat_agent.ipynb and run all cells. You should have a memory-backed agent running in under 10 minutes.

Manual setup (advanced)
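
The manual setup that follows reads four environment variables and raises a bare KeyError if any is unset. A small preflight check can report all missing names at once. This is a minimal sketch: the variable names are taken from the setup code in this section, while missing_vars and REQUIRED_VARS are hypothetical helpers and not part of lakehouse-memory.

```python
import os
from typing import Mapping, Sequence

# Names read via os.environ[...] by the manual setup code in this section.
REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_TOKEN",
    "DATABRICKS_VECTOR_SEARCH_ENDPOINT",
)


def missing_vars(
    env: Mapping[str, str] = os.environ,
    required: Sequence[str] = REQUIRED_VARS,
) -> list[str]:
    """Return the required variable names that are unset or blank, in order."""
    return [name for name in required if not env.get(name, "").strip()]
```

Calling missing_vars() before building the client and raising a single error that lists the result makes a misconfigured shell easier to diagnose than a KeyError on the first missing variable.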

from lakehouse_memory import Memory, MemoryConfig, Scope
from lakehouse_memory.client import SqlConnectorClient
from lakehouse_memory.vector_databricks import DatabricksVectorIndex
import os

config = MemoryConfig(catalog="main", schema_name="agent_memory")

client = SqlConnectorClient(
    server_hostname=os.environ["DATABRICKS_HOST"].replace("https://", ""),
    http_path=os.environ["DATABRICKS_HTTP_PATH"],
    access_token=os.environ["DATABRICKS_TOKEN"],
)

index = DatabricksVectorIndex(
    endpoint_name=os.environ["DATABRICKS_VECTOR_SEARCH_ENDPOINT"],
    index_name=f"{config.catalog}.{config.schema_name}.episodic_idx",
    workspace_url=os.environ["DATABRICKS_HOST"],
    access_token=os.environ["DATABRICKS_TOKEN"],
    columns=["event_id", "text", "user_id", "session_id", "agent_id"],
)

mem = Memory(config=config, client=client, index=index, scope=Scope(user_id="u_1"))
mem.provision(
    vector_search_endpoint=os.environ["DATABRICKS_VECTOR_SEARCH_ENDPOINT"],
    workspace_url=os.environ["DATABRICKS_HOST"],
    access_token=os.environ["DATABRICKS_TOKEN"],
)

# Write a fact
mem.semantic.upsert(fact="User prefers SQL over Python.")

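# The Delta Sync indexes are TRIGGERED, so new rows are not searchable
# until a sync runs. Fire the sync explicitly after writes.
# (For production, consider switching to CONTINUOUS pipelines.)
# Note: _index is underscore-prefixed, so treat it as internal to the library.
mem.semantic._index.trigger_sync()

# Wait for the sync to finish. A fixed sleep keeps the example short;
# production code would poll with exponential backoff instead.
import time

time.sleep(15)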

facts = mem.semantic.retrieve("language preferences", k=3)
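
The fixed time.sleep(15) above can be replaced by polling with exponential backoff, as the comment suggests. The following is a minimal sketch of that pattern, not something lakehouse-memory provides: wait_with_backoff and its parameters are hypothetical, and is_ready is any zero-argument callable you supply that returns True once the synced data is visible.

```python
import time
from typing import Callable


def wait_with_backoff(
    is_ready: Callable[[], bool],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready, doubling the wait each time, until it passes or times out."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        if is_ready():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # Timed out without is_ready ever returning True.
        # Never sleep past the deadline, and cap the doubled delay at max_delay.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

In the snippet above, one possible readiness check is lambda: bool(mem.semantic.retrieve("language preferences", k=3)), assuming retrieve returns an empty result until the index has synced. That behavior is an assumption and worth verifying against your workspace.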

LangChain integration:

chat = mem.as_langchain_chat_history(limit=50)
retriever = mem.as_langchain_retriever(k=5)

Production gaps

(Coming in M4. Short version: compaction at scale, multi-tenant RLS, regression evals, observability, and custom retrieval strategies are deliberately not in OSS. If you want help building past those, the Burmaster Databricks AI Practice does this for a living.)

License

Apache 2.0. See LICENSE.
