M2M EBM Vector Database — Energy-Based Model storage with full CRUD, WAL persistence, REST API, and Self-Organized Criticality

These details have been verified by PyPI

Maintainers

schwabauerbriantomas-gif

M2M EBM Vector Database

M2M — Energy-Based Model (EBM) Vector Database

A production-ready vector database powered by Gaussian Splats and hierarchical retrieval (HRM2), extended in v2.0 with a full Energy-Based Model layer: Write-Ahead Logging, complete CRUD, Self-Organized Criticality, and an energy-aware REST API.

🆕 What's New in v2.0

Feature	Description
Full CRUD	`add/update/delete` with ids, metadata, documents and metadata filters
Write-Ahead Log	Durable `msgpack`/JSON WAL + SQLite metadata persistence
EBM Energy API	`E(x)`, gradient, free energy, local 2D maps
Exploration API	High-uncertainty regions, Boltzmann sampling, agent suggestions
SOC Engine	Self-Organized Criticality: avalanche dynamics & system relaxation
REST API v2	Collections-based, full CRUD + EBM endpoints
Energy Router	5 routing strategies for distributed clusters

🎯 Overview

M2M is a vector database built on Gaussian Splats with hierarchical retrieval (HRM2). Version 2.0 adds a complete Energy-Based Model layer, so the database can assign an energy value to any point of its latent space, surface high-uncertainty regions for exploration, and reorganize itself through Self-Organized Criticality.

Core Engine Features

Feature	Description
Hierarchical Retrieval (HRM2)	Two-level clustering (coarse → fine) for sub-millisecond searches
Gaussian Splats	Full latent representation (μ, α, κ)
EBM Layer	Energy landscape, exploration, Self-Organized Criticality
Local-First	No cloud dependencies, pure Python/NumPy
GPU Acceleration	Optional Vulkan compute shader (cross-platform)
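To illustrate the coarse → fine idea behind two-level retrieval, here is a minimal pure-Python sketch. It is not M2M's HRM2 implementation: `TwoLevelIndex`, `n_probe`, and the hand-picked centroids are hypothetical (a real index would learn its centroids, for example with k-means). The point is that a query is compared against a few centroids first, and exact scoring runs only inside the closest buckets.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TwoLevelIndex:
    """Hypothetical coarse -> fine index: route to nearby centroids, then scan their members."""

    def __init__(self, centroids):
        self.centroids = centroids               # coarse level
        self.buckets = [[] for _ in centroids]   # fine level: (id, vector) pairs per centroid

    def add(self, item_id, vec):
        # Assign the vector to its most similar centroid.
        best = max(range(len(self.centroids)), key=lambda c: cosine(vec, self.centroids[c]))
        self.buckets[best].append((item_id, vec))

    def search(self, query, k=5, n_probe=1):
        # Coarse step: rank centroids by similarity to the query.
        order = sorted(range(len(self.centroids)),
                       key=lambda c: cosine(query, self.centroids[c]), reverse=True)
        # Fine step: exact scoring, but only inside the n_probe closest buckets.
        scored = [(cosine(query, v), i) for c in order[:n_probe] for i, v in self.buckets[c]]
        scored.sort(reverse=True)
        return [(i, s) for s, i in scored[:k]]


if __name__ == "__main__":
    index = TwoLevelIndex(centroids=[[1.0, 0.0], [0.0, 1.0]])
    for item_id, vec in [("a", [0.9, 0.1]), ("b", [0.1, 0.9]), ("c", [0.8, 0.3])]:
        index.add(item_id, vec)
    print(index.search([1.0, 0.05], k=2))  # only the first bucket is scanned
```

Raising `n_probe` scans more buckets, trading speed for recall; with `n_probe` equal to the number of centroids the search degenerates into a linear scan.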

⚡ Quick Start

```bash
pip install m2m-vector-search
```

```python
import numpy as np
from m2m import SimpleVectorDB

# Initialize (supports 'edge', 'standard', 'ebm' modes)
db = SimpleVectorDB(latent_dim=768, mode='standard')

# Add with metadata
db.add(
    ids=['doc1', 'doc2', 'doc3'],
    vectors=np.random.randn(3, 768).astype(np.float32),
    metadata=[{'category': 'tech'}, {'category': 'science'}, {'category': 'tech'}],
    documents=['Doc 1 text', 'Doc 2 text', 'Doc 3 text']
)

# Search with a metadata filter (random query vector for illustration)
query = np.random.randn(768).astype(np.float32)
results = db.search(query, k=5, filter={'category': {'$eq': 'tech'}}, include_metadata=True)

# Update a document's metadata
db.update('doc1', metadata={'category': 'technology', 'reviewed': True})

# Soft-delete
db.delete(id='doc2')

# Hard-delete all docs matching a filter
db.delete(filter={'category': {'$eq': 'science'}}, hard=True)
```
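To make the filter syntax concrete, here is a hypothetical pure-Python evaluator that checks a Mongo-style condition such as `{'category': {'$eq': 'tech'}}` against one document's metadata. Only `$eq` appears in this README; `$ne`, `$in`, `$gt`, `$lt` and the bare-value shorthand are assumptions borrowed from similar filter languages, so check which operators M2M actually supports.

```python
def matches_filter(metadata, flt):
    """Return True if `metadata` satisfies every condition in `flt`.

    Hypothetical helper: only '$eq' is documented for M2M; the other
    operators and the bare-value shorthand are assumptions."""
    ops = {
        "$eq": lambda v, arg: v == arg,
        "$ne": lambda v, arg: v != arg,
        "$in": lambda v, arg: v in arg,
        "$gt": lambda v, arg: v is not None and v > arg,
        "$lt": lambda v, arg: v is not None and v < arg,
    }
    for field, condition in flt.items():
        value = metadata.get(field)  # missing fields compare as None
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # assumed shorthand: {'field': value}
        for op, arg in condition.items():
            if op not in ops:
                raise ValueError(f"unsupported operator: {op}")
            if not ops[op](value, arg):
                return False
    return True


if __name__ == "__main__":
    docs = {"doc1": {"category": "tech"}, "doc2": {"category": "science"}, "doc3": {"category": "tech"}}
    print([d for d, m in docs.items() if matches_filter(m, {"category": {"$eq": "tech"}})])
```

All conditions are ANDed together, which is the usual reading of a multi-key filter dictionary.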

🌓 Three Modes of Operation

1. SimpleVectorDB

"The SQLite of Vector DBs"

Edge-optimized. Full CRUD. Optional EBM and persistence.

```python
import numpy as np
from m2m import SimpleVectorDB

# Choose one of the three modes:

# Edge mode (minimal overhead, no WAL)
db = SimpleVectorDB(latent_dim=768, mode='edge')

# Standard mode (WAL + SQLite persistence)
db = SimpleVectorDB(latent_dim=768, mode='standard', storage_path='./data')

# EBM mode (full energy landscape features)
db = SimpleVectorDB(latent_dim=768, mode='ebm', storage_path='./data')

# Example data: one 768-d vector and a query
vectors = np.random.randn(1, 768).astype(np.float32)
query = np.random.randn(768).astype(np.float32)

db.add(ids=['doc1'], vectors=vectors, metadata=[{'cat': 'tech'}])
results = db.search(query, k=10, include_metadata=True)
db.update('doc1', metadata={'cat': 'technology'})
db.delete(id='doc1')
```
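The API distinguishes soft deletes (the default) from `hard=True` deletes. A common way to implement that distinction is tombstoning, sketched below with a hypothetical `TombstoneStore`. Whether M2M marks rows this way internally, and whether it has anything like `compact()`, is an assumption.

```python
class TombstoneStore:
    """Hypothetical in-memory store illustrating soft vs. hard deletes."""

    def __init__(self):
        self.rows = {}        # id -> (vector, metadata)
        self.deleted = set()  # ids that are soft-deleted (tombstoned)

    def add(self, item_id, vector, metadata=None):
        self.rows[item_id] = (vector, metadata or {})
        self.deleted.discard(item_id)  # re-adding an id revives it

    def delete(self, item_id, hard=False):
        if hard:
            # Hard delete: the row is physically removed.
            self.rows.pop(item_id, None)
            self.deleted.discard(item_id)
        elif item_id in self.rows:
            # Soft delete: keep the row but hide it from reads.
            self.deleted.add(item_id)

    def live_ids(self):
        """Ids visible to search: present and not tombstoned."""
        return [i for i in self.rows if i not in self.deleted]

    def compact(self):
        """Purge every tombstoned row, turning soft deletes into hard ones."""
        for item_id in list(self.deleted):
            self.rows.pop(item_id, None)
        self.deleted.clear()
```

Soft deletes are cheap because nothing is rewritten; the space is only reclaimed later by a compaction step or an explicit hard delete.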

2. AdvancedVectorDB

"The Cognitive Latent Space"

Autonomous agents. Full EBM features. Self-Organized Criticality.

```python
import numpy as np
from m2m import AdvancedVectorDB

db = AdvancedVectorDB(latent_dim=768, enable_soc=True, enable_energy_features=True)
vectors = np.random.randn(1, 768).astype(np.float32)
db.add(ids=['doc1'], vectors=vectors)

# SOC mechanics
report = db.check_criticality()
result = db.trigger_avalanche()
relax_result = db.relax(iterations=10)

# EBM search
query = np.random.randn(768).astype(np.float32)
sr = db.search_with_energy(query, k=10)
print(f"Query energy: {sr.query_energy:.4f}")
```

3. M2M Cluster

"The Distributed Vector Network"

Horizontal scalability with optional energy-aware routing.

```python
import numpy as np
from m2m import M2MConfig
from m2m.cluster import EdgeNode, ClusterRouter, M2MClusterClient

config = M2MConfig(device='cpu')
edge1 = EdgeNode(edge_id="edge-1", config=config)
edge2 = EdgeNode(edge_id="edge-2", config=config)

router = ClusterRouter(energy_router_config={
    'enabled': True,
    'strategy': 'hybrid'
})
client = M2MClusterClient(in_memory_router=router)
client.register_local_edge(edge1)
client.register_local_edge(edge2)

client.ingest(np.random.randn(1000, 768).astype(np.float32))
query = np.random.randn(768).astype(np.float32)
results = client.search(query, k=10)
```

⚡ EBM Features

Energy Landscape

```python
import numpy as np
from m2m import SimpleVectorDB

db = SimpleVectorDB(latent_dim=768, mode='ebm')

# Example data: 100 random 768-d vectors
ids = [f'doc{i}' for i in range(100)]
vectors = np.random.randn(100, 768).astype(np.float32)
db.add(ids=ids, vectors=vectors)

# Get the energy of a vector
query = np.random.randn(768).astype(np.float32)
energy = db.get_energy(query)

# Search with energy information
sr = db.search_with_energy(query, k=10)
for r in sr.results:
    print(f"{r.id}: score={r.score:.4f}, energy={r.energy:.4f}, confidence={r.confidence:.4f}")

# Find knowledge gaps (high-uncertainty regions)
gaps = db.find_knowledge_gaps(n=5)

# Agent exploration suggestions
suggestions = db.suggest_exploration(n=3)
```
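One way to read `get_energy` is as the negative log-density of a mixture of splats with parameters (μ, α, κ). The sketch below assumes a von Mises-Fisher-style mixture, E(x) = -log Σᵢ αᵢ exp(κᵢ μᵢ·x), whose gradient is -Σᵢ wᵢ κᵢ μᵢ with softmax weights wᵢ. This functional form and the function names are assumptions for illustration, not M2M's documented energy. Under this reading, low energy means a query sits in a dense, well-covered region, and high energy marks sparse regions, plausibly the kind of signal a "knowledge gaps" search looks for.

```python
import math


def _logits(x, splats):
    # log(alpha_i) + kappa_i * <mu_i, x> for each splat (mu, alpha, kappa)
    return [math.log(alpha) + kappa * sum(m * xi for m, xi in zip(mu, x))
            for mu, alpha, kappa in splats]


def splat_energy(x, splats):
    """Assumed energy: E(x) = -log sum_i alpha_i * exp(kappa_i * <mu_i, x>)."""
    logits = _logits(x, splats)
    top = max(logits)  # log-sum-exp trick for numerical stability
    return -(top + math.log(sum(math.exp(lg - top) for lg in logits)))


def splat_energy_grad(x, splats):
    """Gradient of splat_energy: -sum_i w_i * kappa_i * mu_i, w = softmax(logits)."""
    logits = _logits(x, splats)
    top = max(logits)
    exps = [math.exp(lg - top) for lg in logits]
    z = sum(exps)
    grad = [0.0] * len(x)
    for (mu, _alpha, kappa), e in zip(splats, exps):
        w = e / z
        for d, m in enumerate(mu):
            grad[d] -= w * kappa * m
    return grad


if __name__ == "__main__":
    splats = [([1.0, 0.0], 0.5, 10.0), ([0.0, 1.0], 0.5, 10.0)]
    print(splat_energy([1.0, 0.0], splats), splat_energy([-1.0, 0.0], splats))
```

Following the negative gradient moves a point toward the nearest dense splat, which is the usual way an energy landscape is "descended".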

Self-Organized Criticality (SOC)

SOC is meant to hold the database at its critical point, which the project associates with maximum information capacity. It does so by automatically identifying over-dense memory regions and resolving them through Bak-Tang-Wiesenfeld avalanche dynamics.

```python
import numpy as np
from m2m import AdvancedVectorDB

db = AdvancedVectorDB(latent_dim=768, enable_soc=True)

# Example data: 100 random 768-d vectors
ids = [f'doc{i}' for i in range(100)]
vectors = np.random.randn(100, 768).astype(np.float32)
db.add(ids=ids, vectors=vectors)

# Check system criticality
report = db.check_criticality()
# report.state: 'subcritical' | 'critical' | 'supercritical'
print(f"System state: {report.state}, index: {report.index:.4f}")

# Trigger an avalanche to redistribute memory
avalanche = db.trigger_avalanche()
print(f"Affected clusters: {avalanche.affected_clusters}, energy released: {avalanche.energy_released:.4f}")

# Relax the system to a stable state
relax = db.relax(iterations=20)
print(f"Energy: {relax.initial_energy:.4f} → {relax.final_energy:.4f}")
```
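For intuition about the avalanche dynamics, here is the classic Bak-Tang-Wiesenfeld sandpile on a small 2D grid. It is the textbook model, not M2M's SOC engine: treating grid cells as memory clusters and grains as density is an analogy assumed here, and `drive`/`topple` are hypothetical names. A cell holding 4 or more grains topples, sending one grain to each of its 4 neighbours; grains pushed past the edge are lost, which guarantees every avalanche ends.

```python
def topple(grid):
    """Relax a BTW sandpile in place; return the avalanche size (number of topplings)."""
    rows, cols = len(grid), len(grid[0])
    unstable = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c] >= 4]
    size = 0
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < 4:
            continue  # already relaxed by an earlier toppling
        grid[r][c] -= 4
        size += 1
        if grid[r][c] >= 4:
            unstable.append((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:  # off-grid grains dissipate
                grid[nr][nc] += 1
                if grid[nr][nc] >= 4:
                    unstable.append((nr, nc))
    return size


def drive(grid, r, c):
    """Drop one grain at (r, c), then relax; return the resulting avalanche size."""
    grid[r][c] += 1
    return topple(grid)


if __name__ == "__main__":
    pile = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
    print("avalanche size:", drive(pile, 1, 1))
    print(pile)
```

Driving such a pile grain by grain pushes it into a critical state where avalanche sizes follow a heavy-tailed distribution; that subcritical/critical/supercritical picture is presumably what `check_criticality()` summarizes.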

🌐 REST API

The REST API follows a collections-based architecture; all endpoints are served under the `/v1` path prefix:

```bash
# Start the server
uvicorn m2m.api.edge_api:app --port 8000
```

Collections

POST   /v1/collections          # Create collection
GET    /v1/collections/{name}   # Get collection info
DELETE /v1/collections/{name}   # Delete collection

Vectors (CRUD)

POST   /v1/collections/{name}/vectors          # Add vectors
PUT    /v1/collections/{name}/vectors/{id}     # Update vector
DELETE /v1/collections/{name}/vectors/{id}     # Delete vector
POST   /v1/collections/{name}/search           # Search with filters

EBM Endpoints

POST /v1/collections/{name}/energy    # Get energy for a vector
POST /v1/collections/{name}/explore   # Find high-uncertainty regions
GET  /v1/collections/{name}/suggest   # Agent exploration suggestions
GET  /v1/collections/{name}/stats     # Collection statistics

Admin

POST /v1/admin/checkpoint   # WAL checkpoint
POST /v1/admin/backup       # Backup collections

Example

```python
import numpy as np
import requests

BASE = "http://localhost:8000"

# Create collection
resp = requests.post(f"{BASE}/v1/collections", json={"name": "docs", "dimension": 768})
resp.raise_for_status()

# Add vectors (JSON needs plain lists, hence .tolist())
vectors = np.random.randn(5, 768).astype(np.float32).tolist()
resp = requests.post(f"{BASE}/v1/collections/docs/vectors", json={
    "ids": ["d1", "d2", "d3", "d4", "d5"],
    "vectors": vectors,
    "metadata": [{"category": "tech"}] * 5
})
resp.raise_for_status()

# Search with filter
query = np.random.randn(768).astype(np.float32).tolist()
resp = requests.post(f"{BASE}/v1/collections/docs/search", json={
    "query": query, "k": 3, "filter": {"category": {"$eq": "tech"}}
})
resp.raise_for_status()
print(resp.json())
```

🏗 Distributed Cluster & Energy Router

ClusterRouter now optionally wraps EnergyRouter for energy-aware distributed routing:

Strategy	Description
`energy_balanced`	Boltzmann probability — lower energy = higher selection chance
`round_robin`	Uniform sequential distribution
`least_loaded`	Node with fewest active queries
`locality_aware`	Prefers nodes familiar with the query region
`hybrid`	40% energy + 20% load + 30% locality + 10% latency

```python
import numpy as np
from m2m.cluster import ClusterRouter

router = ClusterRouter(energy_router_config={
    'enabled': True,
    'strategy': 'hybrid',
    'cache_energy': True,
    'cache_ttl_seconds': 60,
})
router.register_edge("edge-1", "http://edge1:8000", weight=1.0)
router.register_edge("edge-2", "http://edge2:8000", weight=2.0)

# Energy router automatically selects the best node
query_vector = np.random.randn(768).astype(np.float32)
selected = router.route_query(query_vector, k=10)
```
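The `energy_balanced` and `hybrid` rows of the strategy table can be read as two small formulas. The sketch below assumes every node reports energy, load, locality and latency already normalised to [0, 1]; the node dict layout, `temperature`, and all function names are hypothetical. Only the Boltzmann rule and the 40/20/30/10 weights come from the table.

```python
import math
import random

# Weights from the strategy table: 40% energy, 20% load, 30% locality, 10% latency.
WEIGHTS = {"energy": 0.4, "load": 0.2, "locality": 0.3, "latency": 0.1}


def boltzmann_probs(energies, temperature=1.0):
    """energy_balanced: P(node) proportional to exp(-E / T); lower energy, higher chance."""
    lo = min(energies)  # shifting by the minimum avoids overflow, probabilities unchanged
    weights = [math.exp(-(e - lo) / temperature) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def hybrid_score(node):
    """Higher is better. Energy, load and latency are 'lower is better', so use 1 - value."""
    return (WEIGHTS["energy"] * (1 - node["energy"])
            + WEIGHTS["load"] * (1 - node["load"])
            + WEIGHTS["locality"] * node["locality"]
            + WEIGHTS["latency"] * (1 - node["latency"]))


def pick_energy_balanced(nodes, temperature=1.0, rng=random):
    probs = boltzmann_probs([n["energy"] for n in nodes], temperature)
    return rng.choices(nodes, weights=probs, k=1)[0]


def pick_hybrid(nodes):
    return max(nodes, key=hybrid_score)


if __name__ == "__main__":
    nodes = [
        {"id": "edge-1", "energy": 0.2, "load": 0.9, "locality": 0.8, "latency": 0.5},
        {"id": "edge-2", "energy": 0.6, "load": 0.1, "locality": 0.2, "latency": 0.1},
    ]
    print(pick_hybrid(nodes)["id"], pick_energy_balanced(nodes, rng=random.Random(0))["id"])
```

A low `temperature` makes `energy_balanced` approach "always pick the lowest-energy node", while a high one approaches uniform random selection; sampling rather than always taking the minimum keeps load spread across nodes.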

🌐 Omnimodal & Multimodal

M2M stores vectors from any modality. Pair with any embedding model:

Modality	Recommended Model
Text	OpenAI `text-embedding-3`, BGE, MiniLM
Images	CLIP, SigLIP
Audio	ImageBind, Whisper encoders
Video	VideoMAE, ImageBind
Spatial/3D	PointNet++, 3D Gaussian Splatting

🔗 Integrations

LangChain

```python
# LangChain does not ship an M2MVectorStore; this module path is assumed by
# analogy with m2m.integrations.llamaindex below. Verify it in your installed package.
from m2m.integrations.langchain import M2MVectorStore
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = M2MVectorStore(embedding_function=embeddings.embed_query, splat_capacity=100000)
vectorstore.add_texts(["Document 1", "Document 2"])
results = vectorstore.similarity_search("Query", k=5)
```

LlamaIndex

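```python
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from m2m.integrations.llamaindex import M2MVectorStore

documents = SimpleDirectoryReader("./docs").load_data()
# Configure an embedding model whose output size matches latent_dim (640 here)
vectorstore = M2MVectorStore(latent_dim=640, max_splats=100000)
# LlamaIndex attaches custom vector stores through a StorageContext
storage_context = StorageContext.from_defaults(vector_store=vectorstore)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
response = index.as_query_engine().query("Your search query")
```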

Knowledge Graphs

```python
import numpy as np
from m2m.graph_splat import GaussianGraphStore
from m2m.entity_extractor import M2MEntityExtractor, M2MGraphEntityExtractor

store = GaussianGraphStore(dim=640)
pipeline = M2MGraphEntityExtractor(M2MEntityExtractor(), store)

# Placeholder embedding; use your embedding model's 640-d output in practice
embedding = np.random.randn(640).astype(np.float32)
doc_id = store.add_document("Apple Inc. reported strong earnings.", embedding)
pipeline.extract_and_store(text="...", doc_embedding=embedding, doc_id=doc_id)
```

🏗 Architecture

Storage Layers (v2.0)

Layer	Technology	Purpose
WAL	msgpack / JSON	Durable operation logging, crash recovery
Vectors	NumPy shards	Fast matrix operations
Metadata	SQLite	Structured metadata queries and filters
Index	Pickle	HRM2 cluster state serialization
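A write-ahead log makes each change durable before it touches the main data files, so a crash can be repaired by replaying the log. This sketch uses JSON lines from the standard library in place of msgpack; the record layout, `WriteAheadLog`, and `recover` are hypothetical and do not describe M2M's on-disk format.

```python
import json
import os


class WriteAheadLog:
    """Hypothetical JSON-lines WAL: each operation is made durable before it is applied."""

    def __init__(self, path):
        self.path = path

    def append(self, op):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op, sort_keys=True) + "\n")
            f.flush()
            os.fsync(f.fileno())  # once this returns, the record survives a crash

    def replay(self):
        """Yield logged operations in order, stopping at a torn (partially written) record."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    yield json.loads(line)
                except json.JSONDecodeError:
                    break  # crash mid-write: everything before this line is intact


def apply_op(state, op):
    if op["type"] == "upsert":
        state[op["id"]] = op["metadata"]
    elif op["type"] == "delete":
        state.pop(op["id"], None)


def recover(wal):
    """Rebuild in-memory state by replaying the log from the beginning."""
    state = {}
    for op in wal.replay():
        apply_op(state, op)
    return state
```

A checkpoint (compare `POST /v1/admin/checkpoint`) would typically persist the recovered state to the vector shards and metadata store and then truncate the log, so replay time stays bounded.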

3-Tier Memory (Advanced Mode)

Tier	Storage	Latency
Hot	VRAM	~0.1ms
Warm	RAM	~0.5ms
Cold	SSD	~10ms
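The latencies in the table suggest a classic tiered cache: keep recently used data in the fastest tier and demote the rest. The LRU policy, the capacities, and the `TieredStore` class below are assumptions for illustration; the README does not say how M2M decides what lives in each tier.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical hot/warm/cold tiers with LRU demotion and promotion on access."""

    def __init__(self, hot_capacity, warm_capacity):
        self.hot = OrderedDict()   # stands in for VRAM
        self.warm = OrderedDict()  # stands in for RAM
        self.cold = {}             # stands in for SSD (unbounded here)
        self.hot_capacity = hot_capacity
        self.warm_capacity = warm_capacity

    def put(self, key, value):
        self.warm.pop(key, None)
        self.cold.pop(key, None)
        self.hot[key] = value
        self.hot.move_to_end(key)  # mark as most recently used
        self._rebalance()

    def get(self, key):
        for tier in (self.hot, self.warm, self.cold):
            if key in tier:
                value = tier.pop(key)
                self.put(key, value)  # every access promotes the entry to hot
                return value
        raise KeyError(key)

    def tier_of(self, key):
        for name, tier in (("hot", self.hot), ("warm", self.warm), ("cold", self.cold)):
            if key in tier:
                return name
        return None

    def _rebalance(self):
        # Demote least-recently-used entries one tier down when a tier overflows.
        while len(self.hot) > self.hot_capacity:
            k, v = self.hot.popitem(last=False)
            self.warm[k] = v
        while len(self.warm) > self.warm_capacity:
            k, v = self.warm.popitem(last=False)
            self.cold[k] = v
```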

⚖️ Comparison with other Vector DBs

Feature	M2M v2.0	FAISS	Pinecone	Chroma
Deployment	Local / Edge	Local	Cloud	Local / Server
CRUD	✅ Full (ids, metadata, filters)	❌	✅	✅
EBM / Energy	✅	❌	❌	❌
SOC Memory	✅	❌	❌	❌
WAL Durability	✅	❌	✅	✅
REST API	✅ Collections-based	❌	✅	✅
GPU Support	Vulkan (cross-platform)	CUDA (NVIDIA)	N/A	N/A
Offline	✅ 100%	✅	❌	✅

📊 Benchmarks

Benchmark Comparison

System	Avg Latency	Throughput	Speedup
Linear Scan	47.80ms	20.92 QPS	1.0x
M2M CPU	81.03ms	12.34 QPS	0.6x
M2M Transformed	8.68ms	115.20 QPS	5.5x

(10K vectors, 640D, dual-core edge device)
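The throughput and speedup columns can be recomputed from the latency column: they are consistent with queries run one at a time (QPS = 1000 / latency in ms) and with speedup measured against the linear-scan baseline. The last row comes out at 115.21 QPS rather than 115.20, presumably because the published latency is rounded.

```python
# Average latency per query (ms), copied from the benchmark table.
latency_ms = {"Linear Scan": 47.80, "M2M CPU": 81.03, "M2M Transformed": 8.68}

baseline = latency_ms["Linear Scan"]
for system, ms in latency_ms.items():
    qps = 1000.0 / ms        # one query at a time: throughput is the reciprocal of latency
    speedup = baseline / ms  # relative to the linear-scan baseline
    print(f"{system:16s} {qps:7.2f} QPS  {speedup:.1f}x")
```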

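LSH Note: HRM2 speeds up search by exploiting cluster structure, so it gains little on homogeneous (uniformly spread, unclustered) distributions; this likely explains why plain M2M CPU is slower than the linear scan above. For such data, set `enable_lsh_fallback=True` to activate Cross-Polytope LSH pre-filtering, or use `M2MDatasetTransformer` to induce clustering structure.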

```bash
python benchmarks/run_benchmark.py --dataset sklearn --n-splats 10000 --n-queries 1000 --k 10 --device all
```
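The Cross-Polytope LSH pre-filter mentioned in the LSH note hashes a vector to the nearest of the 2·d signed axis directions after a random Gaussian projection, so vectors pointing in similar directions tend to share a bucket even when the data has no cluster structure. A minimal pure-Python sketch follows; `make_cross_polytope_hash` and `build_buckets` are hypothetical and say nothing about how `enable_lsh_fallback` is implemented in M2M.

```python
import random


def make_cross_polytope_hash(dim, seed=0):
    """Return h(x): the closest signed basis vector +/-e_i to A x / |A x|.

    A is a random Gaussian matrix; the closest +/-e_i is simply the index and
    sign of the largest-magnitude coordinate of A x."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def h(x):
        y = [sum(aij * xj for aij, xj in zip(row, x)) for row in a]
        i = max(range(dim), key=lambda k: abs(y[k]))
        return (i, 1 if y[i] >= 0 else -1)  # one of 2 * dim buckets

    return h


def build_buckets(h, vectors):
    """Group vector ids by hash value so a query only scores its own bucket."""
    buckets = {}
    for vid, v in vectors.items():
        buckets.setdefault(h(v), []).append(vid)
    return buckets
```

Scoring only the query's bucket, or the union of buckets from several independently seeded hash functions to recover recall, shrinks the candidate set before exact comparison.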

🚀 Installation

Requirements

Component	Minimum
Python	3.8+
NumPy	1.21+
scikit-learn	1.2+
msgpack	1.0+
FastAPI	0.100+
uvicorn	0.23+

From pip

```bash
pip install m2m-vector-search
```

From source

```bash
git clone https://github.com/schwabauerbriantomas-gif/m2m-vector-search.git
cd m2m-vector-search
pip install -e ".[dev]"
pytest tests/
```

🛠️ Troubleshooting

See Troubleshooting Guide for common issues.

📄 License & References

Licensed under the AGPLv3.

Changelog: CHANGELOG.md
Security Policy: SECURITY.md
Methodology: METHODOLOGY_CONCLUSIONS.md
Config Guide: CONFIG_RAG.md

M2M v2.0 — Machine-to-Memory, Energy-to-Intelligence

Release history

- 2.0.3 (Mar 13, 2026)
- 2.0.0 (Mar 9, 2026), this version
- 1.5.0 (Mar 8, 2026)
- 1.1.0 (Mar 8, 2026)
- 1.0.8 (Mar 8, 2026)
- 1.0.7 (Mar 7, 2026)
- 1.0.6 (Mar 6, 2026)
- 1.0.5 (Mar 5, 2026)


Download files

Source Distribution

m2m_vector_search-2.0.0.tar.gz (111.3 kB)

Uploaded Mar 9, 2026 Source

Built Distribution

m2m_vector_search-2.0.0-py3-none-any.whl (112.2 kB)

Uploaded Mar 9, 2026 Python 3

File details

Details for the file m2m_vector_search-2.0.0.tar.gz.

File metadata

Download URL: m2m_vector_search-2.0.0.tar.gz
Upload date: Mar 9, 2026
Size: 111.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for m2m_vector_search-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`5a0465f6c61a656ff0b5397beab8ce0e7b85dd028a0bf921ac9307acc84fc98f`
MD5	`17e166b13832760b45f30b3d467af0cc`
BLAKE2b-256	`7aadd95ed80cc7595187b9401c5048cb9849e6307f804771d41df831f2004fa9`

Provenance

The following attestation bundles were made for m2m_vector_search-2.0.0.tar.gz:

Publisher: publish.yml on schwabauerbriantomas-gif/m2m-vector-search

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: m2m_vector_search-2.0.0.tar.gz
- Subject digest: 5a0465f6c61a656ff0b5397beab8ce0e7b85dd028a0bf921ac9307acc84fc98f
- Sigstore transparency entry: 1066674454
- Sigstore integration time: Mar 9, 2026
Source repository:
- Permalink: schwabauerbriantomas-gif/m2m-vector-search@c68bcf615b41611cd9a26bb3a08999f9c90ca2c0
- Branch / Tag: refs/tags/2.0.0
- Owner: https://github.com/schwabauerbriantomas-gif
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c68bcf615b41611cd9a26bb3a08999f9c90ca2c0
- Trigger Event: release

File details

Details for the file m2m_vector_search-2.0.0-py3-none-any.whl.

File metadata

Download URL: m2m_vector_search-2.0.0-py3-none-any.whl
Upload date: Mar 9, 2026
Size: 112.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for m2m_vector_search-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7cafbb89e4e7fcdca5ed2aa97277d3914f87fa73e154074dfadf59776e75fd4e`
MD5	`5cc714fe641cb6c19df33fcc28f330b7`
BLAKE2b-256	`1b84be840774a009f5f4988b204622502bc49b88390578b3e0928f3d1170b71d`

Provenance

The following attestation bundles were made for m2m_vector_search-2.0.0-py3-none-any.whl:

Publisher: publish.yml on schwabauerbriantomas-gif/m2m-vector-search

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: m2m_vector_search-2.0.0-py3-none-any.whl
- Subject digest: 7cafbb89e4e7fcdca5ed2aa97277d3914f87fa73e154074dfadf59776e75fd4e
- Sigstore transparency entry: 1066674459
- Sigstore integration time: Mar 9, 2026
Source repository:
- Permalink: schwabauerbriantomas-gif/m2m-vector-search@c68bcf615b41611cd9a26bb3a08999f9c90ca2c0
- Branch / Tag: refs/tags/2.0.0
- Owner: https://github.com/schwabauerbriantomas-gif
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c68bcf615b41611cd9a26bb3a08999f9c90ca2c0
- Trigger Event: release
