
Conversation memory as a knowledge graph: pipeline, retrieval, and generation


MemOrai

PyPI version Python 3.10+ License: MIT

Build knowledge graphs from conversations and answer questions with retrieval-augmented generation (RAG) over the resulting graph.


📦 Installation

From PyPI

pip install memorai

With FastAPI backend extras

pip install "memorai[backend]"

From source (editable install)

git clone https://github.com/memorai/memorai.git
cd memorai
pip install -e .
# or with backend extras:
pip install -e ".[backend]"

⚙️ Configuration

Copy .env.example to .env (or set environment variables directly) before running:

cp .env.example .env

Then edit .env with your real credentials.

Example:

# LLM Configuration (OpenAI-compatible endpoint)
LLM_API_KEY=your-api-key-here
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_MODEL=google/gemini-2.0-flash-001

# Embedding Model (HuggingFace model name)
EMBEDDING_MODEL=BAAI/bge-m3

# Database Configuration (Neo4j)
# Use Aura DB or a local Neo4j instance
NEO4J_URI=neo4j+s://your-id.databases.neo4j.io
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
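
Before a long indexing run it can help to confirm that the variables above are actually set in the process environment. The following is a minimal sketch and not part of MemOrai: `REQUIRED_VARS` and `missing_settings` are hypothetical names, and treating all seven variables as mandatory is an assumption of this example rather than documented behavior.

```python
import os

# Variable names taken from the .env example above. Treating every one of
# them as required is an assumption of this sketch, not documented behavior.
REQUIRED_VARS = (
    "LLM_API_KEY",
    "LLM_BASE_URL",
    "LLM_MODEL",
    "EMBEDDING_MODEL",
    "NEO4J_URI",
    "NEO4J_USER",
    "NEO4J_PASSWORD",
)


def missing_settings(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```

Passing a plain dict instead of relying on `os.environ` makes the check easy to exercise in tests without touching real credentials.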

🚀 Quick Start

MemOrai uses Neo4j as a unified storage backend: it serves both as a document store (for pipeline state) and as the knowledge graph. Once a conversation is indexed, retrieval needs no local files.

Python API

import memorai

# Optional: configure at runtime instead of relying on os.getenv
memorai.configure(
    llm_provider="groq",
    llm_api_key="<your-llm-key>",
    llm_model="llama-3.3-70b-versatile",
    embedding_provider="cloudflare",
    cloudflare_api_token="<your-cf-token>",
    cloudflare_account_id="<your-cf-account-id>",
    neo4j_uri="neo4j+s://<your-instance>.databases.neo4j.io",
    neo4j_user="neo4j",
    neo4j_password="<your-neo4j-password>",
    max_workers=4,
    rpm_limit=60,
    timeout=120,
)

# 1. Initialize a conversation scope
memorai.create_conversation(
    conversation_id="alice-bot",
    name="Alice - Support Bot",
)

# 2. Index conversation history (builds graph in Neo4j)
history = [
    {"role": "user", "content": "My name is Alice and I live in Hanoi."},
    {"role": "assistant", "content": "Nice to meet you, Alice!"},
]

memorai.index(
    history=history,
    conversation_id="alice-bot",
    session_id="alice-session-001",
    update=True,
    fast_mode=True,
)

# 3. Retrieve using Graph Vector Search
result = memorai.retrieve(
    query="Where does Alice live?",
    conversation_id="alice-bot",
)
print(result["top_turn_contents"])

Notes:

  • conversation_id isolates tenant data in Neo4j.
  • session_id lets you append incremental chat batches inside one conversation scope.
  • fast_mode=True runs low-latency indexing (skips heavy post-processing).
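
Because session_id allows incremental batches inside one conversation scope, a long history can be indexed piece by piece. The sketch below shows one way to split a history and derive session IDs. `iter_session_batches` and the `<prefix>-NNN` naming scheme are hypothetical; only `memorai.index` and its keyword arguments come from the example above, and that call is left commented out so the snippet runs without a configured Neo4j instance. The third message is made-up sample data.

```python
from typing import Dict, Iterator, List, Tuple

Message = Dict[str, str]


def iter_session_batches(
    history: List[Message],
    batch_size: int,
    session_prefix: str,
) -> Iterator[Tuple[str, List[Message]]]:
    """Yield (session_id, batch) pairs covering the history in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for number, start in enumerate(range(0, len(history), batch_size), start=1):
        # Session IDs follow the "alice-session-001" pattern used above.
        yield f"{session_prefix}-{number:03d}", history[start:start + batch_size]


history = [
    {"role": "user", "content": "My name is Alice and I live in Hanoi."},
    {"role": "assistant", "content": "Nice to meet you, Alice!"},
    {"role": "user", "content": "I work as a data engineer."},
]

for session_id, batch in iter_session_batches(history, 2, "alice-session"):
    print(session_id, len(batch))
    # With MemOrai configured, each batch could be indexed like this:
    # memorai.index(
    #     history=batch,
    #     conversation_id="alice-bot",
    #     session_id=session_id,
    #     update=True,
    #     fast_mode=True,
    # )
```

An even batch size keeps each user message together with the assistant reply that follows it, provided the history alternates strictly between the two roles.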

CLI: Full pipeline

# Run full pipeline from a JSON file
memorai pipeline \
    --input_json data/conversations.json \
    --output_dir output \
    --save_embeddings \
    --cleanup

# Answer a single question
memorai qa \
    --data_path output/graph_db/session-001 \
    --query "Where does Alice live?"

# Batch QA
memorai qa-batch \
    --questions_file questions.json \
    --data_path output/graph_db/session-001 \
    --output answers.json
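
When the CLI is driven from a script, building the argument list in Python avoids shell-quoting problems with free-text queries. This is a rough sketch that uses only the `memorai qa` flags shown above. `build_qa_command` and `run_qa` are hypothetical helpers, and the assumption that the answer is written to standard output is unverified.

```python
import shlex
import subprocess
from typing import List


def build_qa_command(data_path: str, query: str) -> List[str]:
    """Assemble the argument list for `memorai qa` using the flags shown above."""
    return ["memorai", "qa", "--data_path", data_path, "--query", query]


def run_qa(data_path: str, query: str) -> str:
    """Run the CLI and return whatever it wrote to standard output."""
    completed = subprocess.run(
        build_qa_command(data_path, query),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Print the equivalent shell command without executing it.
command = build_qa_command("output/graph_db/session-001", "Where does Alice live?")
print(shlex.join(command))
```

Passing a list to `subprocess.run` bypasses the shell entirely, so a query containing quotes or question marks reaches the CLI unchanged.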

📋 CLI Commands

Pipeline commands

Command             Description
memorai segment     Segment conversations into turns
memorai filter      Filter important messages
memorai triplets    Extract knowledge triplets
memorai entities    Generate entity descriptions
memorai summarize   Summarize segments
memorai graph       Build knowledge graph
memorai pipeline    Run full pipeline end-to-end

Post-processing commands

Command                     Description
memorai segment-chunk-map   Export segment → chunk mapping
memorai consolidate-turns   Deduplicate turn IDs
memorai rebuild-graph       Rebuild graph after consolidation
memorai embed-turns         Add turn embeddings
memorai embed-entities      Add entity embeddings
memorai embed-triplets      Add triplet embeddings
memorai embed-summaries     Add summary embeddings

QA commands

Command            Description
memorai retrieve   Retrieve relevant nodes from KG
memorai qa         Answer a single question
memorai qa-batch   Answer a batch of questions

🏗️ Architecture

Conversation History
         │
         ▼
  ┌─────────────┐
  │  Segmenter  │  Split into semantic turns
  └──────┬──────┘
         │
  ┌──────▼──────┐
  │   Filter    │  Remove low-signal messages
  └──────┬──────┘
         │
  ┌──────▼──────────┐
  │ TripletExtractor│  Extract (entity, relation, entity)
  └──────┬──────────┘
         │
  ┌──────▼──────────┐
  │ EntityDescriptor│  Describe entities in context
  └──────┬──────────┘
         │
  ┌──────▼──────┐
  │ GraphBuilder│  Build Knowledge Graph in Neo4j
  └──────┬──────┘
         │
  ┌──────▼────────┐
  │ Neo4jRetriever│  Vector Search + Cypher Traversal
  └──────┬────────┘
         │
  ┌──────▼─────────┐
  │ AnswerGenerator│  RAG over Neo4j context
  └────────────────┘
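
To make the middle of the pipeline concrete, here is a toy sketch of what (entity, relation, entity) triplets look like once assembled into a graph. It uses plain Python and an in-memory dict. MemOrai itself stores the graph in Neo4j, and none of the names below (`Triplet`, `build_graph`, `neighbors`, or the relation labels) come from the library.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Triplet(NamedTuple):
    subject: str
    relation: str
    obj: str


def build_graph(triplets: List[Triplet]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each subject entity to its outgoing (relation, object) edges."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in triplets:
        edge = (t.relation, t.obj)
        if edge not in graph[t.subject]:  # skip exact duplicate edges
            graph[t.subject].append(edge)
    return dict(graph)


def neighbors(graph, entity: str, relation: str) -> List[str]:
    """Return the objects reachable from `entity` via `relation`."""
    return [obj for rel, obj in graph.get(entity, []) if rel == relation]


# Hand-written triplets for the Quick Start sentence
# "My name is Alice and I live in Hanoi."
triplets = [
    Triplet("Alice", "LIVES_IN", "Hanoi"),
    Triplet("Alice", "LIVES_IN", "Hanoi"),  # duplicate from a later turn
    Triplet("Alice", "TALKS_TO", "Support Bot"),
]

graph = build_graph(triplets)
print(neighbors(graph, "Alice", "LIVES_IN"))  # ['Hanoi']
```

A question such as "Where does Alice live?" then reduces to looking up one entity and following one relation, which is the kind of traversal the retriever stage performs against the stored graph.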

🔧 Development

# Install with dev extras
pip install -e ".[dev]"

# Run tests
pytest

# Build distribution
make build

# Check package
twine check dist/*

# Publish to PyPI
make publish

📤 Publish Guide (DIY)

Use this section when you want to publish manually.

1. Prepare account + API tokens

  1. Create an account on both PyPI and TestPyPI.
  2. Create an API token on TestPyPI (for dry-run uploads).
  3. Create an API token on PyPI (for the real release).
  4. Keep tokens in environment variables (recommended):
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=pypi-<your-token>

2. Build package artifacts

python -m pip install --upgrade build twine
rm -rf dist build *.egg-info
python -m build
python -m twine check dist/*

Expected artifacts:

  • dist/memorai-<version>.tar.gz
  • dist/memorai-<version>-py3-none-any.whl

3. Upload to TestPyPI first

export TWINE_USERNAME=__token__
export TWINE_PASSWORD=pypi-<your-testpypi-token>
python -m twine upload --repository testpypi dist/*

Install test package:

python -m pip install \
    --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple \
    memorai

4. Publish to real PyPI

export TWINE_USERNAME=__token__
export TWINE_PASSWORD=pypi-<your-pypi-token>
python -m twine upload dist/*

5. Verify release

python -m pip install --upgrade memorai
python -c "import memorai; print(memorai.__version__)"

Common issues

  • File already exists: bump version in pyproject.toml and memorai/__init__.py, then rebuild.
  • 403 invalid token: ensure token scope matches target index (PyPI vs TestPyPI).
  • Long README render errors: run python -m twine check dist/* before upload.
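
Since the version has to be bumped in two places, a quick consistency check before rebuilding can catch a mismatch early. This is a minimal sketch under stated assumptions: that pyproject.toml declares a literal `version = "..."` line and that memorai/__init__.py assigns `__version__ = "..."`. `find_version` and `versions_match` are hypothetical helpers, and the regex returns the first matching line, so a pyproject.toml with a `version` key in an earlier table would need a real TOML parser instead.

```python
import re
from typing import Optional

# Matches lines such as: version = "0.1.3"  or  __version__ = '0.1.3'
VERSION_RE = re.compile(
    r"""^\s*(?:version|__version__)\s*=\s*["']([^"']+)["']""",
    re.MULTILINE,
)


def find_version(text: str) -> Optional[str]:
    """Return the first version string declared in the text, if any."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def versions_match(pyproject_text: str, init_text: str) -> bool:
    """True only when both files declare a version and the two are equal."""
    declared = find_version(pyproject_text)
    return declared is not None and declared == find_version(init_text)


# Inline sample contents; in practice, read the two files from disk.
pyproject = '[project]\nname = "memorai"\nversion = "0.1.3"\n'
init_py = '__version__ = "0.1.3"\n'
print(versions_match(pyproject, init_py))  # True
```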

📄 License

MIT. See LICENSE.

