
Conversation memory as a knowledge graph: pipeline, retrieval, and generation


MemOrai

Python 3.10+ | MIT License

Build knowledge graphs from conversations and answer questions with retrieval-augmented generation over the graph.


📦 Installation

From PyPI

pip install memorai

With FastAPI backend extras

pip install "memorai[backend]"

From source (editable install)

git clone https://github.com/memorai/memorai.git
cd memorai
pip install -e .
# or with backend extras:
pip install -e ".[backend]"

โš™๏ธ Configuration

Copy .env.example to .env (or set environment variables directly) before running:

cp .env.example .env

Then edit .env with your real credentials.

Example:

# LLM Configuration (OpenAI-compatible endpoint)
LLM_API_KEY=your-api-key-here
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_MODEL=google/gemini-2.0-flash-001

# Embedding Model (HuggingFace model name)
EMBEDDING_MODEL=BAAI/bge-m3

# Database Configuration (Neo4j)
# Use Aura DB or a local Neo4j instance
NEO4J_URI=neo4j+s://your-id.databases.neo4j.io
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
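
Before running anything, it can help to verify that the variables actually resolve. A minimal preflight sketch, assuming python-dotenv is installed (if you export the variables directly, drop the load_dotenv call):

import os

from dotenv import load_dotenv

load_dotenv()  # read .env from the working directory

required = [
    "LLM_API_KEY", "LLM_BASE_URL", "LLM_MODEL",
    "EMBEDDING_MODEL",
    "NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD",
]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing configuration: {', '.join(missing)}")
print("All required variables are set.")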

🚀 Quick Start

MemOrai uses Neo4j as a unified storage backend: it acts both as a document store (for intermediate pipeline state) and as the knowledge graph itself, so no local files are needed for retrieval once a conversation is indexed.

Python API

import memorai

# Optional: configure at runtime instead of relying on os.getenv
memorai.configure(
    llm_provider="groq",
    llm_api_key="<your-llm-key>",
    llm_model="llama-3.3-70b-versatile",
    embedding_provider="cloudflare",
    cloudflare_api_token="<your-cf-token>",
    cloudflare_account_id="<your-cf-account-id>",
    neo4j_uri="neo4j+s://<your-instance>.databases.neo4j.io",
    neo4j_user="neo4j",
    neo4j_password="<your-neo4j-password>",
    max_workers=4,
    rpm_limit=60,
    timeout=120,
)

# 1. Initialize a conversation scope
memorai.create_conversation(
    conversation_id="alice-bot",
    name="Alice - Support Bot",
)

# 2. Index conversation history (builds graph in Neo4j)
history = [
    {"role": "user", "content": "My name is Alice and I live in Hanoi."},
    {"role": "assistant", "content": "Nice to meet you, Alice!"},
]

memorai.index(
    history=history,
    conversation_id="alice-bot",
    session_id="alice-session-001",
    update=True,
    fast_mode=True,
)

# 3. Retrieve using Graph Vector Search
result = memorai.retrieve(
    query="Where does Alice live?",
    conversation_id="alice-bot",
)
print(result["top_turn_contents"])

Notes:

  • conversation_id isolates tenant data in Neo4j.
  • session_id lets you append incremental chat batches inside one conversation scope (see the sketch after this list).
  • fast_mode=True runs low-latency indexing (skips heavy post-processing).
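
For example, a second batch recorded later can be appended to the same scope under a new session ID, reusing only the calls shown above:

# Append a follow-up batch to the existing "alice-bot" scope
followup = [
    {"role": "user", "content": "I just moved from Hanoi to Da Nang."},
    {"role": "assistant", "content": "Thanks for the update, Alice!"},
]

memorai.index(
    history=followup,
    conversation_id="alice-bot",     # same conversation scope
    session_id="alice-session-002",  # new incremental batch
    update=True,
    fast_mode=True,
)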

CLI: Full pipeline

# Run full pipeline from a JSON file
memorai pipeline \
    --input_json data/conversations.json \
    --output_dir output \
    --save_embeddings \
    --cleanup

# Answer a single question
memorai qa \
    --data_path output/graph_db/session-001 \
    --query "Where does Alice live?"

# Batch QA
memorai qa-batch \
    --questions_file questions.json \
    --data_path output/graph_db/session-001 \
    --output answers.json
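
Neither input file's schema is documented here. Plausible shapes, assuming the pipeline input mirrors the Python API's history format and that questions.json is a flat list; both sketches are guesses, not a documented contract.

data/conversations.json (illustrative):

[
    {"role": "user", "content": "My name is Alice and I live in Hanoi."},
    {"role": "assistant", "content": "Nice to meet you, Alice!"}
]

questions.json (illustrative):

[
    {"question": "Where does Alice live?"},
    {"question": "What is the user's name?"}
]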

📋 CLI Commands

Pipeline commands

Command             Description
memorai segment     Segment conversations into turns
memorai filter      Filter important messages
memorai triplets    Extract knowledge triplets
memorai entities    Generate entity descriptions
memorai summarize   Summarize segments
memorai graph       Build knowledge graph
memorai pipeline    Run full pipeline end-to-end
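
The staged commands above can presumably be chained to reproduce memorai pipeline step by step. The flags in this sketch are assumptions carried over from the pipeline example, not documented options:

# Hypothetical stage-by-stage run; flag names are assumptions
memorai segment   --input_json data/conversations.json --output_dir output
memorai filter    --output_dir output
memorai triplets  --output_dir output
memorai entities  --output_dir output
memorai summarize --output_dir output
memorai graph     --output_dir output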

Post-processing commands

Command                     Description
memorai segment-chunk-map   Export segment → chunk mapping
memorai consolidate-turns   Deduplicate turn IDs
memorai rebuild-graph       Rebuild graph after consolidation
memorai embed-turns         Add turn embeddings
memorai embed-entities      Add entity embeddings
memorai embed-triplets      Add triplet embeddings
memorai embed-summaries     Add summary embeddings
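
Read top to bottom, the table suggests an order: consolidate first, rebuild the graph, then embed. A hedged sketch, borrowing --data_path from the QA commands as an assumption:

# Hypothetical post-processing sequence; --data_path is an assumption
memorai consolidate-turns --data_path output/graph_db/session-001
memorai rebuild-graph     --data_path output/graph_db/session-001
memorai embed-turns       --data_path output/graph_db/session-001
memorai embed-entities    --data_path output/graph_db/session-001
memorai embed-triplets    --data_path output/graph_db/session-001
memorai embed-summaries   --data_path output/graph_db/session-001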

QA commands

Command            Description
memorai retrieve   Retrieve relevant nodes from the KG
memorai qa         Answer a single question
memorai qa-batch   Answer a batch of questions

๐Ÿ—๏ธ Architecture

Conversation History
        │
        ▼
  ┌─────────────┐
  │  Segmenter  │  Split into semantic turns
  └──────┬──────┘
         │
  ┌──────▼──────┐
  │   Filter    │  Remove low-signal messages
  └──────┬──────┘
         │
  ┌──────▼──────────┐
  │ TripletExtractor│  Extract (entity, relation, entity)
  └──────┬──────────┘
         │
  ┌──────▼──────────┐
  │ EntityDescriptor│  Describe entities in context
  └──────┬──────────┘
         │
  ┌──────▼──────┐
  │ GraphBuilder│  Build Knowledge Graph in Neo4j
  └──────┬──────┘
         │
  ┌──────▼────────┐
  │ Neo4jRetriever│  Vector Search + Cypher Traversal
  └──────┬────────┘
         │
  ┌──────▼─────────┐
  │ AnswerGenerator│  RAG over Neo4j context
  └────────────────┘
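
To make the Neo4jRetriever step concrete, the sketch below shows the general shape of a vector search followed by a one-hop traversal, using the official neo4j Python driver. The index name turn_embedding_index, the :MENTIONS relationship, and the node properties are illustrative assumptions about the schema, not MemOrai's actual layout; db.index.vector.queryNodes is a standard Neo4j 5.x procedure.

import os

from neo4j import GraphDatabase  # official Neo4j Python driver

# Vector search over turn embeddings, then hop to mentioned entities.
RETRIEVE = """
CALL db.index.vector.queryNodes('turn_embedding_index', $k, $query_embedding)
YIELD node AS turn, score
OPTIONAL MATCH (turn)-[:MENTIONS]->(entity)
RETURN turn.content AS content, score, collect(entity.name) AS entities
ORDER BY score DESC
"""

driver = GraphDatabase.driver(
    os.environ["NEO4J_URI"],
    auth=(os.environ["NEO4J_USER"], os.environ["NEO4J_PASSWORD"]),
)

# The query vector must come from the same embedding model used at
# indexing time (e.g. BAAI/bge-m3); producing it is out of scope here.
query_embedding = [...]  # vector for "Where does Alice live?"

with driver.session() as session:
    for record in session.run(RETRIEVE, k=5, query_embedding=query_embedding):
        print(record["score"], record["content"], record["entities"])

driver.close()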

🔧 Development

# Install with dev extras
pip install -e ".[dev]"

# Run tests
pytest

# Build distribution
make build

# Check package
twine check dist/*

# Publish to PyPI
make publish

📤 Publish Guide (DIY)

Use this section when you want to publish manually.

1. Prepare account + API tokens

  1. Create an account on PyPI and TestPyPI.
  2. Create API token on TestPyPI (for dry-run upload).
  3. Create API token on PyPI (real release).
  4. Keep tokens in env vars (recommended):
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=pypi-<your-token>

2. Build package artifacts

python -m pip install --upgrade build twine
rm -rf dist build *.egg-info
python -m build
python -m twine check dist/*

Expected artifacts:

  • dist/memorai-<version>.tar.gz
  • dist/memorai-<version>-py3-none-any.whl

3. Upload to TestPyPI first

export TWINE_USERNAME=__token__
export TWINE_PASSWORD=pypi-<your-testpypi-token>
python -m twine upload --repository testpypi dist/*

Install the test package:

python -m pip install \
    --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple \
    memorai

4. Publish to real PyPI

export TWINE_USERNAME=__token__
export TWINE_PASSWORD=pypi-<your-pypi-token>
python -m twine upload dist/*

5. Verify release

python -m pip install --upgrade memorai
python -c "import memorai; print(memorai.__version__)"

Common issues

  • File already exists: bump the version in pyproject.toml and memorai/__init__.py, then rebuild (see the snippet after this list).
  • 403 invalid token: ensure token scope matches target index (PyPI vs TestPyPI).
  • Long README render errors: run python -m twine check dist/* before upload.
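
The two version locations typically look like this (the version number is illustrative):

# pyproject.toml
[project]
name = "memorai"
version = "0.1.3"  # bump here

# memorai/__init__.py
__version__ = "0.1.3"  # keep in sync with pyproject.toml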

📄 License

MIT; see LICENSE.
