Institutional knowledge extraction for engineering teams

Memex

Memex extracts and indexes the institutional knowledge buried in your GitHub pull requests, review threads, and ADRs — automatically, without changing how your team works.

Engineers don't write anything new. Memex hooks into the artifacts that already exist and makes them searchable.

$ memex query "why did we move off MongoDB"

Results for: why did we move off MongoDB
──────────────────────────────────────────────────────────────────────

  1. Migrate billing store to PostgreSQL                     [0.91]
     Unbounded schema flexibility was causing silent data corruption
     in the billing pipeline. MongoDB's lack of enforced schema...
     knowledge/decisions/2024-11-14-migrate-billing-store-to-postgresql.md

  2. ...

How it works

  1. GitHub Action — triggers on every merged PR, calls Claude to extract decision context, and commits a structured .md file to your repo
  2. Local CLI — memex index embeds your knowledge files locally; memex query runs semantic search over them
  3. ADR parser — on first run, scans your repo for existing ADR files and indexes them automatically
  4. Low-confidence nudge — when a PR looks like it contains a decision but lacks rationale, Memex posts a single comment asking for one sentence of context
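
As an illustration of the ADR discovery step, here is a minimal sketch of how existing ADR files could be located in a repo. The directory names in ADR_DIRS are common ADR conventions chosen for this example; they are an assumption, not Memex's documented search paths, and find_adr_files is a hypothetical helper.

```python
from pathlib import Path

# Assumed locations: common ADR conventions, not Memex's documented search paths.
ADR_DIRS = ("docs/adr", "doc/adr", "docs/decisions", "adr")


def find_adr_files(repo_root: Path) -> list[Path]:
    """Return markdown files found in the assumed ADR directories, sorted per directory."""
    found: list[Path] = []
    for rel in ADR_DIRS:
        directory = repo_root / rel
        if directory.is_dir():
            found.extend(sorted(directory.glob("*.md")))
    return found


if __name__ == "__main__":
    for path in find_adr_files(Path(".")):
        print(path)
```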

Installation

pip install memex-oss

Requires Python 3.12+.


Quickstart

1. Add your API key

memex configure

This prompts for your Anthropic API key, validates it, and saves it to ~/.config/memex/config.toml.

2. Bootstrap from your existing codebase

memex init

Scans your repo for architectural decisions already embedded in config files, package manifests, and infrastructure code. Writes initial knowledge records to knowledge/decisions/.

Use --dry-run to preview what would be extracted without writing any files.

3. Pull in decisions from your git history

memex update

Walks your git history and processes merged PRs it hasn't seen yet. Use --since 2024-01-01 to process only PRs merged after that date, or --limit 50 to process only the 50 most recent PRs.
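
The selection logic described here can be pictured roughly as follows. This is a sketch under assumptions: the PR dictionaries, the seen set, and the select_prs helper are hypothetical, and treating --since as strictly "after" the date is a guess based on the CLI reference wording.

```python
from datetime import date


def select_prs(prs, seen, since=None, limit=None):
    """Pick unseen merged PRs, newest first.

    prs:   list of dicts with "number" (int) and "merged_on" (datetime.date)
    seen:  set of PR numbers that already have a knowledge record
    since: keep only PRs merged strictly after this date (assumed semantics)
    limit: keep at most this many of the most recent PRs
    """
    fresh = [
        pr for pr in prs
        if pr["number"] not in seen and (since is None or pr["merged_on"] > since)
    ]
    fresh.sort(key=lambda pr: pr["merged_on"], reverse=True)
    return fresh if limit is None else fresh[:limit]


if __name__ == "__main__":
    history = [
        {"number": 1, "merged_on": date(2023, 12, 30)},
        {"number": 2, "merged_on": date(2024, 2, 1)},
        {"number": 3, "merged_on": date(2024, 3, 1)},
    ]
    for pr in select_prs(history, seen={3}, since=date(2024, 1, 1)):
        print(pr["number"], pr["merged_on"])
```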

4. Index and query

memex index          # embed all knowledge files (incremental — skips unchanged files)
memex query "why did we switch from SQS to Redis"
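
One way to picture the incremental behaviour of memex index is a content-hash cache: a file is re-embedded only when its hash differs from the one recorded last time. The sketch below is illustrative only. The cache layout and the files_to_embed helper are assumptions, not Memex's actual implementation.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to detect changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_embed(knowledge_dir: Path, cache: dict[str, str], force: bool = False) -> list[Path]:
    """Return files that are new or changed since the cache was last updated.

    cache maps a relative path to the digest seen on the previous run
    (hypothetical layout). It is updated in place. force=True returns
    every file regardless of the cache.
    """
    changed: list[Path] = []
    for path in sorted(knowledge_dir.rglob("*.md")):
        key = path.relative_to(knowledge_dir).as_posix()
        digest = file_digest(path)
        if force or cache.get(key) != digest:
            changed.append(path)
            cache[key] = digest
    return changed
```

Hashing contents rather than comparing modification times keeps the cache stable across fresh clones and checkouts, where timestamps change but file contents do not.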

GitHub Action setup

Add this to .github/workflows/memex.yml in any repo you want to capture:

name: Memex knowledge extraction

on:
  pull_request:
    types: [closed]

permissions:
  contents: write
  pull-requests: write

jobs:
  extract:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - run: pip install memex-oss

      - name: Extract knowledge
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          PR_TITLE: ${{ github.event.pull_request.title }}
          PR_BODY: ${{ github.event.pull_request.body }}
          PR_URL: ${{ github.event.pull_request.html_url }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          PR_AUTHOR: ${{ github.event.pull_request.user.login }}
          REPO: ${{ github.repository }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: python -m memex.action

      - name: Commit knowledge record
        run: |
          git config user.name "memex-bot"
          git config user.email "memex-bot@users.noreply.github.com"
          git add knowledge/
          git diff --cached --quiet || git commit -m "chore: add knowledge record [memex]"
          git push origin HEAD || true

Add ANTHROPIC_API_KEY as a repository secret. That's the only secret required.
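
The workflow hands PR metadata to the extraction step through environment variables. As a rough sketch of how an entry point could collect them, the example below reads the variable names from the workflow above; the PRContext class and pr_context_from_env function are hypothetical and are not taken from memex.action.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PRContext:
    """PR metadata as passed in by the workflow (hypothetical container)."""
    title: str
    body: str
    url: str
    number: int
    author: str
    repo: str


def pr_context_from_env(env: Mapping[str, str] = os.environ) -> PRContext:
    """Build a PRContext from the env vars set in the workflow step."""
    return PRContext(
        title=env["PR_TITLE"],
        # A PR without a description arrives as an empty string.
        body=env.get("PR_BODY", ""),
        url=env["PR_URL"],
        number=int(env["PR_NUMBER"]),
        author=env["PR_AUTHOR"],
        repo=env["REPO"],
    )
```

Passing the title and body as environment variables, rather than interpolating them into the run script, keeps untrusted PR text out of the shell command line.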


Knowledge record format

Every extracted decision is a plain markdown file committed to knowledge/decisions/:

---
title: "Switch event queue from SQS to Redis Streams"
date: 2024-11-14
author: "srajan"
source: "https://github.com/acme/api-core/pull/2847"
pr: 2847
repo: "acme/api-core"
confidence: 0.87
tags: []
---

# Switch event queue from SQS to Redis Streams

## Context

We've been hitting SQS's 256KB message size limit consistently as event
payloads grew with per-tenant metadata.

## Decision

Switched event queue from SQS to Redis Streams.

## Alternatives considered

- SNS fanout — filtering model doesn't support per-tenant routing

## Constraints

- SQS 256KB message size limit
- Redis already running for caching (ops overhead minimal)

## Revisit signals

- Revisit when moving to multi-region setup (Redis becomes SPOF)

---

_Extracted by Memex from [PR #2847](https://github.com/acme/api-core/pull/2847) · 2024-11-14_

Files are human-readable, git-diffable, and owned by your repo. There is no external database.
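
Because records are plain files, you can read them with a few lines of Python. The sketch below parses the front matter shown above with a deliberately small parser that handles only flat key: value lines and leaves every value as a string; a real YAML library would be the more robust choice. It also shows how a file name in the date-plus-slug style of the examples could be derived. Both helpers are hypothetical, and the slug rule is inferred from the example paths, not documented behaviour.

```python
import re


def parse_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a record into (metadata, body). Values are kept as strings."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the markdown body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return {}, text  # no closing delimiter: treat the whole file as body


def record_filename(title: str, date: str) -> str:
    """Build a '<date>-<slug>.md' name (inferred convention, see lead-in)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{date}-{slug}.md"
```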


What gets extracted (and what doesn't)

Memex is deliberately conservative. The expected discard rate is 70–80% of PRs.

Extracted — PRs with real decision rationale:

  • Architecture changes with alternatives discussed
  • Technology migrations with reasoning
  • Approach choices under constraints

Skipped silently — low-signal PRs caught by heuristics before any LLM call:

  • Dependency bumps (bump axios from 1.6.0 to 1.7.2)
  • Style/lint fixes, formatting changes
  • WIP/draft PRs, reverts, conventional chore: commits
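
A pre-filter of this kind can be as simple as a handful of title patterns checked before any LLM call. The patterns below are illustrative guesses covering the categories listed above; they are not Memex's actual heuristics, and is_low_signal is a hypothetical name.

```python
import re

# Illustrative patterns only; the real heuristics may differ.
SKIP_PATTERNS = [
    re.compile(r"^bump \S+ from \S+ to \S+", re.IGNORECASE),       # dependency bumps
    re.compile(r"^(chore|style)(\([^)]*\))?!?:", re.IGNORECASE),   # conventional chore:/style:
    re.compile(r"^revert\b", re.IGNORECASE),                       # reverts
    re.compile(r"^\[?wip\b", re.IGNORECASE),                       # WIP titles
]


def is_low_signal(title: str, is_draft: bool = False) -> bool:
    """Return True if the PR should be skipped without calling the model."""
    if is_draft:
        return True
    stripped = title.strip()
    return any(pattern.search(stripped) for pattern in SKIP_PATTERNS)
```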

Nudge comment — borderline PRs (confidence 0.30–0.40): Memex posts a single comment asking for one sentence of rationale. Posted at most once per PR.

To skip a specific PR from extraction, add the memex:skip label before merging.
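
Put together, the handling of a PR after scoring might look like the sketch below. Only the 0.30–0.40 nudge band, the once-per-PR limit, and the memex:skip label come from the text above. Discarding below 0.30, extracting above 0.40, the inclusive band edges, and skipping a PR that was already nudged are all assumptions, as are the names Action and route.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"
    NUDGE = "nudge"
    EXTRACT = "extract"


NUDGE_LOW, NUDGE_HIGH = 0.30, 0.40  # band from the text; inclusivity is assumed


def route(confidence: float, labels=(), already_nudged: bool = False) -> Action:
    """Decide what to do with a scored PR (assumed policy, see lead-in)."""
    if "memex:skip" in labels:
        return Action.SKIP
    if confidence < NUDGE_LOW:
        return Action.SKIP
    if confidence <= NUDGE_HIGH:
        # At most one nudge comment per PR.
        return Action.SKIP if already_nudged else Action.NUDGE
    return Action.EXTRACT
```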


CLI reference

memex configure            Prompt for API key and save to ~/.config/memex/config.toml
memex init [PATH]          Bootstrap knowledge from existing codebase (--dry-run to preview)
memex update               Process merged PRs from git history not yet indexed
  --limit N                Process at most N recent PRs
  --since DATE             Only process PRs merged after DATE (YYYY-MM-DD)
  --repo OWNER/REPO        Target a specific repo (default: current repo)
memex index                Embed knowledge files and write vectors to .memex/index.json
  --force                  Re-embed all files, ignoring the incremental cache
memex query QUESTION       Semantic search over indexed knowledge
  --top N                  Return top N results (default: 3)

How extraction works

Memex uses Claude (claude-sonnet-4-6) with Instructor for structured extraction. Each response is validated against the record schema, and extraction is retried automatically when validation fails.
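
To make the validate-and-retry idea concrete, here is a standard-library sketch of the pattern. It does not use Instructor or the Anthropic SDK: the model call is a plain callable you supply, and the Decision fields are a hypothetical subset of the record format, so treat it as an illustration of the technique, not Memex's code.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Decision:
    """Hypothetical subset of the record schema."""
    title: str
    context: str
    decision: str
    confidence: float
    alternatives: list[str] = field(default_factory=list)


def validate(raw: str) -> Decision:
    """Parse JSON and check it against the schema; raise on any mismatch."""
    data = json.loads(raw)          # json.JSONDecodeError is a ValueError
    record = Decision(**data)       # TypeError on missing or unknown fields
    if not isinstance(record.confidence, (int, float)) or not 0.0 <= record.confidence <= 1.0:
        raise ValueError("confidence must be a number between 0.0 and 1.0")
    return record


def extract_with_retries(call_model: Callable[[str], str], prompt: str, max_retries: int = 2) -> Decision:
    """Call the model, feeding the validation error back on each retry."""
    last_error: Exception | None = None
    for _ in range(max_retries + 1):
        message = prompt if last_error is None else (
            f"{prompt}\n\nPrevious output was invalid: {last_error}"
        )
        try:
            return validate(call_model(message))
        except (ValueError, TypeError) as exc:
            last_error = exc
    raise RuntimeError(f"no valid output after {max_retries + 1} attempts: {last_error}")
```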

Each extracted record includes a confidence score (0.0–1.0) reflecting how much rationale is actually present in the source PR — not a hallucinated guess. Memex never invents alternatives or constraints that aren't in the text.

Local semantic search uses fastembed (BAAI/bge-small-en-v1.5) — no second API key, no external service, no data leaves your machine during queries.
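
Once the records are embedded, ranking comes down to comparing vectors. The sketch below shows cosine-similarity ranking in plain Python over an index that maps file paths to vectors. That JSON layout is an assumption about .memex/index.json, and the embedding step itself (fastembed) is left out, so the query vector is passed in directly.

```python
import json
import math
from pathlib import Path


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def load_index(path: Path) -> dict[str, list[float]]:
    """Load {file path: vector} from JSON (assumed layout, see lead-in)."""
    return json.loads(path.read_text(encoding="utf-8"))


def top_matches(query_vec: list[float], index: dict[str, list[float]], top: int = 3):
    """Return the `top` best (score, path) pairs, highest score first."""
    scored = [(cosine(query_vec, vec), path) for path, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:top]
```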


Requirements

  • Python 3.12+
  • ANTHROPIC_API_KEY — for extraction (GitHub Action) and memex init / memex update
  • gh CLI — preinstalled on GitHub-hosted Actions runners; must be installed locally for memex update
  • Git — for committing knowledge records from the Action

License

MIT
