镜我 · Your personal Pensieve. Turn scattered Chinese data into ready-to-use deliverables.

Memexa · 镜我

Your personal Pensieve. Take the data scattered across your six silos, reorganize it around the task you have right now, and walk away with a usable document.

CI · CodeQL · License: Apache 2.0 · Python 3.10+ · Code style: black · PII scan

What is this

memexa ingests six categories of everyday Chinese-language data (WeChat, QQ, email, browser history, AI conversations, voice memos) and extracts entities, relationships, and temporal evidence from them with a two-LLM pipeline. The results are stored in a PostgreSQL + pgvector memory graph. From there, 14 query subcommands pull out whatever you need right now: who was looking for me? What's the whole story behind this? What's on my plate? Where is project X across all sources?

Inspired by the Pensieve in Harry Potter — pour the memories scattered in your head into a basin, rearrange them, observe them, extract what matters for the moment.

   WeChat ─┐                                              ┌─► "Who is X?"           (arc + quick)
   QQ     ─┤                                              ├─► "Group activity last week?" (topic + trends)
   Email  ─┼──► two-LLM extract ──► PG + pgvector ──┤
   Browser─┤    (gate+extract)      memory graph        ├─► "Project X status?"    (project + timeline)
   AI chat─┤                                              ├─► "What does Y want?"   (person)
   Audio  ─┘                                              └─► "My pending actions?" (pending)
        ↑                                                       ↑
   your raw data                                          14 query subcommands
   (local, fully self-hosted)                             (cross-source composable)

v0.1 scope: complete ingestion, extraction, query, and dashboard, plus 5 walkthroughs and 2 case studies. The "auto-generate deliverables" layer (lab reports, action cards, weekly reports, meeting briefs) is planned for v0.2 (see ROADMAP.md). Until then, composing the 14 query commands by hand gives the same effect.

🚀 First time here? Jump to Example walkthroughs ↓ to see 5 real scenarios end-to-end.

🤖 AI-agent compatible by design. Most real users invoke memexa through an AI agent — Claude Code, Cursor, Cline, or one they wrote themselves — rather than typing subcommands by hand. The 14 query subcommands are a small protocol; the protocol document for agents is docs/for_agents.md (hard rules, decision table, composition patterns, common pitfalls). If you're shipping an agent that needs a Chinese-data memory layer, start there.
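
A minimal sketch of how an agent might shell out to memexa, assuming only the `memexa <subcmd> "<query>"` form shown in this README. `KNOWN_SUBCOMMANDS` covers just the subcommands named on this page (the full set of 14 is in docs/usage_guide.md). The function names, the timeout, and the idea that results arrive as text on stdout are illustrative assumptions, not part of the documented protocol in docs/for_agents.md.

```python
import subprocess

# Subcommands named in this README; the others are documented in docs/usage_guide.md.
KNOWN_SUBCOMMANDS = {
    "quick", "topic", "arc", "timeline",
    "person", "project", "pending", "reflect", "trends",
}


def build_command(subcmd: str, query: str) -> list[str]:
    """Build argv for `memexa <subcmd> "<query>"` without going through a shell."""
    if subcmd not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcmd!r}")
    return ["memexa", subcmd, query]


def run_query(subcmd: str, query: str, timeout: float = 120.0) -> str:
    """Run one query and return stdout (assumed here to carry the result)."""
    completed = subprocess.run(
        build_command(subcmd, query),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Passing an argv list instead of a shell string keeps queries that contain quotes or spaces intact, which matters when the query text comes from a model.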

Two audiences (design intent)

| Audience | Trigger | Available now (v0.1) | Roadmap (v0.2+) |
| --- | --- | --- | --- |
| Researchers / students | Experiment done → write report; meet advisor → prep | 14 queries by hand + LaTeX template | memexa lab-report X / memexa brief <person> |
| Knowledge workers / ops folks | Missed a deadline → recover; heading out → don't forget | memexa pending + memexa quick | memexa action-card X / memexa dashboard |

Both share the same backbone (ingestion + two-LLM extract + graph + query); the deliverable templates on top differ.

Six data sources

| Source | Builder | Driver |
| --- | --- | --- |
| WeChat | memexa/ingestion/v5_wechat_batch_builder.py | memexa/drivers/backfill_v5_wechat_driver.py |
| QQ | memexa/extraction/qq/qq_history_to_batches.py | memexa/drivers/backfill_v5_qq_driver.py |
| Email | memexa/ingestion/v5_email_batch_builder.py | memexa/drivers/backfill_v5_email_driver.py |
| Browser | memexa/ingestion/v5_browser_batch_builder.py | memexa/drivers/backfill_v5_browser_driver.py |
| AI chat (Claude Code) | memexa/extraction/claude_code_to_v5_converter.py | memexa/drivers/backfill_v5_cc_driver.py |
| Audio (microphone) | memexa/ingestion/v5_audio_batch_builder.py | memexa/drivers/backfill_v5_audio_driver.py |

Quickstart

# 1. Install
pip install -e .

# 2. Initialize config (creates ~/.memexa/ with 3 example files)
memexa init                          # → ~/.memexa/{aliases,identity}.yaml + .env

# 3. Start the backend
docker compose -f docker-compose.example.yml up -d

# 4. Run the demo (use --dry-run if backend isn't up yet)
python -m examples.demo_dataset.ingest --dry-run

# 5. Self-check + first query
memexa doctor                        # verify backend + LLM provider
memexa quick "<your keyword>"

Full walkthrough: docs/quickstart.md
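
If step 5 fails, one quick thing to rule out is a missing config file. The sketch below checks for the three files that the `memexa init` comment in step 2 lists. The file names come from that comment; the helper name `missing_config` is hypothetical, and this is not a substitute for `memexa doctor`, which checks the backend and LLM provider.

```python
from pathlib import Path
from typing import Optional

# File names taken from the `memexa init` comment in the quickstart.
EXPECTED_FILES = ("aliases.yaml", "identity.yaml", ".env")


def missing_config(config_dir: Optional[Path] = None) -> list[str]:
    """Return the expected config files that are absent from config_dir."""
    base = config_dir if config_dir is not None else Path.home() / ".memexa"
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    gaps = missing_config()
    if gaps:
        print("missing: " + ", ".join(gaps) + " (try running `memexa init`)")
    else:
        print("config files present")
```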

Two-LLM gate-extract architecture

┌──────────────────────────────────────────────────────────────────────┐
│  6 categories of Chinese-language data                               │
│  WeChat │ QQ │ Email │ Browser │ AI chat │ Voice memo                │
└──────────────────────────────────────────────────────────────────────┘
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Per-source batch builder  →  JSON envelopes                         │
└──────────────────────────────────────────────────────────────────────┘
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Stage A: gatekeeper LLM  (filter HIGH/MEDIUM/LOW)                   │
│  Stage B: extractor LLM   (V2 envelope JSON)                         │
│  Stage C: BGE-M3 quorum + arbiter                                    │
│  Stage D: POST → memory_full_v5 bank                                 │
└──────────────────────────────────────────────────────────────────────┘
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│  PostgreSQL + pgvector + BGE-M3 embeddings + temporal links          │
└──────────────────────────────────────────────────────────────────────┘
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│  14 query subcommands + 5-phase state inference + deliverable layer  │
└──────────────────────────────────────────────────────────────────────┘

Full architecture: docs/architecture.md
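
The gate-then-extract idea in Stages A and B can be sketched as follows. This is an illustration under stated assumptions, not memexa's implementation: the `Envelope` fields are hypothetical, the toy gate and extractor stand in for the two LLM calls, and the choice to forward HIGH and MEDIUM while dropping LOW is a guess at the filter policy. Stages C and D are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

LABELS = ("HIGH", "MEDIUM", "LOW")


@dataclass
class Envelope:
    """Hypothetical stand-in for one JSON envelope from a batch builder."""
    source: str
    text: str


def gate_then_extract(
    envelopes: Iterable[Envelope],
    gate: Callable[[Envelope], str],
    extract: Callable[[Envelope], dict],
    keep: tuple = ("HIGH", "MEDIUM"),  # assumed policy: LOW is dropped
) -> list[dict]:
    """Run the cheap gate first; call the extractor only on kept envelopes."""
    results = []
    for env in envelopes:
        label = gate(env)
        if label not in LABELS:
            raise ValueError(f"gate returned unexpected label: {label!r}")
        if label in keep:
            results.append(extract(env))
    return results


# Toy stand-ins so the sketch runs without any LLM.
def toy_gate(env: Envelope) -> str:
    if "deadline" in env.text:
        return "HIGH"
    return "LOW" if len(env.text) < 10 else "MEDIUM"


def toy_extract(env: Envelope) -> dict:
    return {"source": env.source, "summary": env.text[:40]}


batch = [
    Envelope("wechat", "ok"),
    Envelope("email", "Report deadline moved to Friday"),
    Envelope("qq", "Lunch at the usual place tomorrow?"),
]
records = gate_then_extract(batch, toy_gate, toy_extract)
print([r["source"] for r in records])  # the short "ok" message is filtered out
```

The point of the split is cost: the extractor, which produces structured output, only sees messages the gate considers worth keeping.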

Query CLI

memexa <subcmd> "<query>" [options]

14 subcommands in three tiers (basic / advanced / composite). The 8 most common:

| Subcommand | Use case |
| --- | --- |
| quick | "Who is X": point query |
| topic | "The whole story of X": theme expansion (DO NOT use on names! See hard rules) |
| arc | "How did I meet X": relationship arc (preferred for names) |
| timeline | "What happened during this period?": temporal |
| person | "Status of professor Y": person profile |
| project | "Cross-source pulse of project Z" |
| pending | "What's on my plate": active commitments |
| reflect | LLM-synthesized answer |

Full usage: docs/usage_guide.md
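
The "do not use topic on names" rule from the table can be enforced before a command is ever issued. A minimal sketch, assuming the caller already has a set of known person names; `known_names` and `safe_subcommand` are hypothetical, and the authoritative hard rules live in docs/for_agents.md.

```python
def safe_subcommand(requested: str, query: str, known_names: set[str]) -> str:
    """Redirect `topic` to `arc` when the query is a known person's name."""
    if requested == "topic" and query.strip() in known_names:
        return "arc"
    return requested


names = {"Alice", "Bob", "Carol"}
print(safe_subcommand("topic", "Alice", names))         # redirected to arc
print(safe_subcommand("topic", "thesis draft", names))  # stays topic
```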

📖 Example walkthroughs — 5 reproducible scenarios

Install → make demo-ingest → follow a walkthrough → see memexa in action. Everything runs on a synthetic dataset (Alice / Bob / Carol / advisor@example.com), so anyone can reproduce each walkthrough exactly without any real personal data.

┌────────────────────────────────────────────────────────────────────┐
│                  What question are you asking?                     │
└────────────────────────────────────────────────────────────────────┘
        │                    │                    │
        ▼                    ▼                    ▼
   "Who is X?"        "Group activity last week?"  "What's on my plate?"
   01_who_is_alice    02_weekly_team               05_my_pending
   arc + quick         topic + trends               pending + quick
        │                    │                    │
        ▼                    ▼                    ▼
   "What does Y want?" "Project X status?"
   04_advisor_said     03_project_status
   person              project + timeline

5 walkthroughs (5–10 min each):

| # | Walkthrough | Scenario | Command combo |
| --- | --- | --- | --- |
| 01 | Who is Alice? | "How do I know X?" | arc + quick |
| 02 | Weekly team summary | "What did the group do last week?" | topic + trends |
| 03 | Project status check | "Where is project X?" | project + timeline |
| 04 | What did advisor say? | "What does Y (advisor/boss) want?" | person |
| 05 | My pending actions | "What's on my plate?" | pending + quick |
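
The command combos above can be written down as data, which is handy when scripting a walkthrough. A sketch only: the walkthrough ids come from the diagram earlier in this section, while `plan` is a hypothetical helper, and passing the same query string to every subcommand in a combo is an assumption. The walkthrough pages give the exact invocations.

```python
# Command combos copied from the walkthrough table.
WALKTHROUGH_COMBOS = {
    "01_who_is_alice": ("arc", "quick"),
    "02_weekly_team": ("topic", "trends"),
    "03_project_status": ("project", "timeline"),
    "04_advisor_said": ("person",),
    "05_my_pending": ("pending", "quick"),
}


def plan(walkthrough: str, query: str) -> list[list[str]]:
    """Expand a walkthrough id into the argv lists for its command combo."""
    return [["memexa", sub, query] for sub in WALKTHROUGH_COMBOS[walkthrough]]


for argv in plan("03_project_status", "project X"):
    print(" ".join(argv))
```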

2 case studies (methodology, 10–15 min each):

| # | Case study | Audience | Output |
| --- | --- | --- | --- |
| 01 | Late-bound deliverable pipeline | Anyone recovering from a missed deadline | LaTeX → PDF + action card (20 min end-to-end) |
| 02 | 5-minute meeting brief | Anyone prepping for a meeting | 4-section Markdown brief (5 min end-to-end) |

→ Index pages: examples/demo_dataset/walkthroughs/ · docs/case_studies/

Two ways to run the LLM

memexa's core is a two-LLM extraction pipeline. The open-source release ships everything you need to run it locally.

# Default: OSS bundled prompt + your own LLM provider
#   Set OpenAI / DeepSeek / local vLLM base_url + key in .env
export MEMEXA_EXTRACTOR_TIER=bundled

# BYO: bring your own prompt (for advanced users with existing prompt tuning)
export MEMEXA_EXTRACTOR_TIER=byo
export MEMEXA_PROMPT_PATH=/path/to/your_prompts.py
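
A pre-flight check for these two variables might look like the sketch below. The variable names and the two tier values come from the snippet above; `resolve_extractor` is a hypothetical helper, and treating an unset tier as `bundled` and requiring a prompt path for `byo` are assumptions about how memexa reads them (docs/configuration.md is the reference).

```python
import os
from pathlib import Path
from typing import Mapping, Optional

VALID_TIERS = ("bundled", "byo")


def resolve_extractor(
    env: Mapping[str, str] = os.environ,
) -> tuple[str, Optional[Path]]:
    """Return (tier, prompt_path); prompt_path is None for the bundled tier."""
    tier = env.get("MEMEXA_EXTRACTOR_TIER", "bundled")  # assumed default
    if tier not in VALID_TIERS:
        raise ValueError(f"MEMEXA_EXTRACTOR_TIER must be one of {VALID_TIERS}")
    if tier == "bundled":
        return tier, None
    raw_path = env.get("MEMEXA_PROMPT_PATH")
    if not raw_path:
        raise ValueError("byo tier needs MEMEXA_PROMPT_PATH to be set")
    return tier, Path(raw_path)
```

Taking the environment as a parameter keeps the check easy to test with a plain dict instead of the real process environment.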

Roadmap: v0.5 will add an optional paid API endpoint, billed per token (OpenAI-style, no subscription). This is an upgrade path, not a gate — the OSS stays fully usable forever. See docs/api_roadmap.md.

Documentation index

| Topic | Link |
| --- | --- |
| 30-minute first run | docs/quickstart.md |
| Architecture | docs/architecture.md |
| 14 query subcommands in depth | docs/usage_guide.md |
| 5-phase state inference | docs/5_phase_query.md |
| Full environment variables | docs/configuration.md |
| FAQ | docs/faq.md |
| Troubleshooting | docs/troubleshooting.md |
| Performance numbers | docs/performance.md |
| Per-source onboarding | docs/integrations/ |
| macOS / Windows / Linux deployment | docs/deployment/ |
| Example walkthroughs (synthetic data) | examples/demo_dataset/walkthroughs/ |
| Case studies (methodology) | docs/case_studies/ |
| 🤖 For AI agents (protocol doc) | docs/for_agents.md |
| Paid API endpoint (roadmap) | docs/api_roadmap.md |
| Engineering lessons learned | docs/lessons_learned/ |
| Contribution guide | CONTRIBUTING.md |
| Code of conduct | CODE_OF_CONDUCT.md |
| Security policy | SECURITY.md |
| Governance | GOVERNANCE.md |
| Roadmap | ROADMAP.md |
| Support | SUPPORT.md |
| Citation | CITATION.cff |

License

Apache 2.0. See LICENSE.

The open-source core is licensed under Apache 2.0, which permits commercial use. The optional paid API endpoint, when it ships, will have its own service terms; see docs/api_roadmap.md.
