镜我 · Your personal Pensieve. Turn scattered Chinese data into ready-to-use deliverables.
Memexa · 镜我
Your personal Pensieve. Take the data scattered across your six silos, reorganize it around the task you have right now, and walk away with a usable document.
What is this
memexa ingests six categories of everyday Chinese-language data (WeChat, QQ,
email, browser history, AI conversations, voice memos), extracts entities /
relationships / temporal evidence with a two-LLM pipeline, stores them in a
PostgreSQL + pgvector memory graph, and then uses 14 query subcommands to
pull out whatever you need right now — who was looking for me? what's the
whole story behind this? what's on my plate? where is project X across all
sources?
Inspired by the Pensieve in Harry Potter — pour the memories scattered in your head into a basin, rearrange them, observe them, extract what matters for the moment.
WeChat ─┐ ┌─► "Who is X?" (arc + quick)
QQ ─┤ ├─► "Group activity last week?" (topic + trends)
Email ─┼──► two-LLM extract ──► PG + pgvector ──┤
Browser─┤ (gate+extract) memory graph ├─► "Project X status?" (project + timeline)
AI chat─┤ ├─► "What does Y want?" (person)
Audio ─┘ └─► "My pending actions?" (pending)
↑ ↑
your raw data 14 query subcommands
(local, fully self-hosted) (cross-source composable)
v0.1 scope: complete ingestion + extraction + query + dashboard + 5 walkthroughs + 2 case studies. The "auto-generate deliverables" layer (lab reports / action cards / weekly reports / meeting briefs) is on ROADMAP.md for v0.2 — for now, compose the 14 query commands manually for the same effect.
🚀 First time here? Jump to Example walkthroughs ↓ to see 5 real scenarios end-to-end.
🤖 AI-agent compatible by design. Most real users invoke memexa through an AI agent — Claude Code, Cursor, Cline, or one they wrote themselves — rather than typing subcommands by hand. The 14 query subcommands are a small protocol; the protocol document for agents is docs/for_agents.md (hard rules, decision table, composition patterns, common pitfalls). If you're shipping an agent that needs a Chinese-data memory layer, start there.
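For agent authors, the call shape `memexa <subcmd> "<query>"` can be wrapped in a few lines of Python. This is a minimal sketch, not the protocol in docs/for_agents.md: the allowlist holds only the subcommands named in this README, `build_argv` and `run_memexa` are hypothetical helper names, and treating the query as optional is an assumption based on `memexa pending` appearing without one.

```python
import subprocess
from typing import Optional

# Subcommands named in this README; the full set of 14 is in docs/usage_guide.md.
KNOWN_SUBCOMMANDS = {
    "quick", "topic", "arc", "timeline", "person",
    "project", "pending", "reflect", "trends",
}


def build_argv(subcmd: str, query: Optional[str] = None) -> list[str]:
    """Build the argv for `memexa <subcmd> "<query>"` (hypothetical helper)."""
    if subcmd not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcmd!r}")
    argv = ["memexa", subcmd]
    if query:
        argv.append(query)
    return argv


def run_memexa(subcmd: str, query: Optional[str] = None) -> str:
    """Run the CLI and return its stdout; assumes `memexa` is on PATH."""
    result = subprocess.run(
        build_argv(subcmd, query), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the arguments as a list rather than a shell string means a query that contains spaces or quotes needs no escaping.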
Two audiences (design intent)
| Audience | Trigger | Available now (v0.1) | Roadmap (v0.2+) |
|---|---|---|---|
| Researchers / students | Experiment done → write report; meet advisor → prep | 14 queries by hand + LaTeX template | memexa lab-report X / memexa brief <person> |
| Knowledge workers / ops folks | Missed a deadline → recover; heading out → don't forget | memexa pending + memexa quick | memexa action-card X / memexa dashboard |
Both share the same backbone (ingestion + two-LLM extract + graph + query); the deliverable templates on top differ.
Six data sources
| Source | Builder | Driver |
|---|---|---|
| WeChat | memexa/ingestion/v5_wechat_batch_builder.py | memexa/drivers/backfill_v5_wechat_driver.py |
| QQ | memexa/extraction/qq/qq_history_to_batches.py | memexa/drivers/backfill_v5_qq_driver.py |
| Email | memexa/ingestion/v5_email_batch_builder.py | memexa/drivers/backfill_v5_email_driver.py |
| Browser | memexa/ingestion/v5_browser_batch_builder.py | memexa/drivers/backfill_v5_browser_driver.py |
| AI chat (Claude Code) | memexa/extraction/claude_code_to_v5_converter.py | memexa/drivers/backfill_v5_cc_driver.py |
| Audio (microphone) | memexa/ingestion/v5_audio_batch_builder.py | memexa/drivers/backfill_v5_audio_driver.py |
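The table above can be kept as a small lookup when scripting against the repository. The sketch below copies the paths verbatim and converts them to dotted module names. The `SOURCES` dict and both helper functions are illustrative names, and whether the drivers can actually be launched with `python -m <module>` is an assumption that this README does not confirm.

```python
from pathlib import PurePosixPath

# source -> (builder path, driver path), copied from the table above.
SOURCES = {
    "wechat": ("memexa/ingestion/v5_wechat_batch_builder.py",
               "memexa/drivers/backfill_v5_wechat_driver.py"),
    "qq": ("memexa/extraction/qq/qq_history_to_batches.py",
           "memexa/drivers/backfill_v5_qq_driver.py"),
    "email": ("memexa/ingestion/v5_email_batch_builder.py",
              "memexa/drivers/backfill_v5_email_driver.py"),
    "browser": ("memexa/ingestion/v5_browser_batch_builder.py",
                "memexa/drivers/backfill_v5_browser_driver.py"),
    "ai_chat": ("memexa/extraction/claude_code_to_v5_converter.py",
                "memexa/drivers/backfill_v5_cc_driver.py"),
    "audio": ("memexa/ingestion/v5_audio_batch_builder.py",
              "memexa/drivers/backfill_v5_audio_driver.py"),
}


def module_name(path: str) -> str:
    """Convert a repo-relative .py path into a dotted module name."""
    return ".".join(PurePosixPath(path).with_suffix("").parts)


def driver_module(source: str) -> str:
    """Return the dotted module name of the driver for one source."""
    return module_name(SOURCES[source][1])
```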
Quickstart
# 1. Install
pip install -e .
# 2. Initialize config (creates ~/.memexa/ with 3 example files)
memexa init # → ~/.memexa/{aliases,identity}.yaml + .env
# 3. Start the backend
docker compose -f docker-compose.example.yml up -d
# 4. Run the demo (use --dry-run if backend isn't up yet)
python -m examples.demo_dataset.ingest --dry-run
# 5. Self-check + first query
memexa doctor # verify backend + LLM provider
memexa quick "<your keyword>"
Full walkthrough: docs/quickstart.md
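If step 2 seems to have gone wrong, a quick look at the config directory can help before running `memexa doctor`. The sketch below checks only for the three files named in the `memexa init` comment. `missing_config_files` is a hypothetical helper, not part of memexa, and it says nothing about whether the file contents are valid.

```python
from pathlib import Path

# The three example files that `memexa init` is described as creating.
EXPECTED_FILES = ("aliases.yaml", "identity.yaml", ".env")


def missing_config_files(config_dir: Path) -> list[str]:
    """Return the expected file names that are absent from config_dir."""
    return [name for name in EXPECTED_FILES if not (config_dir / name).is_file()]
```

Calling `missing_config_files(Path.home() / ".memexa")` returns an empty list when all three files are present.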
Two-LLM gate-extract architecture
┌──────────────────────────────────────────────────────────────────────┐
│ 6 categories of Chinese-language data │
│ WeChat │ QQ │ Email │ Browser │ AI chat │ Voice memo │
└──────────────────────────────────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────────┐
│ Per-source batch builder → JSON envelopes │
└──────────────────────────────────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────────┐
│ Stage A: gatekeeper LLM (filter HIGH/MEDIUM/LOW) │
│ Stage B: extractor LLM (V2 envelope JSON) │
│ Stage C: BGE-M3 quorum + arbiter │
│ Stage D: POST → memory_full_v5 bank │
└──────────────────────────────────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────────┐
│ PostgreSQL + pgvector + BGE-M3 embeddings + temporal links │
└──────────────────────────────────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────────┐
│ 14 query subcommands + 5-phase state inference + deliverable layer │
└──────────────────────────────────────────────────────────────────────┘
Full architecture: docs/architecture.md
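The control flow of Stages A and B can be sketched with two plain callables standing in for the two LLMs. This is an illustration under stated assumptions, not memexa's implementation: the `Batch` type and `gate_then_extract` are hypothetical names, and dropping `LOW` batches while keeping `HIGH` and `MEDIUM` is a guess at what the gate labels are used for. Stages C and D are left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

GATE_LABELS = {"HIGH", "MEDIUM", "LOW"}   # labels from the diagram above
KEEP_LABELS = {"HIGH", "MEDIUM"}          # assumption: LOW batches are dropped


@dataclass
class Batch:
    source: str  # e.g. "wechat"
    text: str    # raw batch content


def gate_then_extract(
    batches: Iterable[Batch],
    gate: Callable[[Batch], str],
    extract: Callable[[Batch], dict],
) -> list[dict]:
    """Run the gatekeeper on every batch and the extractor only on kept ones."""
    results = []
    for batch in batches:
        label = gate(batch)
        if label not in GATE_LABELS:
            raise ValueError(f"unexpected gate label: {label!r}")
        if label in KEEP_LABELS:
            results.append(extract(batch))
    return results
```

The point of the split is that the extractor never sees a batch the gatekeeper rejected, so the second model call is skipped for low-value input.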
Query CLI
memexa <subcmd> "<query>" [options]
14 subcommands in three tiers (basic / advanced / composite). The 8 most common:
| Subcommand | Use case |
|---|---|
| quick | "Who is X" (point query) |
| topic | "The whole story of X" (theme expansion; do not use on names, see the hard rules in docs/for_agents.md) |
| arc | "How did I meet X" (relationship arc; preferred for names) |
| timeline | "What happened during this period?" (temporal) |
| person | "Status of professor Y" (person profile) |
| project | "Cross-source pulse of project Z" |
| pending | "What's on my plate" (active commitments) |
| reflect | LLM-synthesized answer |
Full usage: docs/usage_guide.md
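The warning on `topic` is easy to enforce in a wrapper. The sketch below redirects a `topic` call to `arc` when the query is a known person name. `safe_subcommand` and the `known_names` set are hypothetical: how memexa itself recognizes names is not described in this README.

```python
def safe_subcommand(requested: str, query: str, known_names: set[str]) -> str:
    """Return `arc` instead of `topic` when the query is a known person name."""
    names = {name.casefold() for name in known_names}
    is_name = query.strip().casefold() in names
    if requested == "topic" and is_name:
        return "arc"
    return requested
```

For example, with `known_names = {"Alice", "Bob"}`, a request for `topic` on "alice" comes back as `arc`, while `topic` on "thesis defense" is left unchanged.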
📖 Example walkthroughs — 5 reproducible scenarios
Install → make demo-ingest → follow a walkthrough → see memexa in action. Everything runs on a synthetic dataset (Alice / Bob / Carol / advisor@example.com), so anyone can reproduce the results 1:1 and no real personal data is involved.
┌────────────────────────────────────────────────────────────────────┐
│ What question are you asking? │
└────────────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
"Who is X?" "Group activity last week?" "What's on my plate?"
01_who_is_alice 02_weekly_team 05_my_pending
arc + quick topic + trends pending + quick
│ │ │
▼ ▼ ▼
"What does Y want?" "Project X status?"
04_advisor_said 03_project_status
person project + timeline
5 walkthroughs (5–10 min each):
| # | Walkthrough | Scenario | Command combo |
|---|---|---|---|
| 01 | Who is Alice? | "How do I know X?" | arc + quick |
| 02 | Weekly team summary | "What did the group do last week?" | topic + trends |
| 03 | Project status check | "Where is project X?" | project + timeline |
| 04 | What did advisor say? | "What does Y (advisor/boss) want?" | person |
| 05 | My pending actions | "What's on my plate?" | pending + quick |
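The command combos in the table can be expanded into copy-pasteable command lines. In the sketch below the walkthrough IDs and combos come from this README, while `commands_for` is a hypothetical helper. Passing the same query to every subcommand in a combo is an assumption, and the actual walkthroughs may use different arguments per step.

```python
import shlex

# walkthrough id -> subcommand combo, as listed in the table above.
WALKTHROUGH_COMBOS = {
    "01_who_is_alice": ("arc", "quick"),
    "02_weekly_team": ("topic", "trends"),
    "03_project_status": ("project", "timeline"),
    "04_advisor_said": ("person",),
    "05_my_pending": ("pending", "quick"),
}


def commands_for(walkthrough: str, query: str) -> list[str]:
    """Render one shell-quoted command line per subcommand in the combo."""
    return [
        shlex.join(["memexa", subcmd, query])
        for subcmd in WALKTHROUGH_COMBOS[walkthrough]
    ]
```

`shlex.join` quotes a query only when it needs it, so a multi-word query stays a single argument.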
2 case studies (methodology, 10–15 min each):
| # | Case study | Audience | Output |
|---|---|---|---|
| 01 | Late-bound deliverable pipeline | Anyone recovering from a missed deadline | LaTeX → PDF + action card (20 min end-to-end) |
| 02 | 5-minute meeting brief | Anyone prepping for a meeting | 4-section Markdown brief (5 min end-to-end) |
→ Index pages: examples/demo_dataset/walkthroughs/ · docs/case_studies/
Two ways to run the LLM
memexa's core is a two-LLM extract pipeline. The OSS ships everything you need to run it locally.
# Default: OSS bundled prompt + your own LLM provider
# Set OpenAI / DeepSeek / local vLLM base_url + key in .env
export MEMEXA_EXTRACTOR_TIER=bundled
# BYO: bring your own prompt (for advanced users with existing prompt tuning)
export MEMEXA_EXTRACTOR_TIER=byo
export MEMEXA_PROMPT_PATH=/path/to/your_prompts.py
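The two tiers amount to a small configuration decision, sketched below. The variable names and the values `bundled` and `byo` come from the lines above. Everything else is an assumption for illustration: `resolve_extractor` is a hypothetical helper, and memexa's real behavior for an unset tier, a missing prompt path, or an unknown value may differ.

```python
import os
from typing import Mapping, Optional


def resolve_extractor(
    env: Mapping[str, str] = os.environ,
) -> tuple[str, Optional[str]]:
    """Return (tier, prompt_path); prompt_path is None for the bundled tier."""
    # Assumption: an unset tier falls back to the documented default.
    tier = env.get("MEMEXA_EXTRACTOR_TIER", "bundled")
    if tier == "bundled":
        return tier, None
    if tier == "byo":
        path = env.get("MEMEXA_PROMPT_PATH")
        if not path:
            raise ValueError(
                "MEMEXA_PROMPT_PATH must be set when MEMEXA_EXTRACTOR_TIER=byo"
            )
        return tier, path
    raise ValueError(f"unknown MEMEXA_EXTRACTOR_TIER: {tier!r}")
```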
Roadmap: v0.5 will add an optional paid API endpoint, billed per token (OpenAI-style, no subscription). This is an upgrade path, not a gate — the OSS stays fully usable forever. See docs/api_roadmap.md.
Documentation index
| Topic | Link |
|---|---|
| 30-minute first run | docs/quickstart.md |
| Architecture | docs/architecture.md |
| 14 query subcommands in depth | docs/usage_guide.md |
| 5-phase state inference | docs/5_phase_query.md |
| Full environment variables | docs/configuration.md |
| FAQ | docs/faq.md |
| Troubleshooting | docs/troubleshooting.md |
| Performance numbers | docs/performance.md |
| Per-source onboarding | docs/integrations/ |
| macOS / Windows / Linux deployment | docs/deployment/ |
| Example walkthroughs (synthetic data) | examples/demo_dataset/walkthroughs/ |
| Case studies (methodology) | docs/case_studies/ |
| 🤖 For AI agents (protocol doc) | docs/for_agents.md |
| Paid API endpoint (roadmap) | docs/api_roadmap.md |
| Engineering lessons learned | docs/lessons_learned/ |
| Contribution guide | CONTRIBUTING.md |
| Code of conduct | CODE_OF_CONDUCT.md |
| Security policy | SECURITY.md |
| Governance | GOVERNANCE.md |
| Roadmap | ROADMAP.md |
| Support | SUPPORT.md |
| Citation | CITATION.cff |
License
Apache 2.0. See LICENSE.
OSS core = Apache 2.0, unrestricted commercial use. The optional paid API endpoint, when it ships, will have its own service terms — see docs/api_roadmap.md.