镜我 · Your personal Pensieve. Turn scattered Chinese data into ready-to-use deliverables.

Maintainers

labazhou2024

Memexa · 镜我

Your personal Pensieve. Take the data scattered across your six silos, reorganize it around the task you have right now, and walk away with a usable document.

What is this

memexa ingests six categories of everyday Chinese-language data: WeChat, QQ, email, browser history, AI conversations, and voice memos. A two-LLM pipeline extracts entities, relationships, and temporal evidence from that data and stores them in a PostgreSQL + pgvector memory graph. The 14 query subcommands then pull out whatever you need right now: Who was looking for me? What's the whole story behind this? What's on my plate? Where is project X across all sources?
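To make "entities, relationships, and temporal evidence" concrete, here is a minimal sketch of what such records could look like. It is not memexa's actual schema: the class names, field names, and source labels are assumptions chosen for illustration, and the sample data reuses the synthetic names from the demo dataset.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical labels for the six sources; memexa's real identifiers may differ.
SOURCES = ("wechat", "qq", "email", "browser", "ai_chat", "audio")


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "person" or "project" (illustrative labels)


@dataclass(frozen=True)
class Relationship:
    subject: Entity
    predicate: str
    obj: Entity


@dataclass(frozen=True)
class Evidence:
    relationship: Relationship
    source: str          # one of SOURCES
    observed_at: datetime
    snippet: str


def timeline(items):
    """Return evidence ordered by the time it was observed."""
    return sorted(items, key=lambda ev: ev.observed_at)


alice = Entity("Alice", "person")
project_x = Entity("project X", "project")
works_on = Relationship(alice, "works_on", project_x)

evidence = [
    Evidence(works_on, "email", datetime(2026, 5, 2, 9, 0), "Alice sent the draft"),
    Evidence(works_on, "wechat", datetime(2026, 5, 1, 18, 30), "Alice asked about X"),
]
ordered = timeline(evidence)
```

Keeping the source on each piece of evidence is what would let a single relationship be traced across several silos.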

Inspired by the Pensieve in Harry Potter — pour the memories scattered in your head into a basin, rearrange them, observe them, extract what matters for the moment.

   WeChat ─┐                                              ┌─► "Who is X?"           (arc + quick)
   QQ     ─┤                                              ├─► "Group activity last week?" (topic + trends)
   Email  ─┼──► two-LLM extract ──► PG + pgvector ──┤
   Browser─┤    (gate+extract)      memory graph        ├─► "Project X status?"    (project + timeline)
   AI chat─┤                                              ├─► "What does Y want?"   (person)
   Audio  ─┘                                              └─► "My pending actions?" (pending)
        ↑                                                       ↑
   your raw data                                          14 query subcommands
   (local, fully self-hosted)                             (cross-source composable)

v0.1 scope: complete ingestion + extraction + query + dashboard + 5 walkthroughs + 2 case studies. The "auto-generate deliverables" layer (lab reports / action cards / weekly reports / meeting briefs) is planned for v0.2 (see ROADMAP.md). For now, compose the 14 query commands manually to get the same effect.

🚀 First time here? Jump to Example walkthroughs ↓ to see 5 real scenarios end-to-end.

🤖 AI-agent compatible by design. Most real users invoke memexa through an AI agent — Claude Code, Cursor, Cline, or one they wrote themselves — rather than typing subcommands by hand. The 14 query subcommands are a small protocol; the protocol document for agents is docs/for_agents.md (hard rules, decision table, composition patterns, common pitfalls). If you're shipping an agent that needs a Chinese-data memory layer, start there.
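As a rough illustration of how an agent might call the CLI, the sketch below wraps the documented `memexa <subcmd> "<query>"` form in a Python helper. The helper names `build_argv` and `run_memexa` are hypothetical, the output format of memexa is not assumed, and docs/for_agents.md remains the authoritative protocol.

```python
import shutil
import subprocess
from typing import Optional


def build_argv(subcmd: str, query: str) -> list[str]:
    """Build the argument vector for: memexa <subcmd> "<query>"."""
    if not subcmd.strip():
        raise ValueError("subcommand must not be empty")
    return ["memexa", subcmd, query]


def run_memexa(subcmd: str, query: str, timeout: float = 60.0) -> Optional[str]:
    """Run one query and return raw stdout, or None if memexa is not installed."""
    if shutil.which("memexa") is None:
        return None
    result = subprocess.run(
        build_argv(subcmd, query),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # leave exit-code handling to the caller
    )
    return result.stdout
```

Passing the query as a list element rather than through a shell string means quotes and spaces in Chinese or English queries need no extra escaping.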

Two audiences (design intent)

Audience	Trigger	Available now (v0.1)	Roadmap (v0.2+)
Researchers / students	Experiment done → write report; meet advisor → prep	14 queries by hand + LaTeX template	`memexa lab-report X` / `memexa brief <person>`
Knowledge workers / ops folks	Missed a deadline → recover; heading out → don't forget	`memexa pending` + `memexa quick`	`memexa action-card X` / `memexa dashboard`

Both share the same backbone (ingestion + two-LLM extract + graph + query); the deliverable templates on top differ.

Six data sources

Source	Builder	Driver
WeChat	`memexa/ingestion/v5_wechat_batch_builder.py`	`memexa/drivers/backfill_v5_wechat_driver.py`
QQ	`memexa/extraction/qq/qq_history_to_batches.py`	`memexa/drivers/backfill_v5_qq_driver.py`
Email	`memexa/ingestion/v5_email_batch_builder.py`	`memexa/drivers/backfill_v5_email_driver.py`
Browser	`memexa/ingestion/v5_browser_batch_builder.py`	`memexa/drivers/backfill_v5_browser_driver.py`
AI chat (Claude Code)	`memexa/extraction/claude_code_to_v5_converter.py`	`memexa/drivers/backfill_v5_cc_driver.py`
Audio (microphone)	`memexa/ingestion/v5_audio_batch_builder.py`	`memexa/drivers/backfill_v5_audio_driver.py`

Quickstart

# 1. Install
pip install -e .

# 2. Initialize config (creates ~/.memexa/ with 3 example files)
memexa init                          # → ~/.memexa/{aliases,identity}.yaml + .env

# 3. Start the backend
docker compose -f docker-compose.example.yml up -d

# 4. Run the demo (use --dry-run if backend isn't up yet)
python -m examples.demo_dataset.ingest --dry-run

# 5. Self-check + first query
memexa doctor                        # verify backend + LLM provider
memexa quick "<your keyword>"

Full walkthrough: docs/quickstart.md
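If step 5 fails, a quick sanity check is to confirm that step 2 produced the three files named in the comment above. The sketch below only checks for file presence under `~/.memexa/`; the function name is hypothetical, and it does not replace `memexa doctor`.

```python
from pathlib import Path

# File names taken from the quickstart comment: {aliases,identity}.yaml + .env
EXPECTED_FILES = ("aliases.yaml", "identity.yaml", ".env")


def missing_config_files(config_dir: Path) -> list[str]:
    """Return the expected config files that are absent from config_dir."""
    return [name for name in EXPECTED_FILES if not (config_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_config_files(Path.home() / ".memexa")
    print("missing:", ", ".join(missing) if missing else "none")
```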

Two-LLM gate-extract architecture

┌──────────────────────────────────────────────────────────────────────┐
│  6 categories of Chinese-language data                               │
│  WeChat │ QQ │ Email │ Browser │ AI chat │ Voice memo                │
└──────────────────────────────────────────────────────────────────────┘
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Per-source batch builder  →  JSON envelopes                         │
└──────────────────────────────────────────────────────────────────────┘
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Stage A: gatekeeper LLM  (filter HIGH/MEDIUM/LOW)                   │
│  Stage B: extractor LLM   (V2 envelope JSON)                         │
│  Stage C: BGE-M3 quorum + arbiter                                    │
│  Stage D: POST → memory_full_v5 bank                                 │
└──────────────────────────────────────────────────────────────────────┘
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│  PostgreSQL + pgvector + BGE-M3 embeddings + temporal links          │
└──────────────────────────────────────────────────────────────────────┘
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│  14 query subcommands + 5-phase state inference + deliverable layer  │
└──────────────────────────────────────────────────────────────────────┘

Full architecture: docs/architecture.md
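The gate-then-extract idea in Stages A and B can be sketched with plain functions standing in for the two LLM calls. This is an illustration, not memexa's implementation: the toy heuristics, the envelope fields, and the choice to drop only LOW items are all assumptions, and Stages C and D are left out.

```python
from typing import Callable, Iterable

Gate = Callable[[str], str]        # returns "HIGH", "MEDIUM", or "LOW"
Extractor = Callable[[str], dict]  # returns an envelope-like dict


def gate_then_extract(
    batches: Iterable[str],
    gate: Gate,
    extract: Extractor,
    keep: frozenset = frozenset({"HIGH", "MEDIUM"}),  # assumption: LOW is dropped
) -> list[dict]:
    """Run the cheap gate first; call the extractor only on items that pass."""
    envelopes = []
    for text in batches:
        label = gate(text)
        if label not in keep:
            continue
        envelope = extract(text)
        envelope["gate_label"] = label
        envelopes.append(envelope)
    return envelopes


def toy_gate(text: str) -> str:
    """Stand-in for the gatekeeper LLM (keyword and length heuristics)."""
    if "deadline" in text:
        return "HIGH"
    return "MEDIUM" if len(text) > 20 else "LOW"


def toy_extract(text: str) -> dict:
    """Stand-in for the extractor LLM (capitalized words as entities)."""
    return {"text": text, "entities": [w for w in text.split() if w.istitle()]}


results = gate_then_extract(
    ["ok", "Alice asked Bob about the deadline", "Carol shared a long article link today"],
    toy_gate,
    toy_extract,
)
```

The point of the split is cost: the extractor, which is the more expensive call, never sees items the gate has already discarded.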

Query CLI

memexa <subcmd> "<query>" [options]

14 subcommands in three tiers (basic / advanced / composite). The 8 most common:

Subcommand	Use case
`quick`	"Who is X" — point query
`topic`	"The whole story of X" — theme expansion (DO NOT use on names! see hard rules)
`arc`	"How did I meet X" — relationship arc (preferred for names)
`timeline`	"What happened during this period?" — temporal
`person`	"Status of professor Y" — person profile
`project`	"Cross-source pulse of project Z"
`pending`	"What's on my plate" — active commitments
`reflect`	LLM-synthesized answer

Full usage: docs/usage_guide.md
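The table above implies a small routing decision, including the rule that names go to `arc` rather than `topic`. A minimal sketch of that decision follows; the intent labels and the `is_name` flag are hypothetical, and the full decision table lives in docs/for_agents.md.

```python
# Hypothetical intent labels mapped to the eight subcommands listed above.
INTENT_TO_SUBCOMMAND = {
    "point": "quick",
    "story": "topic",
    "relationship": "arc",
    "period": "timeline",
    "profile": "person",
    "project": "project",
    "commitments": "pending",
    "synthesis": "reflect",
}


def pick_subcommand(intent: str, is_name: bool = False) -> str:
    """Choose a subcommand, redirecting name queries away from `topic`."""
    subcmd = INTENT_TO_SUBCOMMAND[intent]
    if subcmd == "topic" and is_name:
        return "arc"  # `topic` must not be used on names; `arc` is preferred
    return subcmd
```

Deciding whether a query is a name is left to the caller here, since the source does not describe how memexa or an agent should detect that.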

📖 Example walkthroughs — 5 reproducible scenarios

Install → make demo-ingest → follow a walkthrough → see memexa in action. Everything runs on a synthetic dataset (Alice / Bob / Carol / advisor@example.com), so anyone can reproduce the results exactly and no real personal data is involved.

┌────────────────────────────────────────────────────────────────────┐
│                  What question are you asking?                     │
└────────────────────────────────────────────────────────────────────┘
        │                    │                    │
        ▼                    ▼                    ▼
   "Who is X?"        "Group activity last week?"  "What's on my plate?"
   01_who_is_alice    02_weekly_team               05_my_pending
   arc + quick         topic + trends               pending + quick
        │                    │                    │
        ▼                    ▼                    ▼
   "What does Y want?" "Project X status?"
   04_advisor_said     03_project_status
   person              project + timeline

5 walkthroughs (5–10 min each):

#	Walkthrough	Scenario	Command combo
01	Who is Alice?	"How do I know X?"	`arc` + `quick`
02	Weekly team summary	"What did the group do last week?"	`topic` + `trends`
03	Project status check	"Where is project X?"	`project` + `timeline`
04	What did advisor say?	"What does Y (advisor/boss) want?"	`person`
05	My pending actions	"What's on my plate?"	`pending` + `quick`
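The command combos in this table can be written down as data, which makes it easy to print the commands for a given walkthrough. The sketch below assumes that every subcommand in a combo takes the same query string, which the walkthroughs themselves may not do; the dictionary keys are the walkthrough ids from the diagram above.

```python
import shlex

# Walkthrough id -> subcommands, copied from the table above.
WALKTHROUGHS = {
    "01_who_is_alice": ("arc", "quick"),
    "02_weekly_team": ("topic", "trends"),
    "03_project_status": ("project", "timeline"),
    "04_advisor_said": ("person",),
    "05_my_pending": ("pending", "quick"),
}


def command_lines(walkthrough: str, query: str) -> list[str]:
    """Return shell-quoted command lines for one walkthrough."""
    return [shlex.join(["memexa", sub, query]) for sub in WALKTHROUGHS[walkthrough]]


if __name__ == "__main__":
    for line in command_lines("01_who_is_alice", "Alice"):
        print(line)
```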

2 case studies (methodology, 10–15 min each):

#	Case study	Audience	Output
01	Late-bound deliverable pipeline	Anyone recovering from a missed deadline	LaTeX → PDF + action card (20 min end-to-end)
02	5-minute meeting brief	Anyone prepping for a meeting	4-section Markdown brief (5 min end-to-end)

→ Index pages: examples/demo_dataset/walkthroughs/ · docs/case_studies/

Two ways to run the LLM

memexa's core is a two-LLM extract pipeline. The open-source release ships everything you need to run it locally.

# Default: OSS bundled prompt + your own LLM provider
#   Set OpenAI / DeepSeek / local vLLM base_url + key in .env
export MEMEXA_EXTRACTOR_TIER=bundled

# BYO: bring your own prompt (for advanced users with existing prompt tuning)
export MEMEXA_EXTRACTOR_TIER=byo
export MEMEXA_PROMPT_PATH=/path/to/your_prompts.py
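For scripts that wrap memexa, it can help to validate these two variables before starting a long ingestion run. The sketch below shows one way to do that; the function name and error messages are hypothetical, and how memexa itself reads these variables is not shown in this document.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_extractor_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Validate the tier setting and, for byo, the prompt path variable."""
    env = os.environ if env is None else env
    # "bundled" is described as the default mode above.
    tier = env.get("MEMEXA_EXTRACTOR_TIER", "bundled")
    if tier not in ("bundled", "byo"):
        raise ValueError(f"unknown MEMEXA_EXTRACTOR_TIER: {tier!r}")
    prompt_path = None
    if tier == "byo":
        raw = env.get("MEMEXA_PROMPT_PATH")
        if not raw:
            raise ValueError("MEMEXA_PROMPT_PATH must be set when the tier is byo")
        prompt_path = Path(raw)
    return {"tier": tier, "prompt_path": prompt_path}
```

Accepting a mapping instead of reading `os.environ` directly keeps the check easy to exercise without touching the real environment.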

Roadmap: v0.5 will add an optional paid API endpoint, billed per token (OpenAI-style, no subscription). This is an upgrade path, not a gate — the OSS stays fully usable forever. See docs/api_roadmap.md.

Documentation index

Topic	Link
30-minute first run	docs/quickstart.md
Architecture	docs/architecture.md
14 query subcommands in depth	docs/usage_guide.md
5-phase state inference	docs/5_phase_query.md
Full environment variables	docs/configuration.md
FAQ	docs/faq.md
Troubleshooting	docs/troubleshooting.md
Performance numbers	docs/performance.md
Per-source onboarding	docs/integrations/
macOS / Windows / Linux deployment	docs/deployment/
Example walkthroughs (synthetic data)	examples/demo_dataset/walkthroughs/
Case studies (methodology)	docs/case_studies/
🤖 For AI agents (protocol doc)	docs/for_agents.md
Paid API endpoint (roadmap)	docs/api_roadmap.md
Engineering lessons learned	docs/lessons_learned/
Contribution guide	CONTRIBUTING.md
Code of conduct	CODE_OF_CONDUCT.md
Security policy	SECURITY.md
Governance	GOVERNANCE.md
Roadmap	ROADMAP.md
Support	SUPPORT.md
Citation	CITATION.cff

License

Apache 2.0. See LICENSE.

The OSS core is licensed under Apache 2.0 and permits unrestricted commercial use. The optional paid API endpoint, when it ships, will have its own service terms; see docs/api_roadmap.md.

Release history

Version	Released	Notes
0.1.1	May 17, 2026
0.1.0	May 17, 2026
0.1.0rc4	May 16, 2026	pre-release
0.1.0rc3	May 16, 2026	pre-release
0.1.0rc2	May 14, 2026	pre-release (this version)
0.1.0rc1	May 14, 2026	pre-release
0.1.0a0	May 14, 2026	pre-release

Download files

Source Distribution

memexa-0.1.0rc2.tar.gz (1.2 MB)

Uploaded May 14, 2026 Source

Built Distribution

memexa-0.1.0rc2-py3-none-any.whl (1.4 MB)

Uploaded May 14, 2026 Python 3

File details

Details for the file memexa-0.1.0rc2.tar.gz.

File metadata

Download URL: memexa-0.1.0rc2.tar.gz
Upload date: May 14, 2026
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for memexa-0.1.0rc2.tar.gz
Algorithm	Hash digest
SHA256	`6e18f51982a7ad35cb85a20ca5abd7662619bda4f5b2601f2a8def0515c3b0a7`
MD5	`5e40a4dfc95c3173d18675299117e011`
BLAKE2b-256	`3ab5ed24b82e4b5629b26da7bd1217f1c544959e0db4a8fc4c01b9e869d6c00c`
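A downloaded copy of the source distribution can be checked against the SHA256 digest in the table with Python's standard `hashlib`. The sketch below assumes the file sits in the current directory under its published name; the function names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the table above for memexa-0.1.0rc2.tar.gz.
EXPECTED_SHA256 = "6e18f51982a7ad35cb85a20ca5abd7662619bda4f5b2601f2a8def0515c3b0a7"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("memexa-0.1.0rc2.tar.gz")
    if archive.is_file():
        print("digest OK" if matches(archive) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same functions work for the wheel if the expected digest is swapped for the one listed in its own hash table.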

Provenance

The following attestation bundles were made for memexa-0.1.0rc2.tar.gz:

Publisher: publish.yml on labazhou2024/memexa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: memexa-0.1.0rc2.tar.gz
- Subject digest: 6e18f51982a7ad35cb85a20ca5abd7662619bda4f5b2601f2a8def0515c3b0a7
- Sigstore transparency entry: 1538038297
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: labazhou2024/memexa@1e57aef6c8c6eb97f60f0f0e4151d96dc450a4be
- Branch / Tag: refs/tags/v0.1.0-rc2
- Owner: https://github.com/labazhou2024
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1e57aef6c8c6eb97f60f0f0e4151d96dc450a4be
- Trigger Event: release

File details

Details for the file memexa-0.1.0rc2-py3-none-any.whl.

File metadata

Download URL: memexa-0.1.0rc2-py3-none-any.whl
Upload date: May 14, 2026
Size: 1.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for memexa-0.1.0rc2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b43ec41f5ddc838bb93cf9e9501c380fb2a70b86b786962fd53c9b6d88dac51a`
MD5	`bd1266f3c3536ea88d7b1a10f257582a`
BLAKE2b-256	`553864c088db54f3910e388f768f0d97b5c7853b0a064bad647ea0e2b57fd1a2`

Provenance

The following attestation bundles were made for memexa-0.1.0rc2-py3-none-any.whl:

Publisher: publish.yml on labazhou2024/memexa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: memexa-0.1.0rc2-py3-none-any.whl
- Subject digest: b43ec41f5ddc838bb93cf9e9501c380fb2a70b86b786962fd53c9b6d88dac51a
- Sigstore transparency entry: 1538038402
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: labazhou2024/memexa@1e57aef6c8c6eb97f60f0f0e4151d96dc450a4be
- Branch / Tag: refs/tags/v0.1.0-rc2
- Owner: https://github.com/labazhou2024
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1e57aef6c8c6eb97f60f0f0e4151d96dc450a4be
- Trigger Event: release
