The Logos Protocol: Deterministic Logseq AST parsing for Matryca.ai.
🔱 Logseq Matryca Parser (The Logos Protocol)
Stop feeding broken Markdown to your AI.
In active beta: heavily tested (141+ tests), statically typed, with a headless CRUD engine; ready for community integration.
Turning a forest of local plain-text files into a unified semantic powerhouse.
🌐 The Vision: Virtual Centralization vs. Binary Lock-in
The PKM (Personal Knowledge Management) world is currently forcing users to make a painful choice between Data Longevity and AI Power.
- Vanilla Logseq / Obsidian is a "Forest" of decentralized Markdown files. It guarantees the Lindy effect (plain-text lasts forever) and perfect Git versioning, but standard AI chunkers treat it like a blender, destroying the outliner hierarchy.
- Tana is a centralized "Tree". It offers incredible semantic power, but traps your brain in a proprietary cloud database.
- The new Logseq DB (SQLite) aims for database speed, but at a huge cost: it locks your notes inside a binary .db file. You lose human-readable files, you lose line-by-line Git diffs, and you lose the immortality of plain-text.
🔱 The Matryca Solution: The Best of Both Worlds
Logseq Matryca Parser is the ultimate bridge. It allows you to keep your sovereign, future-proof Markdown files, while synthesizing a Virtual Global Graph in RAM at runtime.
It acts as the strict File System Driver for your LLM OS. By using a deterministic Stack-Machine to parse your outliner topology, it feeds LangChain or LlamaIndex with the exact parent-child context of every single block.
You get the reasoning power of a centralized relational database, without sacrificing the plain-text soul of your Second Brain in Logseq.
⚖️ The PKM Landscape
| Feature | Vanilla Markdown | Matryca Parser | Logseq DB (SQLite) | Tana |
|---|---|---|---|---|
| Data Format | Plain-text (.md) | Plain-text (.md) | Binary (.db) | Proprietary Cloud |
| Version Control | Perfect (Git) | Perfect (Git) | Poor (Binary blob) | None |
| Data Structure | Decentralized Forest | Virtually Centralized Graph | Relational Database | Centralized Tree |
| AI Readiness | Low (Linear Chunks) | High (Topological AST) | TBD (Requires SQL) | High (Proprietary) |
| Sovereignty | 100% Local | 100% Local (Sovereign AI) | 100% Local | Cloud-Only |
🧭 Matryca vs. naive framework loaders
| Capability | Typical LangChain / LlamaIndex Markdown loaders | Matryca (LOGOS + SYNAPSE + graph) |
|---|---|---|
| Parent–child context | Character or heading splits; children often orphaned from parents | True outliner AST: every block carries parent_id, path, and left_id, and blocks are visited in deterministic tree order |
| Block references ((uuid)) | Treated as opaque text or dropped | Resolved against LogseqGraph; optional embed expansion and Obsidian [[Page#^anchor]] export |
| Property inheritance | Page-level frontmatter at best | get_effective_properties: page + ancestor outline keys merged top-down (Org-mode style), then exposed on enriched chunks |
| Live sync | Re-read whole tree or poll | LogseqGraph.start_watching() (optional watchdog): per-file invalidation — re-parse one page, purge stale UUIDs from registries, refresh backlinks |
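The property-inheritance row can be pictured as a top-down dictionary merge. The sketch below is a hypothetical, standalone illustration: `merge_inherited_properties` is not part of the library, and it assumes that the key closest to the block wins on conflict, which the table does not spell out for `get_effective_properties`.

```python
def merge_inherited_properties(page_props, ancestor_props, own_props):
    """Merge properties top-down: page first, then ancestors, then the block.

    ancestor_props is ordered outermost ancestor first. Later (closer)
    layers override earlier ones; this precedence is an assumption.
    """
    merged = dict(page_props)
    for layer in ancestor_props:
        merged.update(layer)
    merged.update(own_props)
    return merged


effective = merge_inherited_properties(
    page_props={"project": "matryca", "status": "draft"},
    ancestor_props=[{"owner": "marco"}, {"status": "review"}],
    own_props={"priority": "A"},
)
```

Here the block ends up with `project` from the page, `owner` from its outer ancestor, `status` overridden by the nearer ancestor, and its own `priority`.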
🚀 The Problem
Standard RAG pipelines treat your notes like a blender. They chop Markdown into random shards, destroying the parent-child hierarchy that makes Logseq powerful.
graph TD
Raw[(Logseq Markdown\nFiles)]
subgraph Standard RAG
Blender[Standard Text Splitter\n'The Blender']
Chunk1[Chunk 1: Orphan text]
Chunk2[Chunk 2: Lost context]
Blender --> Chunk1 & Chunk2
end
subgraph Matryca Parser
Architect[Logos Engine\nStack-Machine]
Parent[Parent Node\n+ Properties]
Child[Child Node\n+ Task State & Time]
Architect --> Parent --> Child
end
Raw --> Blender
Raw --> Architect
classDef bad fill:#fee2e2,stroke:#ef4444,color:#000;
classDef good fill:#dcfce7,stroke:#22c55e,color:#000;
class Chunk1,Chunk2 bad;
class Parent,Child good;
🔱 The Solution
Logseq Matryca Parser is a deterministic Stack-Machine engine that acts as the File System Driver for your LLM. It preserves the true topology of your thoughts, ensuring AI understands spatial hierarchy, time, and block-lineage—including structured task state and first-class temporal attributes you can query in downstream graph databases and GraphRAG engines without re-parsing raw Markdown.
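To make the stack-machine idea concrete, here is a minimal sketch of how an indentation stack can rebuild parent-child topology from bullets. It is not the LOGOS engine: `Block` and `parse_outline` are hypothetical names, and the sketch ignores properties, `id::` lines, and multiline blocks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Block:
    """Hypothetical stand-in for a parsed block (not the real LogseqNode)."""

    text: str
    parent: Optional["Block"] = None
    children: list = field(default_factory=list)

    @property
    def path(self):
        """Breadcrumb from the root block down to this block."""
        crumbs = []
        node = self
        while node is not None:
            crumbs.append(node.text)
            node = node.parent
        return crumbs[::-1]


def parse_outline(markdown, tab_size=2):
    """Build a block tree from '- ' bullets using a stack of open ancestors."""
    roots = []
    stack = []  # stack[i] is the currently open block at depth i
    for line in markdown.splitlines():
        body = line.lstrip(" \t")
        if not body.startswith("- "):
            continue  # sketch only: non-bullet lines are skipped
        indent = line[: len(line) - len(body)].expandtabs(tab_size)
        # Clamp so a block is never more than one level below its predecessor.
        depth = min(len(indent) // tab_size, len(stack))
        del stack[depth:]  # close every block at this depth or deeper
        parent = stack[-1] if stack else None
        block = Block(text=body[2:].strip(), parent=parent)
        (parent.children if parent else roots).append(block)
        stack.append(block)
    return roots


roots = parse_outline("- Project\n  - Task A\n    - Detail\n  - Task B\n- Other")
```

Because the stack always holds exactly the open ancestors of the current line, each block learns its parent in a single pass, and the same input always yields the same tree.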
⚡ Recent superpowers (Waves 4–12)
Obsidian-native export
Compile an entire Logseq graph into an Obsidian vault layout: YAML frontmatter from page properties, list body preserved, Logseq ((uuid)) links rewritten to [[Page#^anchor]], and trailing ^block-id on referenced blocks. Namespace titles become nested folders (e.g. Projects/AI/Demo.md).
matryca-parse export /path/to/logseq/graph /path/to/obsidian/vault --format obsidian
Note: Wikilinks currently use the Logseq page title (e.g. [[Target#^…]]). Vault files may live under namespace folders (Projects/AI/Demo.md). Obsidian usually resolves unique titles; aligning link text to folder paths is a possible future refinement.
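The link rewrite described above can be sketched in a few lines. This is an illustration only: `rewrite_block_refs` and the `uuid_to_page` mapping are hypothetical, a regular expression is swapped in for brevity, and reusing the UUID itself as the Obsidian anchor is an assumption about the anchor format.

```python
import re

# Matches a Logseq block reference such as ((6ba7b810-9dad-11d1-80b4-00c04fd430c8)).
BLOCK_REF = re.compile(
    r"\(\(([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\)\)"
)


def rewrite_block_refs(text, uuid_to_page):
    """Rewrite ((uuid)) to [[Page#^uuid]] when the target page is known."""

    def replace(match):
        uuid = match.group(1)
        page = uuid_to_page.get(uuid)
        if page is None:
            return match.group(0)  # leave unresolved references untouched
        return f"[[{page}#^{uuid}]]"

    return BLOCK_REF.sub(replace, text)


known = {"6ba7b810-9dad-11d1-80b4-00c04fd430c8": "Target"}
rewritten = rewrite_block_refs(
    "See ((6ba7b810-9dad-11d1-80b4-00c04fd430c8)) "
    "and ((00000000-0000-0000-0000-000000000000))",
    known,
)
```

Unresolved references are passed through unchanged so that a later lint pass can still find them.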
Live incremental watcher
LogseqGraph supports surgical file invalidation (optional dependency: pip install 'logseq-matryca-parser[watch]'). start_watching() runs a recursive watchdog observer: on created / modified under pages/ or journals/, only that file is re-parsed; stale synthetic UUIDs are purged from _node_registry and scrubbed from _backlink_registry—no full-graph cold reload.
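The invalidation step can be sketched without the watcher itself. The class below is a hypothetical, simplified model (`TinyGraph` and its attributes are not the library's internals): when a file changes, only the UUIDs that came from that file are purged from the node registry and scrubbed from the backlink registry before the file is re-indexed.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical miniature of a node registry plus a backlink registry."""

    def __init__(self):
        self.nodes = {}                    # uuid -> (source file, text, refs)
        self.backlinks = defaultdict(set)  # target uuid -> uuids that cite it
        self.by_file = defaultdict(set)    # source file -> uuids defined there

    def invalidate_file(self, path):
        """Drop every node that came from `path` and scrub its outgoing links."""
        for uuid in self.by_file.pop(path, set()):
            _, _, refs = self.nodes.pop(uuid)
            for target in refs:
                sources = self.backlinks.get(target)
                if sources is not None:
                    sources.discard(uuid)
                    if not sources:
                        del self.backlinks[target]

    def index_file(self, path, blocks):
        """Re-index one file; blocks is an iterable of (uuid, text, refs)."""
        self.invalidate_file(path)
        for uuid, text, refs in blocks:
            self.nodes[uuid] = (path, text, tuple(refs))
            self.by_file[path].add(uuid)
            for target in refs:
                self.backlinks[target].add(uuid)


graph = TinyGraph()
graph.index_file("pages/a.md", [("a1", "Alpha", []), ("a2", "See b1", ["b1"])])
graph.index_file("pages/b.md", [("b1", "Beta", [])])
# Simulate a "modified" event for pages/a.md: only that file is re-indexed.
graph.index_file("pages/a.md", [("a3", "Rewritten", [])])
```

In a real watcher, a filesystem event handler would call something like `index_file` with the freshly parsed blocks of the changed page.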
Fluent topological queries
Filter the global node registry with a chainable API (tags, task state, ancestry under a parent UUID):
from logseq_matryca_parser.graph import LogseqGraph
graph = LogseqGraph.load_directory("/path/to/logseq/graph")
hits = (
graph.query()
.has_tag("idea")
.under_parent("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")
.is_task_state("TODO")
.execute()
)
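For readers curious how such a chainable filter can work, here is a minimal sketch. It mirrors the method names used above but is a hypothetical reimplementation over a plain dictionary (`Node` and `Query` are illustrative, not the library's classes): each method appends a predicate and returns the query, and `execute()` keeps the nodes that pass all of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical node record; field names are illustrative."""

    uuid: str
    text: str
    tags: frozenset = frozenset()
    task_state: Optional[str] = None
    parent_id: Optional[str] = None


class Query:
    def __init__(self, registry):
        self._registry = registry  # uuid -> Node
        self._predicates = []

    def has_tag(self, tag):
        self._predicates.append(lambda n: tag in n.tags)
        return self

    def is_task_state(self, state):
        self._predicates.append(lambda n: n.task_state == state)
        return self

    def under_parent(self, parent_uuid):
        def is_descendant(node):
            current = node.parent_id
            while current is not None:  # walk up the ancestor chain
                if current == parent_uuid:
                    return True
                parent = self._registry.get(current)
                current = parent.parent_id if parent else None
            return False

        self._predicates.append(is_descendant)
        return self

    def execute(self):
        return [n for n in self._registry.values()
                if all(p(n) for p in self._predicates)]


registry = {
    "root": Node("root", "Project"),
    "t1": Node("t1", "Ship it", frozenset({"idea"}), "TODO", "root"),
    "t2": Node("t2", "Done thing", frozenset({"idea"}), "DONE", "root"),
    "t3": Node("t3", "Nested", frozenset({"idea"}), "TODO", "t2"),
    "x1": Node("x1", "Elsewhere", frozenset({"idea"}), "TODO"),
}
found = Query(registry).has_tag("idea").under_parent("root").is_task_state("TODO").execute()
```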
Agent-Native X-Ray Mode (Token Optimization)
For autonomous LLM agents, passing raw Markdown into the context window wastes thousands of tokens on 36-character UUIDs, hidden id:: properties, drawers, and collapsed directives that carry no immediate semantic signal. X-Ray mode compresses the parsed AST into ultra-dense, zero-fluff plain text: each block becomes {indent}[{alias}] {clean_text}, with heavy Logseq UUIDs replaced by sequential integer aliases ([0], [1], …) held in a session registry. On typical outlines this can reduce context consumption by up to ~35× compared to dumping full block payloads.
matryca-parse agent-read /path/to/graph --tag idea
matryca-parse agent-read /path/to/graph --query "quantum"
The agent reads cheap topology now; the registry resolves aliases back to sovereign UUIDs when you wire targeted writes.
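The aliasing scheme can be sketched as a two-way registry plus a line formatter. The names below (`AliasRegistry`, `to_xray`) are hypothetical and only illustrate the `{indent}[{alias}] {clean_text}` layout described above; they are not the `SessionAliasRegistry` or `to_xray_markdown` implementations.

```python
class AliasRegistry:
    """Hypothetical two-way map between short integer aliases and UUIDs."""

    def __init__(self):
        self._alias_by_uuid = {}
        self._uuid_by_alias = []

    def alias_for(self, uuid):
        """Return the existing alias or hand out the next sequential integer."""
        if uuid not in self._alias_by_uuid:
            self._alias_by_uuid[uuid] = len(self._uuid_by_alias)
            self._uuid_by_alias.append(uuid)
        return self._alias_by_uuid[uuid]

    def resolve(self, alias):
        """Map an alias back to the full UUID for a targeted write."""
        return self._uuid_by_alias[alias]


def to_xray(blocks, registry, indent="  "):
    """Render (depth, uuid, clean_text) triples as dense outline text."""
    return "\n".join(
        f"{indent * depth}[{registry.alias_for(uuid)}] {text}"
        for depth, uuid, text in blocks
    )


registry = AliasRegistry()
outline = to_xray(
    [
        (0, "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "Project"),
        (1, "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "Ship parser"),
    ],
    registry,
)
```

The agent only ever sees `[0]` and `[1]`; the registry keeps the 36-character identifiers out of the context window until a write needs them.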
Headless Write Engine & AST Linter (Wave 12)
The parser is no longer read-only. Wave 12 adds a headless Markdown splicer (agent_writer.py): append_child_to_node uses AST line numbers and indentation ((indent_level + 1) × tab_size) to insert a new bullet atomically into the sovereign .md file—via tempfile + os.replace—without Logseq’s fragile HTTP API. Pair agent-read with agent-write: X-Ray persists its alias map to .matryca_xray_state.json at the graph root so stateless CLI invocations can read, then write in sequence.
matryca-parse agent-read /path/to/graph --tag idea
matryca-parse agent-write /path/to/graph --alias 0 --content "Follow-up from the agent"
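The splice-and-replace pattern can be sketched with the standard library alone. This is not `agent_writer.py`: `append_child` is a hypothetical function that assumes space-indented bullets, a 1-based parent line number, and that the new bullet should land after the parent's existing subtree.

```python
import os
import tempfile


def append_child(path, parent_line_no, indent_level, content, tab_size=2):
    """Insert a child bullet under the parent, then swap the file atomically."""
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()

    parent_width = indent_level * tab_size
    insert_at = parent_line_no  # 0-based index of the line after the parent
    # Skip the parent's existing subtree (anything indented deeper than it).
    while insert_at < len(lines):
        line = lines[insert_at]
        width = len(line) - len(line.lstrip(" "))
        if line.strip() and width <= parent_width:
            break
        insert_at += 1

    child_indent = " " * ((indent_level + 1) * tab_size)
    lines.insert(insert_at, f"{child_indent}- {content}")

    # Write to a temp file in the same directory so os.replace stays atomic.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write("\n".join(lines) + "\n")
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Creating the temporary file next to the target matters: `os.replace` is only atomic when source and destination are on the same filesystem.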
For graph hygiene, LogseqGraph.get_broken_references() flags nodes whose ((uuid)) block refs point at missing registry targets—structural linting, not regex guessing.
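As a rough picture of what such a lint pass checks, the sketch below compares each node's outgoing block references against the set of known UUIDs. `find_broken_references` and the plain-dictionary input are hypothetical simplifications, not the `LogseqGraph` API.

```python
def find_broken_references(refs_by_node):
    """Return {source uuid: [missing target uuids]} for dangling block refs.

    refs_by_node maps every known node UUID to the UUIDs it references.
    """
    known = set(refs_by_node)
    broken = {}
    for source, targets in refs_by_node.items():
        missing = [t for t in targets if t not in known]
        if missing:
            broken[source] = missing
    return broken


report = find_broken_references({
    "a1": ["b1", "gone-1"],
    "b1": [],
    "c1": ["gone-2", "a1"],
})
```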
🏗️ Core Capabilities
| Feature | Description |
|---|---|
| LOGOS Engine | Deterministic AST parsing. No regex-guessing. Handles id::, aliases, and multiline blocks. |
| Advanced Task Extraction | Task state (TODO / DOING / …), priority markers [#A]–[#C] promoted to task_priority, and SCHEDULED / DEADLINE Logseq timestamps normalized to UTC Unix epoch seconds on scheduled_at / deadline_at for temporal graph and retrieval pipelines. |
| SYNAPSE Adapter | Native exports for LangChain and LlamaIndex with automated lineage metadata; context-enriched chunks with breadcrumbs, embed expansion, and inherited properties. |
| FORGE | JSON, clean Markdown, and Obsidian vault serialization (ObsidianForgeVisitor, ForgeExporter.to_obsidian_markdown). |
| LENS Visualizer | 60FPS interactive graph rendering (10k+ nodes) with Glassmorphism HUD. |
| Agent-Native Printing Press | agent_press.py: SessionAliasRegistry maps session aliases ↔ block UUIDs; to_xray_markdown emits token-minimal outline text for autonomous agents (matryca-parse agent-read). |
| Headless Write Engine | agent_writer.py: append_child_to_node splices child bullets into on-disk Markdown from AST topology; matryca-parse agent-write resolves aliases via .matryca_xray_state.json. |
| AST Linters | LogseqGraph.get_broken_references() returns originating nodes when block_refs target UUIDs absent from the global registry. |
| Sovereign AI | 100% Local. Zero telemetry. Private by design. |
Data model — LogseqNode task fields
Each AST block is a LogseqNode. Alongside task_status, the parser surfaces priority and schedule metadata as typed fields (epoch integers are seconds since Unix epoch, UTC):
{
"uuid": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"task_status": "TODO",
"task_priority": "A",
"scheduled_at": 1641600000,
"deadline_at": 1641772800,
"clean_text": "Cut v0.3.2 release"
}
Marker syntax ([#A], SCHEDULED: <...>, DEADLINE: <...>) is stripped from clean_text so embeddings stay clean; the promoted fields carry the structured signal for downstream graph databases and GraphRAG engines.
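The promotion from marker syntax to typed fields can be illustrated as follows. This sketch swaps in regular expressions for brevity, so it is not how the LOGOS engine parses; `extract_task_fields` is a hypothetical name, any time-of-day inside the timestamp is ignored, and dates are assumed to mean midnight UTC.

```python
import re
from datetime import datetime, timezone

PRIORITY = re.compile(r"\[#([ABC])\]\s*")
PLANNING = re.compile(r"\b(SCHEDULED|DEADLINE):\s*<(\d{4}-\d{2}-\d{2})[^>]*>\s*")


def extract_task_fields(raw):
    """Pull priority and planning dates out of raw block text."""
    fields = {"task_priority": None, "scheduled_at": None, "deadline_at": None}

    priority = PRIORITY.search(raw)
    if priority:
        fields["task_priority"] = priority.group(1)

    for kind, day in PLANNING.findall(raw):
        # Assumption: a bare date is interpreted as 00:00:00 UTC.
        midnight = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        fields[f"{kind.lower()}_at"] = int(midnight.timestamp())

    # Strip the markers so the text that gets embedded stays clean.
    stripped = PLANNING.sub("", PRIORITY.sub("", raw))
    fields["clean_text"] = " ".join(stripped.split())
    return fields


parsed = extract_task_fields(
    "[#A] Cut v0.3.2 release\n"
    "SCHEDULED: <2022-01-08 Sat>\n"
    "DEADLINE: <2022-01-10 Mon>"
)
```

With these inputs the sketch reproduces the `task_priority`, `scheduled_at`, `deadline_at`, and `clean_text` values shown in the JSON example above.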
🛠️ Quickstart
# Install from GitHub (PyPI distribution tracked on roadmap)
pip install git+https://github.com/MarcoPorcellato/logseq-matryca-parser.git
# Optional: filesystem watcher for live incremental graph updates
pip install 'logseq-matryca-parser[watch]'
# 1. Visualize your local graph (LENS)
matryca-parse visualize /path/to/logseq/graph my-map.html
# 2. Export for AI / RAG (SYNAPSE)
matryca-parse export /path/to/logseq/graph output --format langchain
# 3. Context-enriched LangChain JSON (graph + inheritance + embed expansion)
matryca-parse export /path/to/logseq/graph output --format langchain-enriched
# 4. Obsidian vault (YAML frontmatter + ^ block ids)
matryca-parse export /path/to/logseq/graph output --format obsidian
Python API
from logseq_matryca_parser.logos_parser import LogosParser
from logseq_matryca_parser.synapse import SynapseAdapter
# Parse to AST
page = LogosParser().parse_page_file("page.md")
# Export to LangChain with lineage metadata
docs = SynapseAdapter.to_langchain_documents(page.root_nodes, source_name=page.title)
🤖 Agentic Write Access (Append-Only)
Agents such as Hermes or OpenClaw can record structured notes into a Logseq graph without rewriting existing pages. The helper logseq_agent_write only opens the weekly agent page in append mode ("a"), writes a new bullet (journal link + optional tag links + body), and never truncates or replaces prior content—so routine logging cannot wipe blocks that already live in that file.
Point it at your graph’s pages directory and config.edn so journal titles match Logseq’s :journal/page-title-format (including ordinal days when you use do in the pattern).
from logseq_matryca_parser import logseq_agent_write
result = logseq_agent_write(
"Summarized user intent and proposed next steps.",
config_path="/path/to/logseq/config.edn",
pages_dir="/path/to/logseq/pages",
context_tags=["agent/hermes", "#session"],
)
assert result["status"] == "success"
# result["path"] → e.g. .../pages/2026-18-agent.md
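The append-only guarantee comes down to opening the file in `"a"` mode. The sketch below is a hypothetical stand-in for the helper (`append_agent_note` is not the library function): it assumes the weekly page is named from the ISO year and week number, which is a guess based on the example path, and it takes the journal title as an argument instead of reading `config.edn`.

```python
from datetime import date
from pathlib import Path


def append_agent_note(pages_dir, body, journal_title, tags=(), today=None):
    """Append one bullet to the weekly agent page without touching prior content."""
    today = today or date.today()
    iso_year, iso_week, _ = today.isocalendar()
    # Assumed naming scheme: <ISO year>-<ISO week>-agent.md
    page = Path(pages_dir) / f"{iso_year}-{iso_week:02d}-agent.md"

    parts = [f"[[{journal_title}]]"]
    parts += [f"[[{tag.lstrip('#')}]]" for tag in tags]
    parts.append(body)

    # Mode "a" can only add bytes at the end; existing blocks are never rewritten.
    with page.open("a", encoding="utf-8") as fh:
        fh.write("- " + " ".join(parts) + "\n")
    return page
```

Calling it twice on the same day produces two bullets in the same file, with the first one left exactly as written.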
🗺️ Roadmap
- Desktop GUI: Standalone app for non-technical users. (Join the RFC)
- Obsidian Adapter: Native CLI export (--format obsidian) with YAML frontmatter and ^block anchors.
- Ollama Integration: One-click local RAG setup.
☕ Support & Enterprise
Logseq Matryca Parser is open-source. If it powers your pipeline, consider a star ⭐ or a sponsorship!
Need custom RAG integrations or consulting? Contact: marco@marcoporcellato.it
Architected by Marco Porcellato | Powered by Matryca.ai