
mdcore - Markdown CORE AI

Classification, Organisation, Retrieval & Entry for your personal markdown knowledge base

Version: 1.0.0 | PyPI: markdowncore-ai | CLI: mdcore


What It Does

mdcore is a local, LLM-agnostic knowledge base engine for anyone with a folder of markdown notes. It does two things:

Flow A - Retrieval: Given a topic, it retrieves relevant chunks from your vault, synthesises them into a coherent, cited briefing, and writes the result to <vault>/mdcore-output/. Copy and paste that file into any LLM conversation as context. Zero calls to your subscription LLM.

Flow B - Ingestion: Given any document - an LLM session summary, a research note, an article, a strategy doc, or any new piece of knowledge - it classifies the content against your existing vault, decides whether to update an existing note or create a new one, routes it to the right folder, detects conflicts, generates a proposal, and only after your explicit approval writes the changes and reindexes automatically.


Why mdcore

|                               | mdcore | Smart Connections | Khoj | Reor |
|-------------------------------|--------|-------------------|------|------|
| Semantic search               | Yes    | Yes               | Yes  | Yes  |
| Local-first (Ollama)          | Yes    | Yes               | Yes  | Yes  |
| CLI - no GUI required         | Yes    | No                | No   | No   |
| Write-back ingestion pipeline | Yes    | No                | No   | No   |
| Conflict detection            | Yes    | No                | No   | No   |
| No always-on server           | Yes    | N/A               | No   | N/A  |
| Any markdown folder           | Yes    | Obsidian only     | Yes  | Yes  |

Comparable tools are read-only RAG behind a GUI. mdcore is the only one in this comparison that both reads and writes your knowledge base from the terminal, with every write gated on user approval.




Installation

Any platform (uv - recommended)

uv tool install markdowncore-ai

Any platform (pipx)

pipx install markdowncore-ai

Ollama models (for local inference)

ollama pull nomic-embed-text   # embeddings
ollama pull qwen3.5:4b         # primary LLM - classification + proposals
ollama pull phi4-mini          # synthesis - fast, non-thinking

After install

mdcore init           # interactive setup wizard
mdcore deps install   # install any backend packages not yet present
mdcore index          # index your vault

Commands

mdcore init                           # Interactive setup wizard - create config
mdcore index                          # Scan vault, show diff, confirm, index delta
mdcore index --force                  # Wipe and reindex from scratch
mdcore search <topic>                 # Synthesise briefing -> write to <vault>/mdcore-output/ (Flow A)
mdcore search <topic> --raw           # Raw excerpts only - skip synthesis
mdcore search <topic> --verbose       # Show chunk scores alongside results
mdcore ingest                         # Paste any document - classify, route, propose (Flow B)
mdcore ingest --file doc.md           # Ingest from a file (session summary, article, notes, etc.)
mdcore map                            # Generate vault folder map for doc routing
mdcore map --repair                   # Remove stale folder descriptions from map
mdcore status                         # Show index health and drift warnings
mdcore eval [topic]                   # Run quality evaluation checklist
mdcore config                         # Open config file in editor
mdcore config --validate              # Validate config and report errors

Multiple config profiles

mdcore search "istio auth" --config ~/.mdcore/config-technical.yaml
mdcore search "career goals" --config ~/.mdcore/config-personal.yaml

Quick Start

# 1. Configure (interactive wizard)
mdcore init
# -> asks for vault path, owner name, LLM backend, models
# -> detects Ollama + pulled models, gives hardware-appropriate suggestions
# -> writes ~/.mdcore/config.yaml

# 2. Index your vault
mdcore index

# 3. Retrieve context for an LLM conversation
mdcore search "kubernetes ingress routing"
# -> writes <vault>/mdcore-output/2026-04-26-kubernetes-ingress-routing.md
# -> open file, copy contents -> paste into Claude/ChatGPT/Gemini

# 4. Ingest any document into your vault
mdcore ingest --file my-session-summary.md   # LLM session summary
mdcore ingest --file oss-strategy.md         # standalone research doc
mdcore ingest                                # paste content directly
# -> mdcore classifies, routes to right folder, proposes changes -> approve

Architecture

YOUR MARKDOWN VAULT (any folder of .md files)
        |
        v
   mdcore core
   +----------+  +----------+  +------------+
   | Indexer  |  |Retriever |  |  Ingester  |
   +----------+  +----------+  +------------+
   +----------+  +----------+  +------------+
   |  Writer  |  |LLM Layer |  |VectorStore |
   +----------+  +----------+  +------------+
        |
   (copy-paste by user)
        |
        v
ANY SUBSCRIPTION LLM (Claude / ChatGPT / Gemini / Others)

mdcore never talks to your subscription LLM directly. It prepares context (Flow A) and processes output from it (Flow B). The user is always the bridge.
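
Because the bridge is manual, the output file pairs naturally with the system clipboard. A sketch for macOS (pbcopy; on Linux use xclip or wl-copy), assuming the dated filename pattern shown in the Quick Start and substituting your real vault path for <vault>:

mdcore search "kubernetes ingress routing"
cat <vault>/mdcore-output/*-kubernetes-ingress-routing.md | pbcopy
# paste into Claude / ChatGPT / Gemini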


Where LLM Calls Happen

Every call goes to your configured llm.model (or synthesise_model where noted). Token usage is logged at INFO level to ~/.mdcore/logs/mdcore.log after every call.

Flow A - mdcore search <topic>

| Phase                        | LLM call? | Model used       | Notes                                                   |
|------------------------------|-----------|------------------|---------------------------------------------------------|
| Keyword pre-filter           | No        | -                | BM25 scoring, no LLM                                    |
| Vector search                | No        | -                | Embedding lookup only                                   |
| Chunk stitching + formatting | No        | -                | Pure text assembly                                      |
| Synthesis                    | Yes       | synthesise_model | Reformats raw excerpts into a briefing. Skip with --raw |

mdcore search <topic> --raw makes Flow A fully LLM-free.

Flow B - mdcore ingest

| Phase                     | LLM call?      | Model used | Condition                                                                                  |
|---------------------------|----------------|------------|--------------------------------------------------------------------------------------------|
| Embedding + vector search | No             | -          | Always                                                                                     |
| Classification            | Conditional    | llm.model  | Only when the similarity score is between similarity_threshold_low and similarity_threshold_high |
| Folder routing            | Yes (NEW only) | llm.model  | When action=NEW, the LLM picks the target folder from a semantic candidate list            |
| Proposal generation       | Yes            | llm.model  | Always; generates a human-readable summary before approval                                 |
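
The Classification row is the only conditional one. Below is a minimal, runnable Python sketch of that decision: the threshold names come from the Configuration Reference, but the mapping of the extremes to UPDATE and NEW is an assumption, not confirmed behaviour:

def classify_action(score: float, low: float, high: float) -> str:
    if score >= high:
        return "UPDATE"   # confident match with an existing note: no LLM call (assumed)
    if score < low:
        return "NEW"      # confidently novel content: no LLM call (assumed)
    return "ASK_LLM"      # ambiguous band: llm.model makes the call

print(classify_action(0.83, low=0.55, high=0.80))   # -> UPDATE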

mdcore map / mdcore index

No LLM calls.
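
Put together, an entirely LLM-free round trip is possible: index and map make no LLM calls, and --raw removes the one call from Flow A:

mdcore index
mdcore map
mdcore search "kubernetes ingress routing" --raw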


Observability

Token usage is logged after every call:

INFO llm - tokens [gemini-2.5-flash-lite] in=312 out=89 total=401
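
Since every call appends a line in that format, a quick audit needs nothing beyond grep (a convenience one-liner, not an mdcore feature):

grep -F 'tokens [' ~/.mdcore/logs/mdcore.log | tail -5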

Optional LangSmith tracing - add to ~/.mdcore/config.yaml:

llm:
  langsmith_api_key: <your-key>
  langsmith_project: mdcore

This traces every LLM call at smith.langchain.com with the full prompt, response, latency, and token counts.


Configuration Reference

See config.yaml.example for the full annotated config. Key sections:

| Section    | Key fields                    | Purpose                                                    |
|------------|-------------------------------|------------------------------------------------------------|
| vault      | path, owner_name              | Vault root path, owner identity for multi-person vaults    |
| indexer    | chunk_size, heading_levels    | Chunking strategy and quality filters                      |
| embeddings | backend, local_model          | Local (Ollama) or API-backed embeddings                    |
| retriever  | top_k, similarity_threshold   | Candidate retrieval, assembly, signposting                 |
| ingester   | similarity_threshold_high/low | Classification thresholds, conflict detection              |
| writer     | append_position, backup       | Append position, frontmatter injection, backups            |
| llm        | model, synthesise_model       | Primary LLM (classify/propose) + synthesis model (search)  |
| cli        | theme, verbose                | Terminal UI behaviour                                      |
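
For orientation, here is a minimal config sketch using the Apple M2 Air tier below. The key nesting is inferred from this table and the values are illustrative, so treat config.yaml.example as authoritative:

vault:
  path: ~/notes
  owner_name: you
embeddings:
  backend: ollama
  local_model: nomic-embed-text
retriever:
  top_k: 8                           # illustrative value
ingester:
  similarity_threshold_low: 0.55     # illustrative values
  similarity_threshold_high: 0.80
llm:
  model: qwen3.5:4b
  synthesise_model: phi4-mini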

Hardware Tiers

| Hardware          | LLM Model                      | Embedding Model        |
|-------------------|--------------------------------|------------------------|
| Apple M2 Air 16GB | qwen3.5:4b                     | nomic-embed-text       |
| i5 + RTX 4070     | qwen3:8b                       | bge-m3                 |
| Low-end / no GPU  | gpt-4o-mini / claude-haiku-4-5 | text-embedding-3-small |

Project Structure

mdcore/
+-- cli/commands.py              # Typer commands, Rich rendering
+-- core/
|   +-- indexer/                 # VaultScanner, ManifestManager, TextSplitter, ...
|   +-- retriever/               # KeywordPreFilter, VectorSearcher, ChunkStitcher, ...
|   +-- ingester/                # ClassificationEngine, ConflictDetector, FolderRouter, ...
|   +-- writer/                  # BackupManager, FrontmatterInjector, FileWriter, ...
+-- llm/llm_layer.py             # classify(), propose(), synthesise(), route_folder()
+-- store/vector_store.py        # ChromaDB wrapper
+-- config/                      # Pydantic models + YAML loader
+-- utils/                       # Logging, file utilities

What mdcore Is Not

  • Not a chatbot or RAG question-answering agent
  • Not an API wrapper around subscription LLMs
  • Not a note-taking application
  • Not an always-on background service
  • Never writes anything without your explicit approval

mdcore - Markdown CORE AI v1.0.0
