
Markdown CORE AI - Classification, Organisation, Retrieval & Entry for your personal markdown knowledge base


mdcore

Markdown CORE AI - Classification, Organisation, Retrieval & Entry

mdcore is a local, LLM-agnostic knowledge base engine for anyone with a folder of markdown notes. It reads and writes your vault intelligently - retrieve context on demand, ingest new knowledge with automatic classification and routing, all from the terminal or a TUI.

PyPI: markdowncore-ai | CLI: mdcore | Version: 1.0.3


Screenshots

mdcore home · mdcore search · mdcore index · mdcore status


What It Does

Retrieval (mdcore search) - Ask a question or give a topic. mdcore searches your vault semantically, stitches the most relevant chunks, and synthesises a coherent cited briefing. Output lands in <vault>/mdcore-output/ - ready to copy into any LLM conversation.

Ingestion (mdcore ingest) - Feed any document into mdcore - an LLM session summary, a research note, a strategy doc, an article. It classifies the content against your existing vault, routes it to the right folder, detects conflicts with existing notes, generates a proposal, and writes only after your explicit approval.

Both flows work fully local with Ollama. No subscription LLM API calls. No always-on server.
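Semantic search of this kind reduces to comparing a query embedding against stored chunk embeddings and keeping the closest matches. A minimal sketch of top-k retrieval by cosine similarity (illustrative only - not mdcore's actual code; the toy 2-d vectors stand in for real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunks, k=5):
    """Return the k chunk texts most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy index: (chunk text, embedding) pairs.
chunks = [
    ("kubernetes ingress routes external traffic", [0.9, 0.1]),
    ("career planning notes",                      [0.1, 0.9]),
]
print(top_k([1.0, 0.0], chunks, k=1))  # -> ['kubernetes ingress routes external traffic']
```

The retrieved chunks are then stitched together and, unless synthesis is skipped, handed to an LLM to produce the cited briefing.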


Installation

# Recommended
uv tool install markdowncore-ai

# With TUI
uv tool install "markdowncore-ai[gui]"

# pipx
pipx install markdowncore-ai

Ollama models (local inference)

ollama pull nomic-embed-text   # embeddings
ollama pull qwen3.5:4b         # classification, routing, proposals
ollama pull phi4-mini          # synthesis (fast, non-thinking)

First run

mdcore init     # interactive setup -> writes ~/.mdcore/config.yaml
mdcore index    # scan and index your vault

Quick Start

# Search your vault
mdcore search "kubernetes ingress routing"
# -> synthesised briefing written to <vault>/mdcore-output/
# -> copy contents, paste into Claude / ChatGPT / Gemini

# Ingest a document
mdcore ingest --file my-session-summary.md
# -> classifies, routes to right folder, proposes changes -> approve to write

# Launch TUI
mdcore gui

Commands

mdcore init                        # Interactive setup wizard
mdcore index                       # Delta index - scan, diff, confirm, index
mdcore index --force               # Wipe everything and reindex from scratch
mdcore search <topic>              # Retrieve + synthesise briefing (Flow A)
mdcore search <topic> --raw        # Retrieve raw excerpts, skip synthesis
mdcore search <topic> --verbose    # Show similarity scores
mdcore ingest                      # Paste document - classify, route, propose (Flow B)
mdcore ingest --file <path>        # Ingest from file
mdcore map                         # Generate vault folder map for routing
mdcore map --repair                # Remove stale folder entries
mdcore gui                         # Launch TUI (requires [gui] extra)
mdcore status                      # Index health, drift warnings
mdcore eval [topic]                # Retrieval quality checklist
mdcore config                      # Open config in editor
mdcore config --validate           # Validate config

Multiple vaults / config profiles

mdcore search "istio auth"     --config ~/.mdcore/config-work.yaml
mdcore search "career goals"   --config ~/.mdcore/config-personal.yaml
mdcore search "topic"          --models ~/.mdcore/models-aggregator.yaml

Backends

mdcore supports local and API-backed models. Mix and match per use case.

| Backend | LLM | Embeddings | Extra needed |
| --- | --- | --- | --- |
| Ollama (local) | any pulled model | nomic-embed-text, bge-m3 | none |
| Gemini | gemini-2.5-flash-lite | models/gemini-embedding-001 | none (bundled) |
| OpenAI | gpt-4o-mini | text-embedding-3-small | [openai] |
| Anthropic | claude-haiku-4-5 | use Ollama or OpenAI | [anthropic] |
| Aggregator | free-tier key pool | free-tier key pool | [aggregator] |

uv tool install "markdowncore-ai[openai]"
uv tool install "markdowncore-ai[anthropic]"
uv tool install "markdowncore-ai[all]"    # every backend

Aggregator backend

aggregator routes calls through llm-aggregator - a local SQLite-backed key pool that round-robins free-tier API keys with automatic 429 cooldown. No api_key needed in mdcore config.

Install separately (not on PyPI):

pip install git+https://github.com/piyush-tyagi-13/llm-aggregator
# or if installed via uv tool:
uv tool install markdowncore-ai --with "llm-aggregator @ git+https://github.com/piyush-tyagi-13/llm-aggregator"

Then configure the backend in ~/.mdcore/config.yaml:
llm:
  backend: aggregator
  aggregator_category: general_purpose       # key pool category (optional)
  aggregator_rotate_every: 5      # requests per key before rotation

embeddings:
  backend: aggregator
  aggregator_category: general_purpose
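The round-robin-with-cooldown behaviour can be sketched as follows (a hypothetical illustration of the technique, not llm-aggregator's actual code; the class and method names are invented):

```python
import time

class KeyPool:
    """Round-robin over API keys; skip keys cooling down after a 429."""

    def __init__(self, keys, rotate_every=5, cooldown_s=60):
        self.keys = list(keys)
        self.rotate_every = rotate_every   # requests per key before rotation
        self.cooldown_s = cooldown_s       # seconds a 429'd key sits out
        self.cooling = {}                  # key -> timestamp when usable again
        self.i = 0                         # index of the current key
        self.uses = 0                      # requests served by the current key

    def next_key(self):
        if self.uses >= self.rotate_every:          # planned rotation
            self.i = (self.i + 1) % len(self.keys)
            self.uses = 0
        for _ in range(len(self.keys)):             # find a key not cooling down
            key = self.keys[self.i]
            if self.cooling.get(key, 0) <= time.time():
                self.uses += 1
                return key
            self.i = (self.i + 1) % len(self.keys)
            self.uses = 0
        raise RuntimeError("all keys are cooling down")

    def report_429(self, key):
        """Record a rate-limit hit; the key is skipped until the cooldown ends."""
        self.cooling[key] = time.time() + self.cooldown_s

pool = KeyPool(["key1", "key2"], rotate_every=2)
print([pool.next_key() for _ in range(5)])  # -> ['key1', 'key1', 'key2', 'key2', 'key1']
```

With `aggregator_rotate_every: 5`, each key would serve five requests before the pool moves on, and any key that returns a 429 drops out for its cooldown window.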

Hardware guidance

| Hardware | LLM | Embeddings |
| --- | --- | --- |
| Apple M2 16GB+ | qwen3.5:4b | nomic-embed-text |
| i5 + RTX 4070 | qwen3:8b | bge-m3 |
| Low-end / no GPU | gemini-2.5-flash-lite or gpt-4o-mini | models/gemini-embedding-001 |

Configuration

Config lives at ~/.mdcore/config.yaml. Generated by mdcore init.

| Section | Key fields | Purpose |
| --- | --- | --- |
| vault | path, owner_name | Vault root, owner name for multi-person vaults |
| embeddings | backend, api_model / local_model, api_key | Embedding model |
| llm | backend, model, api_key, synthesise_model | Primary LLM + synthesis model |
| indexer | chunk_size, heading_aware_splitting | Chunking strategy |
| retriever | top_k, similarity_threshold | Retrieval tuning |
| ingester | similarity_threshold_high/low | Classification thresholds |
| writer | append_position, backup | Write behaviour + backups |

See config.yaml.example for the full annotated reference.
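Heading-aware splitting typically means cutting at markdown headings first and only falling back to size-based cuts for oversized sections. A sketch of that strategy (illustrative - mdcore's actual chunker may differ):

```python
import re

def split_heading_aware(markdown, chunk_size=800):
    """Split markdown at ATX headings, then size-split oversized sections."""
    # Cut before every heading line so each chunk stays inside one section.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks = []
    for section in sections:
        section = section.strip()
        while len(section) > chunk_size:
            # Prefer a paragraph boundary inside the size budget.
            cut = section.rfind("\n\n", 0, chunk_size)
            cut = cut if cut > 0 else chunk_size
            chunks.append(section[:cut].strip())
            section = section[cut:].strip()
        if section:
            chunks.append(section)
    return chunks
```

Keeping chunks aligned to headings means a retrieved chunk rarely straddles two unrelated topics, which in turn keeps the synthesised briefing coherent.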

Separate models config

Keep model choices in a separate ~/.mdcore/models.yaml - useful for switching backends without touching main config. Values here override llm and embeddings sections in config.yaml.

# ~/.mdcore/models.yaml
llm:
  backend: aggregator
  aggregator_category: general_purpose

embeddings:
  backend: ollama
  local_model: nomic-embed-text

Pass explicitly with --models:

mdcore search "topic" --models ~/.mdcore/models-work.yaml
mdcore ingest --file note.md --models ~/.mdcore/models-cheap.yaml
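The override semantics amount to a recursive dictionary merge in which models.yaml wins on conflicting keys while untouched keys survive. A sketch of that merge (hypothetical - not mdcore's implementation):

```python
def merge_configs(base, override):
    """Recursively overlay override onto base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)   # descend into sections
        else:
            merged[key] = value                               # leaf: override wins
    return merged

config = {"llm": {"backend": "ollama", "model": "qwen3:8b"},
          "vault": {"path": "~/notes"}}
models = {"llm": {"backend": "aggregator"}}
print(merge_configs(config, models))
```

Note that only the `backend` leaf changes: `llm.model` and the whole `vault` section come through from config.yaml untouched.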

Where LLM Calls Happen

mdcore search (Flow A)

| Phase | LLM? | Notes |
| --- | --- | --- |
| Keyword pre-filter | No | BM25 scoring |
| Vector search | No | Embedding lookup |
| Chunk assembly | No | Pure text |
| Synthesis | Yes - synthesise_model | Skip with --raw for zero LLM calls |
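A BM25 keyword pre-filter like the one in the first phase can be sketched in a few lines (illustrative; mdcore's scorer may differ in parameters and tokenisation):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score tokenised docs against query terms with the BM25 formula."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Because this phase is pure lexical scoring, it narrows the candidate set cheaply before the (also LLM-free) vector search runs.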

mdcore ingest (Flow B)

| Phase | LLM? | Condition |
| --- | --- | --- |
| Embedding + search | No | Always |
| Classification | Conditional - llm.model | Only in ambiguous similarity range (0.65-0.82) |
| Folder routing | Yes - llm.model | NEW files only |
| Proposal | Yes - llm.model | Always before write |

mdcore map and mdcore index make no LLM calls.
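The conditional classification step can be pictured as a threshold gate over the ingester's similarity thresholds: outside the ambiguous band the decision is made without an LLM call. A sketch (the 0.65/0.82 band comes from the table above; the route names here are hypothetical):

```python
def classify_route(similarity, low=0.65, high=0.82):
    """Decide whether an ingested document needs an LLM classification call.

    Above `high`: confidently matches an existing note -> no LLM needed.
    Below `low`: confidently new material -> no LLM needed.
    In between: ambiguous -> ask the LLM to classify.
    """
    if similarity >= high:
        return "append_to_existing"
    if similarity <= low:
        return "new_file"
    return "ask_llm"

print(classify_route(0.90))  # -> append_to_existing
print(classify_route(0.50))  # -> new_file
print(classify_route(0.70))  # -> ask_llm
```

This is why most ingests cost little: only documents whose best match lands in the ambiguous band trigger the classification call.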


Observability

Token usage logged after every call to ~/.mdcore/logs/:

INFO llm - tokens [gemini-2.5-flash-lite] in=312 out=89 total=401

LangSmith tracing (optional) - add to ~/.mdcore/config.yaml:

llm:
  langsmith_api_key: <your-key>
  langsmith_project: mdcore

mdcore - Markdown CORE AI v1.0.3

