A local MCP server for retrieving paragraphs from one Jekyll-style blog.
blograg
blograg is a local MCP-oriented retrieval tool for one Jekyll-style blog.
It uses labelrag as the retrieval
core and treats heading-delimited markdown sections as the paragraph unit.
It is designed for local, single-blog usage:
- build a paragraph index from one Jekyll-style repository
- serve that index over MCP Streamable HTTP
- inspect service state from the CLI and a lightweight browser page
- register the HTTP endpoint with local MCP clients such as Codex or OpenClaw
The detailed command reference lives in docs/commands.md.
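To make "heading-delimited markdown sections as the paragraph unit" concrete, here is a minimal sketch of that kind of segmentation. It is not blograg's or labelrag's actual implementation: the names Section and split_by_headings are hypothetical, only ATX-style "#" headings are handled, and skipping headings inside fenced code is an assumption about reasonable behavior.

```python
import re
from dataclasses import dataclass

# ATX headings such as "# Title" or "### Details" (one to six "#" characters).
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


@dataclass
class Section:
    heading: str  # heading text, or "" for content before the first heading
    text: str     # body text found under that heading


def split_by_headings(markdown: str) -> list[Section]:
    """Split a markdown body into heading-delimited sections (hypothetical helper)."""
    sections: list[Section] = []
    heading = ""
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        # Store the section collected so far, skipping a completely empty preamble.
        text = "\n".join(body).strip()
        if heading or text:
            sections.append(Section(heading, text))

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # "#" lines inside code fences are not headings
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            heading, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return sections
```

Calling split_by_headings on a post body returns one Section per heading, plus an unnamed leading section when text appears before the first heading.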
Scope
Current 0.0.0 scope:
- one local blog directory
- Jekyll-style front matter parsing
- heading-delimited paragraph segmentation
- full rebuild only
- one MCP tool: retrieve_paragraphs
Out of scope:
- incremental indexing
- multiple blog roots
- runtime rebuilds from the MCP server
- alternate storage backends
- broad MCP tool surfaces beyond paragraph retrieval
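As an illustration of the "Jekyll-style front matter parsing" item in the scope list, the sketch below separates a leading block delimited by "---" lines from the post body. It is an assumption-laden simplification, not blograg's parser: split_front_matter is a hypothetical name, and only flat "key: value" lines are read, whereas real front matter is YAML and would normally go through a YAML parser.

```python
def split_front_matter(source: str) -> tuple[dict[str, str], str]:
    """Return (front_matter, body) for a Jekyll-style post (hypothetical helper)."""
    lines = source.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, source  # no front matter block at the top of the file

    # Find the closing "---" line; the for/else handles an unterminated block.
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, source  # never closed: treat everything as body

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        # Keep only top-level "key: value" pairs; nested YAML, list items and
        # comments are ignored in this simplified sketch.
        if sep and key.strip() and not line.startswith((" ", "\t", "#", "-")):
            meta[key.strip()] = value.strip().strip("\"'")
    return meta, "\n".join(lines[end + 1:])
```

The returned body is what a heading-based segmenter would then receive, while fields such as the title can feed result metadata.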
Installation
python3.11 -m venv .venv
. .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e '.[dev]'
Quick Start
Initialize local defaults and optional provider secrets:
blograg config wizard
Build an index:
blograg build --blog-dir /path/to/blog --index-dir /path/to/index
Start the managed HTTP service:
blograg start --index-dir /path/to/index
Inspect service state:
blograg status
blograg logs --follow
blograg doctor
Open the browser status page:
http://127.0.0.1:8765/
Register the MCP endpoint with a client:
blograg register --client codex
blograg register --show
Core Commands
Most day-to-day usage is centered on:
- blograg config wizard
- blograg build
- blograg serve
- blograg start
- blograg status
- blograg logs
- blograg doctor
- blograg register
For command-by-command examples and option summaries, see docs/commands.md.
Persistent Config
blograg stores user-level config and secrets in:
- config.toml
- secrets.toml
Default locations:
- macOS/Linux: ~/.config/blograg/
- Windows: %AppData%/blograg/
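For scripts that need to find these files, the default locations above can be approximated as follows. This is a sketch only: default_config_dir is a hypothetical helper, the Windows fallback when APPDATA is unset is an assumption, and blograg config path remains the authoritative way to find the real location.

```python
import os
import sys
from pathlib import Path


def default_config_dir(platform: str = sys.platform, env=None) -> Path:
    """Approximate the documented default config directory (hypothetical helper)."""
    env = os.environ if env is None else env
    if platform.startswith("win"):
        # %AppData%/blograg/ on Windows; the fallback path is an assumption.
        appdata = env.get("APPDATA")
        base = Path(appdata) if appdata else Path.home() / "AppData" / "Roaming"
        return base / "blograg"
    # ~/.config/blograg/ on macOS and Linux, as documented above.
    return Path.home() / ".config" / "blograg"


config_file = default_config_dir() / "config.toml"
secrets_file = default_config_dir() / "secrets.toml"
```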
Useful commands:
blograg config path
blograg config show
blograg config show --all
blograg config set default_index_dir /path/to/index
blograg config set retrieval.retrieval_strategy label_gate_semantic_rank
blograg config set-secret mistral --api-key your-key-here
config show masks secret values and only reports whether each provider key is
configured.
MCP Service Model
blograg serve loads an existing index and starts the MCP server. It does not
rebuild automatically. If the index is missing or incomplete, run build
first.
The default transport is Streamable HTTP. The default HTTP binding is:
- host: 127.0.0.1
- port: 8765
If you need LAN access, bind explicitly:
blograg serve --host 0.0.0.0 --port 8765
Current HTTP endpoints:
- /mcp
- /
- /healthz
The browser page at / is a lightweight status page, not a separate web app.
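A script can probe the service through the /healthz endpoint. The sketch below assumes that a healthy server answers GET /healthz with HTTP 200; the response body is not documented here, so it is ignored. is_healthy is a hypothetical helper, not part of blograg.

```python
import urllib.request


def is_healthy(base_url: str = "http://127.0.0.1:8765", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/healthz answers with HTTP 200 (assumed contract)."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts and HTTP error statuses
        # (urllib.error.URLError and HTTPError are OSError subclasses).
        return False
```

For interactive use, blograg status and blograg doctor remain the supported ways to inspect the service.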
MCP Client Registration
Register the local endpoint with one client at a time:
blograg register --client codex
blograg register --client openclaw
Inspect current registration state:
blograg register --show
blograg register --show --server-name blograg-local
You can also register an explicit URL:
blograg register \
--client codex \
--server-name blograg-local \
--url http://127.0.0.1:8765/mcp
LLM Usage
blograg build supports the upstream extraction modes:
- heuristic
- spacy
- llm
Example LLM build:
MISTRAL_API_KEY=your-key-here \
blograg build \
--blog-dir /path/to/blog \
--index-dir /path/to/index \
--concept-extractor llm \
--llm-provider mistral \
--llm-model mistral-small
If an index was built with --concept-extractor llm, query analysis at serve
time still needs access to the corresponding provider API key. You can provide
it through:
- blograg config set-secret ...
- environment variables such as MISTRAL_API_KEY
Retrieval Output
The server currently exposes one tool:
retrieve_paragraphs(query: str, top_k: int = 5)
Each result includes:
- paragraph_id
- text
- post_title
- slug
- section_heading
- trace.retrieval_strategy
- trace.score
- trace.score_kind
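For readers wiring up a client by hand, the sketch below shows the JSON-RPC 2.0 "tools/call" message that MCP uses to invoke a tool, plus a small typed view of one result. Several things are assumptions: that the trace.* fields are nested under a "trace" object, that paragraph_id is a string, and that score is numeric. build_tool_call and ParagraphResult are hypothetical names. A real client must also complete the MCP initialize handshake first, which an MCP client library normally handles.

```python
import json
from dataclasses import dataclass


def build_tool_call(query: str, top_k: int = 5, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 "tools/call" request for retrieve_paragraphs."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "retrieve_paragraphs",
            "arguments": {"query": query, "top_k": top_k},
        },
    })


@dataclass
class ParagraphResult:
    paragraph_id: str
    text: str
    post_title: str
    slug: str
    section_heading: str
    retrieval_strategy: str
    score: float
    score_kind: str

    @classmethod
    def from_dict(cls, item: dict) -> "ParagraphResult":
        trace = item["trace"]  # assumption: trace.* fields live in a nested object
        return cls(
            paragraph_id=str(item["paragraph_id"]),
            text=item["text"],
            post_title=item["post_title"],
            slug=item["slug"],
            section_heading=item["section_heading"],
            retrieval_strategy=trace["retrieval_strategy"],
            score=float(trace["score"]),
            score_kind=trace["score_kind"],
        )
```

Keeping the trace fields alongside the text lets a caller see which strategy produced a paragraph and how it was scored.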
Index Layout
blograg build writes an outer blograg directory inside the chosen index
root:
/path/to/index/
  blograg/
    manifest.json
    paragraphs.json
    labelrag/
      ...
The outer layer stores blograg-specific metadata and paragraph source
metadata. The inner labelrag directory is a normal persisted upstream
snapshot.
Runtime Notes
- The default build mode is heuristic, so the default path does not require a spaCy model download.
- The default embedding provider still comes from upstream labelrag, so the first real build or query may download the configured embedding model.
- Advanced retrieval runtime settings live under persisted retrieval.* config keys and can also be overridden through serve and start.
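The last note describes persisted retrieval.* keys with per-run overrides. One way to picture that is the sketch below, which assumes that dotted keys such as retrieval.retrieval_strategy map onto nested tables and that an override replaces the persisted value for that run only. Both helpers are hypothetical and are not blograg's config code.

```python
import copy
from typing import Any


def set_dotted(config: dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config["a"]["b"] = value for the dotted key "a.b", creating tables as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def apply_overrides(persisted: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the persisted config with non-None overrides applied."""
    merged = copy.deepcopy(persisted)  # leave the persisted config untouched
    for dotted_key, value in overrides.items():
        if value is not None:  # None means "option not given for this run"
            set_dotted(merged, dotted_key, value)
    return merged
```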
Development Checks
pytest
ruff check .
ruff format --check .
pyright
python -m build
twine check dist/*