A local MCP server for retrieving paragraphs from one Jekyll-style blog.

blograg

blograg is a local MCP-oriented retrieval tool for one Jekyll-style blog. It uses labelrag as the retrieval core and treats heading-delimited markdown sections as the paragraph unit. It is designed to:

  • build a paragraph index from one Jekyll-style repository
  • serve that index over MCP Streamable HTTP
  • inspect service state from the CLI and a lightweight browser page
  • register the HTTP endpoint with local MCP clients such as Codex or OpenClaw
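
To make the "heading-delimited section" unit concrete, here is a minimal sketch of how a markdown body could be split on headings. It is an illustration only, not blograg's or labelrag's actual implementation; the names `Section` and `split_sections` are hypothetical, and front matter handling is left out.

```python
import re
from dataclasses import dataclass

# Matches ATX headings such as "# Title" or "### Sub-section".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


@dataclass
class Section:
    heading: str  # heading text, or "" for text before the first heading
    text: str     # body text found under that heading


def split_sections(markdown: str) -> list[Section]:
    """Split a markdown body into heading-delimited sections (hypothetical helper)."""
    sections: list[Section] = []
    heading = ""
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        # Store the section collected so far, skipping a completely empty one.
        text = "\n".join(body).strip()
        if heading or text:
            sections.append(Section(heading, text))

    for line in markdown.splitlines():
        # Track fenced code so "# comment" lines inside code are not headings.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            heading, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return sections
```

Each returned `Section` corresponds to what this README calls a paragraph: one heading plus the text below it, up to the next heading.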

Detailed command reference lives in docs/commands.md.

Installation

Recommended for most users:

pipx install blograg

If you prefer pip:

python -m pip install blograg

If you use Homebrew:

brew install HuRuilizhen/tap/blograg

Quick Start

Initialize local defaults and optional provider secrets:

blograg config wizard

Build an index:

blograg build --blog-dir /path/to/blog --index-dir /path/to/index

Start the managed HTTP service:

blograg start --index-dir /path/to/index

Inspect service state:

blograg status
blograg logs --follow
blograg doctor

Open the browser status page:

http://127.0.0.1:8765/

Register the MCP endpoint with a client:

blograg register --client codex
blograg register --show

Core Commands

Most day-to-day usage centers on these commands:

  • blograg config wizard
  • blograg build
  • blograg serve
  • blograg start
  • blograg status
  • blograg logs
  • blograg doctor
  • blograg register

For command-by-command examples and option summaries, see docs/commands.md.

Persistent Config

blograg stores user-level config and secrets in:

  • config.toml
  • secrets.toml

Default locations:

  • macOS/Linux: ~/.config/blograg/
  • Windows: %AppData%/blograg/
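
As a rough illustration of the default locations above, the sketch below resolves the config directory per platform. The helper names are hypothetical, the Windows fallback path is an assumption, and blograg itself may apply extra rules (for example environment overrides), so `blograg config path` remains the authoritative answer.

```python
import os
import sys
from pathlib import Path


def default_config_dir(platform: str = sys.platform, env=None, home=None) -> Path:
    """Return the documented default config directory (hypothetical helper)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        # %AppData% is read from the APPDATA environment variable.
        # Falling back to ~/AppData/Roaming is an assumption of this sketch.
        appdata = env.get("APPDATA")
        base = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return base / "blograg"
    # macOS and Linux share the same documented location.
    return home / ".config" / "blograg"


def config_files(config_dir: Path) -> dict[str, Path]:
    """Map the two documented file names onto a config directory."""
    return {
        "config": config_dir / "config.toml",
        "secrets": config_dir / "secrets.toml",
    }
```

Calling `config_files(default_config_dir())` yields the two paths that `config.toml` and `secrets.toml` would occupy under the documented defaults.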

Useful commands:

blograg config path
blograg config show
blograg config show --all
blograg config set default_index_dir /path/to/index
blograg config set retrieval.retrieval_strategy label_gate_semantic_rank
blograg config set-secret mistral --api-key your-key-here

blograg config show masks secret values and only reports whether each provider key is configured.

MCP Service Model

blograg serve loads an existing index and starts the MCP server. It never rebuilds the index automatically: if the index is missing or incomplete, run blograg build first.

The default transport is Streamable HTTP. The default HTTP binding is:

  • host: 127.0.0.1
  • port: 8765

If you need LAN access, bind explicitly:

blograg serve --host 0.0.0.0 --port 8765

Current HTTP endpoints:

  • /mcp
  • /
  • /healthz

The browser page at / is a lightweight status page, not a separate web app.
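
The three endpoints can be derived from the host and port, and /healthz lends itself to a scripted liveness probe. The sketch below uses only the Python standard library; the helper names are hypothetical, and treating any 2xx response as "healthy" is an assumption, since the response body of /healthz is not described here.

```python
import urllib.error
import urllib.request

DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8765


def endpoint_urls(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT) -> dict[str, str]:
    """Build the three documented URLs for a given binding."""
    base = f"http://{host}:{port}"
    return {
        "mcp": f"{base}/mcp",         # MCP Streamable HTTP endpoint
        "status": f"{base}/",         # lightweight browser status page
        "health": f"{base}/healthz",  # health check
    }


def is_healthy(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT,
               timeout: float = 2.0) -> bool:
    """Return True if GET /healthz answers with a 2xx status (assumed meaning)."""
    url = endpoint_urls(host, port)["health"]
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

`is_healthy()` returns False when nothing is listening on the port, which makes it usable as a simple wait-until-ready check in scripts.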

MCP Client Registration

Register the local endpoint with one client at a time:

blograg register --client codex
blograg register --client openclaw

Inspect current registration state:

blograg register --show
blograg register --show --server-name blograg-local

You can also register an explicit URL:

blograg register \
  --client codex \
  --server-name blograg-local \
  --url http://127.0.0.1:8765/mcp

LLM Usage

blograg build supports the concept extraction modes provided by upstream labelrag, selected with --concept-extractor:

  • heuristic
  • spacy
  • llm

Example LLM build:

MISTRAL_API_KEY=your-key-here \
blograg build \
  --blog-dir /path/to/blog \
  --index-dir /path/to/index \
  --concept-extractor llm \
  --llm-provider mistral \
  --llm-model mistral-small

If an index was built with --concept-extractor llm, query analysis at serve time still needs access to the corresponding provider API key. You can provide it through:

  • blograg config set-secret ...
  • environment variables such as MISTRAL_API_KEY
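
Before serving an LLM-built index, it can help to check that a key is actually reachable. The following preflight sketch is illustrative only: the `PROVIDER_API_KEY` naming pattern is generalized from `MISTRAL_API_KEY`, the environment-first precedence is an assumption, and the `stored_secrets` mapping is a hypothetical stand-in for whatever blograg keeps in secrets.toml.

```python
import os


def resolve_api_key(provider: str, stored_secrets: dict, env=None):
    """Return an API key for `provider`, or None if none is configured.

    Assumptions of this sketch (not confirmed blograg behavior):
    - the environment variable is named <PROVIDER>_API_KEY
    - an environment variable wins over a stored secret
    """
    env = os.environ if env is None else env
    env_name = f"{provider.upper()}_API_KEY"  # e.g. MISTRAL_API_KEY
    return env.get(env_name) or stored_secrets.get(provider)


def has_api_key(provider: str, stored_secrets: dict, env=None) -> bool:
    """Report only whether a key is configured, without exposing its value."""
    return resolve_api_key(provider, stored_secrets, env) is not None
```

`has_api_key("mistral", {})` mirrors the spirit of `blograg config show`: it reports whether a key exists without printing it.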

Retrieval Output

The server currently exposes one tool:

retrieve_paragraphs(query: str, top_k: int = 5)
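
MCP tools are invoked with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that request body for `retrieve_paragraphs`; the helper name `build_tool_call` is hypothetical. A real client must also complete the MCP initialization handshake before calling tools, so in practice an MCP client library or a registered client such as Codex handles this for you.

```python
import json


def build_tool_call(query: str, top_k: int = 5, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` body for retrieve_paragraphs."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "retrieve_paragraphs",
            # Argument names follow the tool signature shown above.
            "arguments": {"query": query, "top_k": top_k},
        },
    }


if __name__ == "__main__":
    # Print the body that would be POSTed to the /mcp endpoint.
    print(json.dumps(build_tool_call("static site deployment", top_k=3), indent=2))
```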

Each result includes:

  • paragraph_id
  • text
  • post_title
  • slug
  • section_heading
  • trace.retrieval_strategy
  • trace.score
  • trace.score_kind

Index Layout

blograg build writes an outer blograg directory inside the chosen index root:

/path/to/index/
  blograg/
    manifest.json
    paragraphs.json
    labelrag/
      ...

The outer layer stores blograg-specific metadata and paragraph source metadata. The inner labelrag directory is a normal persisted upstream snapshot.
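
Because blograg serve expects an existing index, a quick structural check against the layout above can catch an obviously missing or partial index before startup. This sketch only tests for the three documented entries; it is not blograg's own validation, and the helper name is hypothetical.

```python
from pathlib import Path


def missing_index_entries(index_dir) -> list[str]:
    """List documented index entries that are absent under `index_dir`."""
    root = Path(index_dir) / "blograg"
    expected = {
        "blograg/manifest.json": (root / "manifest.json").is_file(),
        "blograg/paragraphs.json": (root / "paragraphs.json").is_file(),
        "blograg/labelrag/": (root / "labelrag").is_dir(),
    }
    # Dicts keep insertion order, so the report follows the documented layout.
    return [name for name, present in expected.items() if not present]
```

An empty list means the directory has the expected shape; it does not prove that the files inside are valid or complete.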

Runtime Notes

  • The default build mode is heuristic, so the default path does not require a spaCy model download.
  • The default embedding provider still comes from upstream labelrag, so the first real build or query may download the configured embedding model.
  • Advanced retrieval runtime settings are persisted under retrieval.* config keys and can also be overridden per run through options to blograg serve and blograg start.

Development Checks

pytest
ruff check .
ruff format --check .
pyright
python -m build
twine check dist/*
