A Python tool for embedding files, code snippets, and generating tables of contents in Markdown documents with built-in safety limits and validation

EmbedM

version 1.0.1

A Markdown compiler driven by source files.

Project Background

EmbedM is part of an exploration into how far AI-assisted development can go when building a non-trivial tool that could be used in a production CI/CD chain. The project was built from a human-defined architecture, a functional spec, and a series of interface contracts, then implemented using Claude and, to a lesser extent, Google Gemini.

How It Works

EmbedM compiles Markdown documents from directive blocks. Each directive references a source — a code file, a data query, a CSV table, or another document — and is replaced with the extracted, formatted content on compile. Change the source; recompile; the document is current.

Use Cases

Keeping code documentation in sync

Embed a function directly from the source file, scoped by a named region or by symbol name. When the implementation changes, the docs regenerate on the next compile: no copy-paste, no drift. Instead of copying the code, you add a reference to the class, function, method, enum, or struct.

Instead of adding code that may go out of date:

public void createUser(string user) {
    // ...
}

You reference the method instead. At compile time the reference is replaced with the up-to-date function, or a clear error is reported if the method 'createUser' no longer exists.

type: file
source: src/api/handlers.java
symbol: UserHandler.createUser
title: "POST /users"
link: true

Live metadata in a README or changelog

Pull version numbers, project names, and other values from pyproject.toml, package.json, or any JSON/YAML/TOML/XML file. The version at the top of this page is a live example: it is compiled from pyproject.toml at build time. Instead of hard-coding the version, reference the project file. For example:

type: query-path
source: pyproject.toml
path: project.version
format: "Released: **v{value}**"

Data tables without copy-paste

Embed CSV or TSV data, or structured JSON, as formatted Markdown tables. Apply column selection, filtering, and sorting inline; the source file remains the single source of truth:

type: table
source: reports/q4-summary.csv
select: "Region as Region, Revenue as Revenue_USD"
order_by: "Revenue_USD desc"
limit: 10
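To make the select, order_by, and limit steps concrete, here is a small sketch of how such a pipeline could work on CSV text. It is an illustration under assumptions, not EmbedM's code: the helper names and sample data are hypothetical, and sorting numerically when a value parses as a number is a choice made for this sketch.

```python
import csv
import io

# Hypothetical sample data standing in for reports/q4-summary.csv.
SAMPLE_CSV = """Region,Revenue,Units
North,1200,30
South,3400,75
East,2100,50
"""


def parse_select(select: str) -> list[tuple[str, str]]:
    """Turn "A as X, B as Y" into [(source_column, output_name), ...]."""
    pairs = []
    for item in select.split(","):
        source, _, alias = item.strip().partition(" as ")
        pairs.append((source.strip(), (alias or source).strip()))
    return pairs


def sort_key(value: str):
    """Numbers sort before text so mixed columns never raise a TypeError."""
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value)


def render_table(text: str, select: str, order_by: str, limit: int) -> str:
    columns = parse_select(select)
    rows = [
        {alias: row[source] for source, alias in columns}
        for row in csv.DictReader(io.StringIO(text))
    ]
    # "Revenue_USD desc" -> column name and direction.
    column, _, direction = order_by.partition(" ")
    rows.sort(key=lambda r: sort_key(r[column]),
              reverse=direction.strip().lower() == "desc")
    names = [alias for _, alias in columns]
    lines = ["| " + " | ".join(names) + " |",
             "| " + " | ".join("---" for _ in names) + " |"]
    for row in rows[:limit]:
        lines.append("| " + " | ".join(row[n] for n in names) + " |")
    return "\n".join(lines)


table = render_table(SAMPLE_CSV, "Region as Region, Revenue as Revenue_USD",
                     "Revenue_USD desc", limit=2)
print(table)
```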

CI drift detection

Use --verify in your pipeline to catch documentation that has fallen behind its sources. The command exits with code 1 if any compiled file is stale.

embedm ./docs/src --verify -d ./docs/compiled
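Conceptually, a drift check compares what a fresh compile would produce with what is already on disk. The sketch below shows that comparison in isolation. The `is_stale` and `verify` helpers are hypothetical, and returning 0 when nothing is stale is an assumption; only the exit code 1 for stale files comes from the description above.

```python
import tempfile
from pathlib import Path


def is_stale(compiled_path: Path, fresh_output: str) -> bool:
    """A compiled file is stale if it is missing or differs from a fresh compile."""
    if not compiled_path.exists():
        return True
    return compiled_path.read_text(encoding="utf-8") != fresh_output


def verify(fresh_outputs: dict[Path, str]) -> int:
    """Return 1 if any file is stale, otherwise 0 (the 0 is an assumption)."""
    stale = [path for path, text in fresh_outputs.items() if is_stale(path, text)]
    for path in stale:
        print(f"stale: {path}")
    return 1 if stale else 0


# Demonstration in a temporary directory instead of ./docs/compiled.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "content.md"
    target.write_text("version 1.0.0\n", encoding="utf-8")
    up_to_date = verify({target: "version 1.0.0\n"})
    drifted = verify({target: "version 1.0.1\n"})
    missing = verify({Path(tmp) / "absent.md": "anything"})
```

A non-zero result lets the CI job fail without modifying any files, so the fix (recompiling and committing) stays a deliberate step.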

AI agent context documents

Use recall to query a large document — a devlog, a decision log, an ADR set — and extract the sentences most relevant to a given topic. Compose multiple queries into a single compiled context file that an AI assistant reads at session start.

type: recall
source: ./devlog.md
query: "validation transform boundary error handling"
max_sentences: 5

EmbedM itself uses this: its agent context file is compiled from the project devlog using four targeted recall queries — plugin conventions, architectural rules, common mistakes, and the active spec. The context window stays focused without manual curation.
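The source does not describe how recall ranks sentences, so the following sketch swaps in a deliberately simple technique: counting the words a sentence shares with the query. The `recall` function, the sentence splitter, and the sample devlog are all hypothetical and only illustrate the shape of the operation (text in, a capped list of relevant sentences out).

```python
import re

# Hypothetical devlog text.
DEVLOG = (
    "Validation happens before the transform step. "
    "We renamed the CLI flag for dry runs. "
    "Error handling at the transform boundary returns a status object. "
    "Lunch was good today."
)


def words(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def recall(text: str, query: str, max_sentences: int) -> list[str]:
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    query_words = words(query)
    scored = [(len(words(s) & query_words), i, s) for i, s in enumerate(sentences)]
    # Keep sentences sharing at least one word with the query,
    # best score first; ties keep document order.
    hits = sorted((t for t in scored if t[0] > 0), key=lambda t: (-t[0], t[1]))
    return [s for _, _, s in hits[:max_sentences]]


top = recall(DEVLOG, "validation transform boundary error handling", max_sentences=2)
for sentence in top:
    print(sentence)
```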

Directives

Directives are fenced YAML blocks tagged ```yaml embedm. On compile, each is replaced in-place with the extracted content:

type: file
source: src/config/defaults.py
region: connection_defaults

# connection_defaults
HOST = "localhost"
PORT = 5432
TIMEOUT = 30
POOL_SIZE = 10

Structured data queries render inline:

type: query-path
source: config/app.yaml
path: database.pool_size
format: "Default pool size: **{value}**"

Default pool size: 10
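The in-place replacement described in this section can be sketched as a search for tagged fences followed by a substitution. This is an illustration only: `parse_fields` handles flat `key: value` lines rather than real YAML, and `compile_text`, `query_path_handler`, and the hard-coded `SETTINGS` dictionary are hypothetical names, not part of EmbedM.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
DIRECTIVE = re.compile(
    re.escape(FENCE) + r"yaml embedm\n(.*?)\n" + re.escape(FENCE), re.DOTALL
)


def parse_fields(body: str) -> dict[str, str]:
    """Parse flat `key: value` lines; real YAML needs a YAML parser."""
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def compile_text(markdown: str, handlers: dict) -> str:
    """Replace each directive block with the output of the handler for its type."""
    def replace(match: re.Match) -> str:
        fields = parse_fields(match.group(1))
        return handlers[fields["type"]](fields)
    return DIRECTIVE.sub(replace, markdown)


# Hypothetical handler: looks the value up in a hard-coded dictionary
# instead of reading config/app.yaml.
SETTINGS = {"database.pool_size": 10}


def query_path_handler(fields: dict[str, str]) -> str:
    return fields["format"].format(value=SETTINGS[fields["path"]])


source = "\n".join([
    "Intro text.",
    FENCE + "yaml embedm",
    "type: query-path",
    "source: config/app.yaml",
    "path: database.pool_size",
    'format: "Default pool size: **{value}**"',
    FENCE,
    "Closing text.",
])
compiled = compile_text(source, {"query-path": query_path_handler})
print(compiled)
```

Dispatching on the `type` field through a dictionary of handlers mirrors the plugin-per-directive-type layout suggested by the documentation list, though the actual plugin interface is not shown here.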

Quick Start

Install

pip install embedm

Or from source:

git clone https://github.com/Fultslop/embedm.git
cd embedm
pip install -e .

Compile a single file

embedm content.md -o compiled/content.md

Compile a directory

embedm ./docs/src -d ./docs/compiled

Preview without writing

embedm content.md -n

Check that compiled files are up to date

embedm ./docs/src --verify -d ./docs/compiled

Generate a default config file

embedm --init

Creating new plugins

See the plugin_tutorial

Features

File embedding

  • Embed entire files, line ranges (5..10), or named regions (md.start:name / md.end:name)
  • Markdown sources are merged inline; all other types are wrapped in a fenced code block
  • Optional title, source link, and line-number annotation
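The two scoping mechanisms in the list above, line ranges and named regions, can be sketched as follows. The details are assumptions made for illustration: ranges are treated as 1-based and inclusive, a marker is any line containing `md.start:name` or `md.end:name` as a whitespace-separated token, and marker lines are left out of the result. The helper names are hypothetical.

```python
SAMPLE = """import os
# md.start:connection_defaults
HOST = "localhost"
PORT = 5432
# md.end:connection_defaults
DEBUG = False
"""


def extract_lines(text: str, line_range: str) -> str:
    """Return the lines named by a "start..end" range (assumed 1-based, inclusive)."""
    start, end = (int(part) for part in line_range.split(".."))
    return "\n".join(text.splitlines()[start - 1:end])


def extract_region(text: str, name: str) -> str:
    """Return the lines between the md.start:name and md.end:name marker lines."""
    start_marker, end_marker = f"md.start:{name}", f"md.end:{name}"
    collected = []
    inside = False
    for line in text.splitlines():
        tokens = line.split()
        if start_marker in tokens:
            inside = True
        elif end_marker in tokens and inside:
            return "\n".join(collected)
        elif inside:
            collected.append(line)
    raise ValueError(f"region {name!r} not found or not closed")


by_region = extract_region(SAMPLE, "connection_defaults")
by_lines = extract_lines(SAMPLE, "3..4")
print(by_region)
```

Named regions survive edits that shift line numbers, which is why they tend to be the more robust choice for documentation that must stay in sync.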

Symbol extraction

  • Extract classes and methods by name from C/C++, C#, and Java source files
  • Dot-notation for nested symbols: OuterClass.InnerClass.methodName
  • Overload disambiguation: add(int, int) vs add(int, int, int)

Structured data

  • Query any value from JSON, YAML, TOML, or XML using dot-notation paths
  • Scalars render inline; dicts and lists render as YAML code blocks
  • Format strings for inline interpolation: "version {value}"

Data tables

  • Render CSV and TSV files as Markdown tables
  • Column selection, row filtering (exact match and comparison operators), sorting, pagination

Table of contents

  • Auto-generated from document headings, including headings in embedded files
  • GitHub-compatible anchor links
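As an illustration of what GitHub-compatible anchors involve, the sketch below approximates the commonly observed rule: lowercase the heading, drop punctuation, turn spaces into hyphens, and add a numeric suffix to repeated headings. It is an approximation written for this example, with hypothetical `slugify` and `build_toc` helpers, not EmbedM's own generator.

```python
import re

FENCE = "`" * 3  # three backticks


def slugify(heading: str) -> str:
    """Approximate GitHub's rule: lowercase, drop punctuation, spaces to hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)
    return slug.replace(" ", "-")


def build_toc(markdown: str) -> str:
    seen: dict[str, int] = {}
    lines = []
    in_fence = False
    for line in markdown.splitlines():
        # Skip fenced code so "# comment" lines are not mistaken for headings.
        if line.startswith(FENCE):
            in_fence = not in_fence
            continue
        match = None if in_fence else re.match(r"(#{1,6})\s+(.*\S)", line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        slug = slugify(title)
        count = seen.get(slug, 0)
        seen[slug] = count + 1
        # Repeated headings get -1, -2, ... appended.
        anchor = slug if count == 0 else f"{slug}-{count}"
        lines.append(f"{'  ' * (level - 1)}- [{title}](#{anchor})")
    return "\n".join(lines)


DOC = "\n".join([
    "# EmbedM",
    "## Quick Start",
    "### Install",
    FENCE + "python",
    "# connection_defaults",
    FENCE,
    "## Quick Start",
    "## What's new?",
])
toc = build_toc(DOC)
print(toc)
```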

AI context

  • synopsis — generate a condensed summary of a document
  • recall — build structured retrieval blocks for AI agent context files

Recursive embedding

  • Markdown files that embed other Markdown files, up to a configurable depth
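A depth limit is what keeps recursive embedding from running away, for example when a file ends up embedding itself. The sketch below shows the idea with in-memory documents. Everything in it is hypothetical: the `{{embed: name}}` placeholder is a syntax invented for this example (not EmbedM's directive format), as are `expand` and `EmbedDepthError`.

```python
import re


class EmbedDepthError(RuntimeError):
    """Raised when nested embeds go deeper than the configured limit."""


# Hypothetical in-memory documents standing in for files on disk.
DOCS = {
    "index.md": "# Guide\n{{embed: intro.md}}",
    "intro.md": "Welcome.\n{{embed: notes.md}}",
    "notes.md": "Some notes.",
    "loop.md": "again {{embed: loop.md}}",
}
EMBED = re.compile(r"\{\{embed: (\S+?)\}\}")


def expand(name: str, max_depth: int, depth: int = 0) -> str:
    """Inline every embed in the named document, refusing to exceed max_depth."""
    if depth > max_depth:
        raise EmbedDepthError(f"embed depth limit {max_depth} exceeded at {name}")
    return EMBED.sub(
        lambda match: expand(match.group(1), max_depth, depth + 1),
        DOCS[name],
    )


expanded = expand("index.md", max_depth=2)
print(expanded)
```

With `max_depth=2` the three-level chain expands fully, while the self-referencing `loop.md` raises `EmbedDepthError` at any finite limit instead of recursing forever.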

Safety

  • Configurable limits on file size, memory, recursion depth, and embed output size
  • --verify mode for CI drift detection

Documentation

| Document | Description |
| --- | --- |
| CLI Reference | All flags, input modes, and exit codes |
| Configuration Reference | embedm-config.yaml properties and defaults |
| File Plugin | File embedding, regions, lines, symbol extraction |
| Query-Path Plugin | Structured data extraction from JSON/YAML/TOML/XML |
| Table Plugin | CSV/TSV tables with filtering and sorting |
| Toc Plugin | Table-of-contents generation |
| Architecture | System design, plugin model, plan/compile pipeline |

License

MIT License — see LICENSE file for details.

Contributing

Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.
