A Python tool for embedding files, code snippets, and generating tables of contents in Markdown documents with built-in safety limits and validation
EmbedM
version 1.0.0
A Markdown compiler driven by source files.
- Project Background
- How It Works
- Use Cases
- Directives
- Quick Start
- Features
- Documentation
- License
- Contributing
Project Background
EmbedM is part of an exploration into how far AI-assisted development can go when building a non-trivial tool that could be used in a production CI/CD chain. The project was built from a human-defined architecture, a functional spec, and a series of interface contracts, then implemented using Claude and, to a lesser extent, Google Gemini.
How It Works
EmbedM compiles Markdown documents from directive blocks. Each directive references a source — a code file, a data query, a CSV table, or another document — and is replaced with the extracted, formatted content on compile. Change the source; recompile; the document is current.
Use Cases
Keeping code documentation in sync
Embed a function directly from the source file, scoped by a named region or by symbol name. When the implementation changes, the docs regenerate on the next compile: no copy-paste, no drift. Instead of copying the function code, you add a reference to the class, function, method, enum, or struct.
Instead of adding code that may go out of date:
public void createUser(String user) {
// ...
}
You reference the method with a directive. At compile time the directive is replaced with the up-to-date function, or a clear error is reported if the method 'createUser' no longer exists.
type: file
source: src/api/handlers.java
symbol: UserHandler.createUser
title: "POST /users"
link: true
Live metadata in a README or changelog
Pull version numbers, project names, and other values from pyproject.toml, package.json, or any JSON/YAML/TOML/XML file. The version at the top of this page is a live example: it is compiled from pyproject.toml at build time. Instead of hard-coding the version, reference the project file, for example:
type: query-path
source: pyproject.toml
path: project.version
format: "Released: **v{value}**"
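The dot-notation lookup and format string can be pictured with a short Python sketch. This is only an illustration of the idea, not EmbedM's implementation: `query_path` and the `sample` dictionary are hypothetical names, and the dictionary stands in for an already parsed pyproject.toml.

```python
def query_path(data, path):
    """Walk a nested mapping using a dot-separated path such as 'project.version'."""
    node = data
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"path segment {key!r} not found in {path!r}")
        node = node[key]
    return node


# Stand-in for the parsed contents of pyproject.toml (hypothetical values).
sample = {"project": {"name": "demo", "version": "1.0.0"}}

value = query_path(sample, "project.version")
rendered = "Released: **v{value}**".format(value=value)
print(rendered)  # Released: **v1.0.0**
```

Raising on a missing segment mirrors the goal stated above: a broken reference should surface as an error rather than as silently stale text.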
Data tables without copy-paste
Embed CSV or TSV data, or structured JSON, as formatted Markdown tables. Apply column selection, filtering, and sorting inline; the source file remains the single source of truth:
type: table
source: reports/q4-summary.csv
select: "Region as Region, Revenue as Revenue_USD"
order_by: "Revenue_USD desc"
limit: 10
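As a rough mental model of what such a directive does, the following Python sketch selects and renames columns, sorts, limits, and renders a Markdown table. It is an assumption-laden illustration rather than the Table plugin's code: `render_table` and its parameters are hypothetical, and the sort column is assumed to be numeric.

```python
import csv
import io


def render_table(csv_text, columns, order_by, descending=False, limit=None):
    """Render selected CSV columns as a Markdown table.

    `columns` maps source column names to output header names.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Rename and keep only the selected columns.
    selected = [{out: row[src] for src, out in columns.items()} for row in rows]
    # Sort numerically; this sketch assumes the sort column holds numbers.
    selected.sort(key=lambda r: float(r[order_by]), reverse=descending)
    if limit is not None:
        selected = selected[:limit]
    headers = list(columns.values())
    lines = ["| " + " | ".join(headers) + " |", "|" + "---|" * len(headers)]
    lines += ["| " + " | ".join(r[h] for h in headers) + " |" for r in selected]
    return "\n".join(lines)


csv_text = "Region,Revenue,Units\nNorth,120,3\nSouth,300,9\nEast,45,1\n"
table = render_table(
    csv_text,
    {"Region": "Region", "Revenue": "Revenue_USD"},
    order_by="Revenue_USD",
    descending=True,
    limit=2,
)
print(table)
```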
CI drift detection
Use --verify in your pipeline to catch documentation that has fallen behind its sources. The command exits with code 1 if any compiled file is stale.
embedm ./docs/src --verify -d ./docs/compiled
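Conceptually, a drift check compares freshly compiled output with what is already on disk. The Python sketch below shows that comparison under stated assumptions: `verify` is a hypothetical helper, not EmbedM's API, and treating a missing file as stale is a choice made for this example.

```python
from pathlib import Path


def verify(compiled, out_dir):
    """Return 1 if any file in `out_dir` is missing or differs from `compiled`, else 0.

    `compiled` maps relative file names to their freshly compiled text.
    """
    out_dir = Path(out_dir)
    stale = [
        name
        for name, text in compiled.items()
        if not (out_dir / name).is_file()
        or (out_dir / name).read_text(encoding="utf-8") != text
    ]
    for name in stale:
        print(f"stale: {name}")
    # A non-zero return value can be passed to sys.exit() to fail a CI job.
    return 1 if stale else 0
```

In a pipeline, the return value would typically be handed to `sys.exit()` so that a stale document fails the build.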
AI agent context documents
Use recall to query a large document — a devlog, a decision log, an ADR set — and extract the sentences most relevant to a given topic. Compose multiple queries into a single compiled context file that an AI assistant reads at session start.
type: recall
source: ./devlog.md
query: "validation transform boundary error handling"
max_sentences: 5
EmbedM itself uses this: its agent context file is compiled from the project devlog using four targeted recall queries — plugin conventions, architectural rules, common mistakes, and the active spec. The context window stays focused without manual curation.
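To make the idea of query-driven sentence extraction concrete, here is a deliberately naive Python sketch that ranks sentences by how many query words they share. The scoring method EmbedM actually uses is not described here, so this keyword-overlap approach and the `recall` function name are assumptions for illustration only.

```python
import re


def recall(text, query, max_sentences=5):
    """Return up to `max_sentences` sentences sharing the most words with `query`."""
    terms = set(re.findall(r"\w+", query.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for index, sentence in enumerate(sentences):
        overlap = len(terms & set(re.findall(r"\w+", sentence.lower())))
        if overlap:
            scored.append((-overlap, index, sentence))
    # Highest overlap first; ties keep document order.
    return [s for _, _, s in sorted(scored)[:max_sentences]]


log = (
    "We moved validation into the transform step. Lunch was good. "
    "Error handling now lives at the plugin boundary. "
    "Validation errors are reported once."
)
for sentence in recall(log, "validation error handling", max_sentences=2):
    print(sentence)
```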
Directives
Directives are fenced YAML blocks tagged ```yaml embedm. On compile, each is replaced in-place with the extracted content:
type: file
source: src/config/defaults.py
region: connection_defaults
# connection_defaults
HOST = "localhost"
PORT = 5432
TIMEOUT = 30
POOL_SIZE = 10
Structured data queries render inline:
type: query-path
source: config/app.yaml
path: database.pool_size
format: "Default pool size: **{value}**"
Default pool size: 10
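The in-place replacement described in this section can be sketched in a few lines of Python. This is a simplified model, not the EmbedM compiler: `compile_markdown`, `parse_directive`, and the `resolve` callback are hypothetical, and the directive body is parsed as flat `key: value` lines instead of full YAML.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime to keep this listing readable

# Matches a fenced block opened with the 'yaml embedm' tag and captures its body.
DIRECTIVE_RE = re.compile(
    re.escape(FENCE) + r"yaml embedm\n(.*?)\n" + re.escape(FENCE), re.DOTALL
)


def parse_directive(body):
    """Parse flat `key: value` lines; a stand-in for a real YAML parser."""
    fields = {}
    for line in body.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields


def compile_markdown(text, resolve):
    """Replace every directive block with whatever `resolve` returns for it."""
    return DIRECTIVE_RE.sub(lambda m: resolve(parse_directive(m.group(1))), text)


doc = "Intro\n" + FENCE + "yaml embedm\ntype: file\nsource: a.py\n" + FENCE + "\nOutro"
compiled = compile_markdown(doc, lambda d: f"<{d['type']}:{d['source']}>")
print(compiled)
```

Here the `resolve` callback simply echoes the directive fields; a real resolver would dispatch on `type` and return the extracted content.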
Quick Start
Install
pip install embedm
Or from source:
git clone https://github.com/Fultslop/embedm.git
cd embedm
pip install -e .
Compile a single file
embedm content.md -o compiled/content.md
Compile a directory
embedm ./docs/src -d ./docs/compiled
Preview without writing
embedm content.md -n
Check that compiled files are up to date
embedm ./docs/src --verify -d ./docs/compiled
Generate a default config file
embedm --init
Creating new plugins
See the plugin_tutorial
Features
File embedding
- Embed entire files, line ranges (5..10), or named regions (md.start:name / md.end:name)
- Markdown sources are merged inline; all other types are wrapped in a fenced code block
- Optional title, source link, and line-number annotation
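Named-region extraction can be illustrated with a small Python sketch. It assumes the markers appear as whitespace-separated tokens on their own lines (for example inside a comment); the exact matching rules EmbedM applies are not specified here, and `extract_region` is a hypothetical name.

```python
def extract_region(source, name):
    """Return the lines between `md.start:<name>` and `md.end:<name>` marker lines."""
    start_marker, end_marker = f"md.start:{name}", f"md.end:{name}"
    collected, inside = [], False
    for line in source.splitlines():
        # Compare whole tokens so 'md.start:db' does not match 'md.start:db_pool'.
        tokens = line.split()
        if inside and end_marker in tokens:
            return "\n".join(collected)
        if inside:
            collected.append(line)
        elif start_marker in tokens:
            inside = True
    raise ValueError(f"region {name!r} not found or not closed")


src = (
    "import os\n"
    "# md.start:connection_defaults\n"
    'HOST = "localhost"\n'
    "PORT = 5432\n"
    "# md.end:connection_defaults\n"
    "DEBUG = False\n"
)
print(extract_region(src, "connection_defaults"))
```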
Symbol extraction
- Extract classes and methods by name from C/C++, C#, and Java source files
- Dot-notation for nested symbols: OuterClass.InnerClass.methodName
- Overload disambiguation: add(int, int) vs add(int, int, int)
Structured data
- Query any value from JSON, YAML, TOML, or XML using dot-notation paths
- Scalars render inline; dicts and lists render as YAML code blocks
- Format strings for inline interpolation: "version {value}"
Data tables
- Render CSV and TSV files as Markdown tables
- Column selection, row filtering (exact match and comparison operators), sorting, pagination
Table of contents
- Auto-generated from document headings, including headings in embedded files
- GitHub-compatible anchor links
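For readers curious how heading anchors are usually derived, the following Python sketch approximates GitHub's slug rule and builds a nested list from ATX headings. It is a simplification under stated assumptions, not the Toc plugin's code: `github_anchor` and `build_toc` are hypothetical names, duplicate headings are not given numeric suffixes, and headings inside code fences are not skipped.

```python
import re


def github_anchor(heading):
    """Approximate GitHub's heading slug: lowercase, drop punctuation, spaces to hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # keep word characters, hyphens, and spaces
    return slug.replace(" ", "-")


def build_toc(markdown):
    """Build a nested bullet list from ATX headings (lines starting with '#')."""
    entries = []
    for line in markdown.splitlines():
        match = re.match(r"(#{1,6})\s+(.*)", line)
        if match:
            level, title = len(match.group(1)), match.group(2).strip()
            indent = "  " * (level - 1)
            entries.append(f"{indent}- [{title}](#{github_anchor(title)})")
    return "\n".join(entries)


toc = build_toc("# Quick Start\ntext\n## Install & Run\n")
print(toc)
```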
AI context
- synopsis: generate a condensed summary of a document
- recall: build structured retrieval blocks for AI agent context files
Recursive embedding
- Markdown files that embed other Markdown files, up to a configurable depth
Safety
- Configurable limits on file size, memory, recursion depth, and embed output size
- --verify mode for CI drift detection
Documentation
| Document | Description |
|---|---|
| CLI Reference | All flags, input modes, and exit codes |
| Configuration Reference | embedm-config.yaml properties and defaults |
| File Plugin | File embedding, regions, lines, symbol extraction |
| Query-Path Plugin | Structured data extraction from JSON/YAML/TOML/XML |
| Table Plugin | CSV/TSV tables with filtering and sorting |
| Toc Plugin | Table-of-contents generation |
| Architecture | System design, plugin model, plan/compile pipeline |
License
MIT License — see LICENSE file for details.
Contributing
Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.