Generate AI-agent VOICE.md files from website copy and CTAs.
site2voice
Generate VOICE.md from any website.
site2voice reads website copy and writes a small Markdown brief that tells an
AI coding agent how the site sounds: headings, CTAs, navigation labels, sentence
shape, repeated vocabulary, and claim boundaries.
pipx install site2voice
site2voice https://example.com --out VOICE.md
From a repo clone, run the tool against the included example fixtures:
site2voice examples/saas-home.html --format json
site2voice bench examples/editorial-home.html examples/before-copy.md examples/after-copy.md
Why
DESIGN.md helps agents stop guessing visual style. VOICE.md helps them stop
guessing copy style.
Drop the generated file into a project and tell the agent:
Use @VOICE.md for landing-page copy, headings, CTAs, and UI microcopy.
Output
# VOICE.md
## Voice Summary
- Overall tone: explanatory, action-oriented, trust-forward.
- Sentence shape: about 20.4 words per sentence.
- Main vocabulary: `teams`, `security`, `pricing`, `launch`.
- Common CTAs: `Start free`, `Book a demo`, `See pricing`.
## Agent Rules
- Start with a concrete user outcome before describing implementation details.
- Prefer short active sentences and visible verbs from the CTA list.
- Do not invent compliance, security, customer, or performance claims.
The real output also includes a small style fingerprint for heading length, paragraph rhythm, CTA shape, CTA verbs, and lexical variety.
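As a rough illustration of what such a fingerprint could contain, the sketch below computes a few of the named signals from already-extracted text. The function name `style_fingerprint`, the dictionary keys, and the metric definitions (mean word counts, type-token ratio for lexical variety) are assumptions made for this example; the README does not specify how site2voice calculates them.

```python
import re
from statistics import mean


def _mean_words(texts):
    """Mean word count across a list of strings, rounded to one decimal."""
    return round(mean(len(t.split()) for t in texts), 1) if texts else 0.0


def style_fingerprint(headings, paragraphs, ctas):
    """Hypothetical fingerprint; keys and formulas are illustrative only."""
    words = re.findall(r"[a-z']+", " ".join(paragraphs).lower())
    return {
        "heading_words": _mean_words(headings),      # heading shape
        "paragraph_words": _mean_words(paragraphs),  # paragraph rhythm
        "cta_words": _mean_words(ctas),              # CTA shape
        # First word of each CTA, assumed to be its verb.
        "cta_verbs": sorted({c.split()[0].lower() for c in ctas if c.split()}),
        # Type-token ratio: unique words divided by total words.
        "lexical_variety": round(len(set(words)) / len(words), 2) if words else 0.0,
    }


fingerprint = style_fingerprint(
    headings=["Ship faster with your team", "Simple pricing"],
    paragraphs=["Teams launch in minutes.", "Pricing stays simple for teams."],
    ctas=["Start free", "Book a demo"],
)
print(fingerprint)
```

A type-token ratio is only one way to express lexical variety, and it drifts lower as the text gets longer, so values are best compared between pages of similar size.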
What It Does
- Reads a URL or local HTML file.
- Extracts title, meta description, headings, links, buttons, and paragraphs.
- Finds CTA candidates from short action-led links/buttons.
- Measures average sentence length.
- Extracts a compact style fingerprint: heading shape, paragraph rhythm, CTA shape, CTA verbs, and lexical variety.
- Builds a repeated-vocabulary lexicon.
- Writes Markdown or JSON.
- Benchmarks candidate copy against a source voice profile.
- Gates against unsupported claims and copied spans.
- Uses only the Python standard library.
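The extraction steps listed above can be approximated with the standard library alone. The following is a minimal sketch, not site2voice's actual implementation: the class `CopyExtractor`, the helper names, the `ACTION_VERBS` set, and the thresholds are all hypothetical choices for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical verb list; the real CTA heuristics are not documented here.
ACTION_VERBS = {"start", "book", "see", "get", "try", "join", "learn", "sign"}


class CopyExtractor(HTMLParser):
    """Collect (tag, text) pairs for headings, links, buttons, and paragraphs."""

    TRACKED = {"h1", "h2", "h3", "a", "button", "p"}

    def __init__(self):
        super().__init__()
        self.items = []
        self._tag = None   # outermost tracked tag currently open
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # Nested tracked tags (a link inside a paragraph) fold into the outer one.
        if tag in self.TRACKED and self._tag is None:
            self._tag, self._buf = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            text = " ".join("".join(self._buf).split())
            if text:
                self.items.append((tag, text))
            self._tag = None


def cta_candidates(items, max_words=4):
    """Short link or button labels whose first word is an action verb."""
    return [text for tag, text in items
            if tag in {"a", "button"}
            and len(text.split()) <= max_words
            and text.split()[0].lower() in ACTION_VERBS]


def average_sentence_length(paragraphs):
    """Mean words per sentence, splitting after '.', '!' or '?'."""
    sentences = [s for p in paragraphs
                 for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def repeated_vocabulary(paragraphs, min_count=2):
    """Words longer than three letters that occur at least min_count times."""
    words = re.findall(r"[a-z']+", " ".join(paragraphs).lower())
    counts = Counter(w for w in words if len(w) > 3)
    return [w for w, n in counts.most_common() if n >= min_count]


html = """
<h1>Ship faster with your team</h1>
<p>Teams launch in minutes. Pricing stays simple for teams.</p>
<a href="/signup">Start free</a>
<a href="/about">About</a>
<button>Book a demo</button>
"""
parser = CopyExtractor()
parser.feed(html)
paragraphs = [text for tag, text in parser.items if tag == "p"]
ctas = cta_candidates(parser.items)
print(ctas, average_sentence_length(paragraphs), repeated_vocabulary(paragraphs))
```

In this sketch a link nested inside a paragraph is folded into the paragraph text rather than reported as its own CTA candidate, which keeps the parser small at the cost of missing inline CTAs.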
Benchmark
site2voice bench compares candidate copy against measurable source signals:
sentence length, vocabulary overlap, CTA shape, tone labels, heading shape,
claim boundaries, and copy safety.
site2voice bench examples/editorial-home.html \
examples/before-copy.md \
examples/after-copy.md \
--out examples/editorial-benchmark.md
| Candidate | Result | Overall | Lexicon | Copy safety |
|---|---|---|---|---|
| after-copy | PASS | 83.8 | 70.0 | 93.2 |
| before-copy | FAIL | 36.6 | 0.0 | 100.0 |
The benchmark rewards measurable voice alignment without rewarding verbatim copying.
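One way to picture how alignment and copy safety can pull in different directions is to measure them separately. The sketch below is an assumption-laden illustration: `lexicon_overlap`, `copied_spans`, and the five-word window are hypothetical and are not taken from site2voice's scoring code.

```python
import re


def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_overlap(source_lexicon, candidate):
    """Percentage of source lexicon words that the candidate uses."""
    if not source_lexicon:
        return 0.0
    used = set(tokens(candidate))
    hits = sum(1 for word in source_lexicon if word in used)
    return 100.0 * hits / len(source_lexicon)


def copied_spans(source, candidate, n=5):
    """Word n-grams that appear verbatim in both source and candidate."""
    def ngrams(words):
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return ngrams(tokens(source)) & ngrams(tokens(candidate))


source = "Teams launch in minutes with simple pricing and clear security."
lexicon = ["teams", "pricing", "launch", "security"]

aligned = "Your teams get simple pricing and a fast launch."
lifted = "Teams launch in minutes with simple pricing today."

# Aligned copy reuses the vocabulary without reusing long spans.
print(lexicon_overlap(lexicon, aligned), len(copied_spans(source, aligned)))
# Lifted copy also scores on vocabulary but shares verbatim five-word spans.
print(lexicon_overlap(lexicon, lifted), len(copied_spans(source, lifted)))
```

Keeping the two measurements apart means a candidate that pastes source sentences can score well on vocabulary and still be caught by the span check.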
What It Is Not
- Not an official brand guideline.
- Not a DESIGN.md visual-token extractor.
- Not a crawler for private pages or authenticated apps.
- Not an LLM prompt that copies a site's prose.
Develop
python3 -m pip install -e .
make test
make bench
site2voice examples/saas-home.html --out examples/saas-VOICE.md
License
MIT