Generate AI-agent VOICE.md files from website copy and CTAs.

site2voice

Generate VOICE.md from any website.

site2voice reads website copy and writes a small Markdown brief that tells an AI coding agent how the site sounds: headings, CTAs, navigation labels, sentence shape, repeated vocabulary, and claim boundaries.

pipx install site2voice

site2voice https://example.com --out VOICE.md

From a repo clone, run the included example and benchmark fixtures:

site2voice examples/saas-home.html --format json
site2voice bench examples/editorial-home.html examples/before-copy.md examples/after-copy.md

Why

DESIGN.md helps agents stop guessing visual style. VOICE.md helps them stop guessing copy style.

Drop the generated file into a project and tell the agent:

Use @VOICE.md for landing-page copy, headings, CTAs, and UI microcopy.

Output

# VOICE.md

## Voice Summary

- Overall tone: explanatory, action-oriented, trust-forward.
- Sentence shape: about 20.4 words per sentence.
- Main vocabulary: `teams`, `security`, `pricing`, `launch`.
- Common CTAs: `Start free`, `Book a demo`, `See pricing`.

## Agent Rules

- Start with a concrete user outcome before describing implementation details.
- Prefer short active sentences and visible verbs from the CTA list.
- Do not invent compliance, security, customer, or performance claims.

The real output also includes a small style fingerprint for heading length, paragraph rhythm, CTA shape, CTA verbs, and lexical variety.
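As a rough illustration of what such a fingerprint could contain, here is a minimal sketch using only the standard library. The function name `style_fingerprint`, the dictionary keys, and the exact formulas (mean word counts, type-token ratio for lexical variety) are assumptions made for this example, not site2voice's actual schema or code.

```python
import re
from statistics import mean


def _avg_words(texts):
    """Mean word count across texts, rounded to one decimal; 0.0 if empty."""
    return round(mean(len(t.split()) for t in texts), 1) if texts else 0.0


def style_fingerprint(headings, paragraphs, ctas):
    """Hypothetical fingerprint; keys and formulas are illustrative only."""
    words = re.findall(r"[a-z']+", " ".join(paragraphs).lower())
    return {
        # Heading shape: average heading length in words.
        "heading_words": _avg_words(headings),
        # Paragraph rhythm: average paragraph length in words.
        "paragraph_words": _avg_words(paragraphs),
        # CTA shape: average CTA length in words.
        "cta_words": _avg_words(ctas),
        # CTA verbs: assumed here to be the first word of each CTA.
        "cta_verbs": sorted({c.split()[0].lower() for c in ctas if c.split()}),
        # Lexical variety: unique words divided by total words.
        "lexical_variety": round(len(set(words)) / len(words), 2) if words else 0.0,
    }
```

Treating the first word of a CTA as its verb is a simplification; it works for labels like `Start free` but would misread a label that opens with a noun.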

What It Does

  • Reads a URL or local HTML file.
  • Extracts title, meta description, headings, links, buttons, and paragraphs.
  • Finds CTA candidates from short action-led links/buttons.
  • Measures average sentence length.
  • Extracts a compact style fingerprint: heading shape, paragraph rhythm, CTA shape, CTA verbs, and lexical variety.
  • Builds a repeated-vocabulary lexicon.
  • Writes Markdown or JSON.
  • Benchmarks candidate copy against a source voice profile.
  • Gates against unsupported claims and copied spans.
  • Uses only the Python standard library.
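The extraction steps above can be sketched with `html.parser` from the standard library. This is an illustration under stated assumptions, not site2voice's implementation: the class `CopyExtractor`, the helpers `cta_candidates` and `average_sentence_length`, the `ACTION_VERBS` set, and the four-word CTA limit are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed verb list for this sketch; the real tool may use different rules.
ACTION_VERBS = {"start", "book", "see", "get", "try", "join", "sign"}


class CopyExtractor(HTMLParser):
    """Collects (tag, text) pairs for headings, paragraphs, links, buttons."""

    TRACKED = {"h1", "h2", "h3", "p", "a", "button"}

    def __init__(self):
        super().__init__()
        self.items = []   # finished (tag, text) pairs in document order
        self._open = []   # stack of (tag, text chunks) for open tracked tags

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self._open.append((tag, []))

    def handle_data(self, data):
        # Text belongs to every open tracked element (e.g. a link in a <p>).
        for _, chunks in self._open:
            chunks.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, chunks = self._open.pop()
            text = " ".join("".join(chunks).split())  # collapse whitespace
            if text:
                self.items.append((tag, text))


def cta_candidates(items, max_words=4):
    """Short link/button labels whose first word is an action verb."""
    found = []
    for tag, text in items:
        words = text.split()
        if (tag in {"a", "button"} and 0 < len(words) <= max_words
                and words[0].lower() in ACTION_VERBS and text not in found):
            found.append(text)
    return found


def average_sentence_length(paragraphs):
    """Mean words per sentence, splitting after ., ! or ? plus whitespace."""
    sentences = [s for p in paragraphs
                 for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)
```

To use it, feed HTML into a `CopyExtractor` instance, then pass `parser.items` to `cta_candidates` and the `p` texts to `average_sentence_length`. The regex sentence splitter is deliberately naive and will miscount abbreviations such as `e.g.`.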

Benchmark

site2voice bench compares candidate copy against measurable source signals: sentence length, vocabulary overlap, CTA shape, tone labels, heading shape, claim boundaries, and copy safety.

site2voice bench examples/editorial-home.html \
  examples/before-copy.md \
  examples/after-copy.md \
  --out examples/editorial-benchmark.md

| Candidate | Result | Overall | Lexicon | Copy safety |
| --- | --- | --- | --- | --- |
| after-copy | PASS | 83.8 | 70.0 | 93.2 |
| before-copy | FAIL | 36.6 | 0.0 | 100.0 |

The benchmark rewards measurable voice alignment without rewarding verbatim copying.
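One way to see how a score can reward shared vocabulary while penalizing verbatim reuse is the sketch below. The formulas are assumptions chosen for illustration: `lexicon_score` is the share of source lexicon terms that appear in the candidate, and `copy_safety` is the share of candidate 5-grams that do not occur in the source. The function names, the 5-word window, and the 0 to 100 scale are hypothetical and may differ from how `site2voice bench` actually computes its columns.

```python
import re


def tokens(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(words, n):
    """All consecutive n-word tuples in a token list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def lexicon_score(source_lexicon, candidate):
    """Percent of source lexicon terms that the candidate also uses."""
    lexicon = {term.lower() for term in source_lexicon}
    if not lexicon:
        return 0.0
    return 100.0 * len(lexicon & set(tokens(candidate))) / len(lexicon)


def copy_safety(source, candidate, n=5):
    """Percent of candidate n-grams that are NOT copied from the source."""
    candidate_grams = ngrams(tokens(candidate), n)
    if not candidate_grams:
        return 100.0  # too short to contain a copied span of n words
    source_grams = set(ngrams(tokens(source), n))
    copied = sum(gram in source_grams for gram in candidate_grams)
    return 100.0 * (1 - copied / len(candidate_grams))
```

Under these assumed formulas, a candidate that pastes the source verbatim gets a full lexicon score but a copy-safety score of 0, while a candidate that reuses individual terms in new sentences keeps both scores high.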

What It Is Not

  • Not an official brand guideline.
  • Not a DESIGN.md visual-token extractor.
  • Not a crawler for private pages or authenticated apps.
  • Not an LLM prompt that copies a site's prose.

Develop

python3 -m pip install -e .
make test
make bench
site2voice examples/saas-home.html --out examples/saas-VOICE.md

License

MIT
