Build, test, and publish AI agent skills across every platform.

ToolMark 🔨

ESLint + Jest + npm publish — for AI Agent Tools.

Build, test, scan, and ship tools across OpenClaw/ClawHub, Claude Code, Cursor, and Windsurf — from a single CLI.


Why ToolMark?

More than 13,000 tools are published on ClawHub, and 13% of them contain critical security flaws (Snyk ToxicTools Report, Feb 2026). Tools also break silently on platforms other than the one they were tested on. Until now, there has been no pytest equivalent for agent tools.

toolmark init my-tool --template github-api
toolmark test          # LLM-as-judge evaluation
toolmark scan          # prompt injection, dynamic fetch, credential leaks
toolmark compat        # check all 4 platforms at once
toolmark publish       # sign with Ed25519, push to ClawHub + Claude Code

Install

pip install toolmark

Requires Python 3.12+.


Quick Start

# 1. Scaffold
toolmark init my-github-tool --template github-api

# 2. Edit tool.md and tests/
cd my-github-tool

# 3. Test
ANTHROPIC_API_KEY=sk-ant-... toolmark test

# 4. Scan
toolmark scan

# 5. Check platform compatibility
toolmark compat

# 6. Publish
toolmark publish --platforms clawhub,claude-code
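
The pre-publish checks above can also be chained from a script. The following is a minimal sketch, assuming each `toolmark` subcommand exits with a non-zero status on failure (this README does not state that); `run_checks` and its `runner` parameter are hypothetical names introduced for illustration.

```python
import subprocess
from typing import Callable, Sequence

# Subcommands documented above, in the order the generated CI workflow lists them.
CHECKS: Sequence[Sequence[str]] = (
    ("toolmark", "compat"),
    ("toolmark", "scan"),
    ("toolmark", "test"),
)


def _default_runner(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = _default_runner,
) -> bool:
    """Run each check in order and stop at the first non-zero exit code.

    Assumption: a non-zero exit code means the check failed.
    """
    for cmd in checks:
        code = runner(cmd)
        if code != 0:
            print(f"FAILED ({code}): {' '.join(cmd)}")
            return False
        print(f"ok: {' '.join(cmd)}")
    return True
```

Calling `run_checks()` from the project directory runs the three commands for real; passing a custom `runner` makes the function testable without ToolMark installed.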

Commands

Command            What it does
toolmark init      Scaffold a new tool from a template
toolmark test      LLM-as-judge evaluation against YAML test cases
toolmark scan      Security scanner (prompt injection, dynamic fetch, credential leaks)
toolmark compat    Cross-platform compatibility check (4 platforms)
toolmark bench     Benchmark latency and token usage; compute a quality score (0–100)
toolmark publish   Sign with Ed25519, publish to configured registries

Templates

toolmark init my-tool --template github-api      # GitHub REST API wrapper
toolmark init my-tool --template file-ops         # Local filesystem tool
toolmark init my-tool --template mcp-integration  # Wraps an MCP server tool
toolmark init my-tool --template web-search       # Search API tool
toolmark init my-tool --template loom-query       # Loom knowledge graph tool
toolmark init my-tool --template blank            # Minimal scaffold

Test Cases (YAML)

# tests/test_search.yaml
- id: search_open_prs
  input: "find my open pull requests"
  expect_invoked: true
  expect_tool: search_pull_requests
  expect_params:
    state: open
    assignee: "@me"
  tolerance: fuzzy     # strict | fuzzy | invoked
  tags: [smoke]

Run: toolmark test --tags smoke
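
To make the shape of a test case concrete, here is a small sketch that checks a case for obvious mistakes and filters cases by tag. The field names and tolerance values come from the example above; which fields are mandatory is an assumption, and `validate_case` and `select_by_tag` are hypothetical helpers, not part of ToolMark.

```python
from typing import Any

# Values listed in the comment next to `tolerance` in the YAML example.
ALLOWED_TOLERANCE = {"strict", "fuzzy", "invoked"}
# Assumption: these three fields are required; the README does not say so.
REQUIRED_FIELDS = ("id", "input", "expect_invoked")


def validate_case(case: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one test case (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in case]
    if "tolerance" in case and case["tolerance"] not in ALLOWED_TOLERANCE:
        problems.append(f"unknown tolerance: {case['tolerance']!r}")
    if not isinstance(case.get("tags", []), list):
        problems.append("tags must be a list")
    return problems


def select_by_tag(cases: list[dict[str, Any]], tag: str) -> list[dict[str, Any]]:
    """Keep only the cases that carry the given tag."""
    return [c for c in cases if tag in c.get("tags", [])]


# The case from tests/test_search.yaml, written as the dict a YAML parser would yield.
example_case = {
    "id": "search_open_prs",
    "input": "find my open pull requests",
    "expect_invoked": True,
    "expect_tool": "search_pull_requests",
    "expect_params": {"state": "open", "assignee": "@me"},
    "tolerance": "fuzzy",
    "tags": ["smoke"],
}
```

Here `validate_case(example_case)` returns an empty list, and `select_by_tag([example_case], "smoke")` keeps the case.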


Security

toolmark scan catches:

  • SF001 — Dynamic fetch (curl | bash, eval(fetch(...)))
  • SF002 — Hardcoded credentials (API keys, passwords)
  • SF003 — Prompt injection phrases in tool descriptions
  • SF004 — Undeclared network endpoints
  • SNYK-* — 138 rules via Snyk agent-scan (if installed)
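
For intuition, pattern-based checks like SF001 to SF003 can be sketched with a few regular expressions. The rule IDs come from the list above; the patterns themselves are illustrative assumptions and are not ToolMark's actual rules, and `scan_text` is a hypothetical helper.

```python
import re

# Illustrative patterns only; the real rule definitions are not documented here.
RULES: dict[str, list[re.Pattern[str]]] = {
    # Dynamic fetch: piping a download into a shell, or eval of fetched code.
    "SF001": [
        re.compile(r"\b(?:curl|wget)\b[^\n|]*\|\s*(?:ba|z)?sh\b"),
        re.compile(r"\beval\s*\(\s*fetch\s*\("),
    ],
    # Hardcoded credentials: a secret-like name assigned a quoted literal.
    "SF002": [
        re.compile(r"(?i)\b(?:api[_-]?key|password|secret)\b\s*[:=]\s*['\"][^'\"\s]{8,}['\"]"),
    ],
    # Prompt injection: one well-known phrase, as an example.
    "SF003": [
        re.compile(r"(?i)\bignore (?:all )?(?:previous|prior) instructions\b"),
    ],
}


def scan_text(text: str) -> list[str]:
    """Return the sorted IDs of every rule with at least one matching pattern."""
    return sorted(
        rule_id
        for rule_id, patterns in RULES.items()
        if any(p.search(text) for p in patterns)
    )
```

A real scanner needs far more patterns and context awareness than this; the sketch only shows how a rule ID can map to text patterns.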

Provenance Signing

Every published tool is signed with Ed25519:

toolmark keygen              # creates ~/.toolmark/signing.key
toolmark publish --sign      # signs + publishes
toolmark verify my-tool      # verify any published tool
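
As background, the sign-then-verify round trip that Ed25519 provides can be sketched with the third-party `cryptography` package (`pip install cryptography`). This only illustrates the primitive: how ToolMark stores keys or packages signatures is not described here, and `sign_artifact` and `is_authentic` are hypothetical helper names.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def sign_artifact(private_key: Ed25519PrivateKey, artifact: bytes) -> bytes:
    """Produce an Ed25519 signature (64 bytes) over the artifact bytes."""
    return private_key.sign(artifact)


def is_authentic(public_key: Ed25519PublicKey, artifact: bytes, signature: bytes) -> bool:
    """Return True only if the signature matches the artifact under this key."""
    try:
        public_key.verify(signature, artifact)
        return True
    except InvalidSignature:
        return False


# Throwaway key for the demo; a real key would be loaded from disk.
key = Ed25519PrivateKey.generate()
artifact = b"contents of a packaged tool"
signature = sign_artifact(key, artifact)
```

Verification fails if even one byte of the artifact changes, which is what lets a consumer detect tampering after publication.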

GitHub Actions

Every toolmark init project includes a ready-to-use workflow:

# .github/workflows/toolmark.yml (already in your project), steps only
- run: toolmark compat    # platform check
- run: toolmark scan      # security gate
- run: toolmark test      # LLM evaluation (needs ANTHROPIC_API_KEY secret)

Quality Leaderboard

See how your tool ranks: toolmark.dev/leaderboard

Quality Score is a weighted sum: test pass rate (50%) + security score (30%) + compat score (20%).
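
As a worked example of the weighting, here is a minimal sketch. It assumes each component is already expressed on a 0 to 100 scale, which the README does not state explicitly; `quality_score` is a hypothetical name.

```python
# Weights in percent, as given in the formula above.
WEIGHTS = {"test_pass_rate": 50, "security": 30, "compat": 20}


def quality_score(test_pass_rate: float, security: float, compat: float) -> float:
    """Weighted sum of three component scores, each assumed to be in 0..100."""
    components = {
        "test_pass_rate": test_pass_rate,
        "security": security,
        "compat": compat,
    }
    for name, value in components.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return sum(WEIGHTS[name] * value for name, value in components.items()) / 100


# 90% of tests pass, a clean security scan, 3 of 4 platforms compatible:
# (50*90 + 30*100 + 20*75) / 100
print(quality_score(90, 100, 75))  # → 90.0
```

Because the weights sum to 100%, the result stays within 0 to 100 whenever the inputs do.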


Roadmap

  • init — scaffold with 6 templates
  • test — LLM-as-judge evaluation
  • scan — built-in security rules + Snyk integration
  • compat — 4-platform compatibility matrix
  • bench — composite quality score
  • publish — Ed25519 signing + ClawHub
  • watch — re-run tests on save
  • VS Code extension
  • Rust benchmark runner
  • Claude Code + Cursor + Windsurf publish

Contributing

See CONTRIBUTING.md. We always have good first issues.

License

MIT — see LICENSE.


Built by @ddevilz as part of the Loom AI tooling ecosystem.
