Abstract skill interfaces for AI — delivery, testing, architecture, security, and more. Skills define WHAT; AI decides HOW.

These details have not been verified by PyPI

Project description

ai-skill-interface

Directive

Skills are interfaces: each skill defines WHAT must be achieved — never HOW. You are the implementation: read the skill → inspect this project → fulfill the contract using this project's own language, tools, and conventions. One skill works for any project, any language, any AI — because you are the adapter.

Positioning

ai-skill-interface is not a skill marketplace or registry. It is a portable operating contract for AI coding agents: delivery, testing, coverage, docs, security, recovery, RAG diagnosis, evaluation, and agent runtime discipline.

Package managers distribute skills. This project defines the reusable development behaviors an AI agent should satisfy across projects.

Installation

# npm — installs skills to ~/.claude/skills/
npx ai-skill-interface

# pip — installs skills to ~/.claude/skills/
pip install ai-skill-interface && ai-skill-interface

# git submodule — wires skills into your project's CLAUDE.md
curl -sL https://raw.githubusercontent.com/jaemyeong-hwnag/common-ai-skill/main/scripts/submodule-install.sh | sh

How Skills Are Loaded

Each skill lives in skills/<name>/SKILL.md. When an AI reads this repository, it loads the relevant skill files directly. The README serves as the index and auto-selection guide; each skill's full definition is below and in its own file.

Context Signals

Detect these signals before selecting skills:

change.type    : code | ai-feature | explicit
change.scope   : core-logic | interface | infra | test | docs
arch.pattern   : hexagonal | layered | none
quality.status : no-tests | low-coverage | high-coverage
ai.complexity  : single-call | pipeline | stateful | multi-agent
action.risk    : reversible | irreversible

Auto-Selection

change.type=code
  → delivery-workflow
  + hexagonal-development   (arch.pattern=hexagonal or layered)
  + interface-first-development (change.scope=interface)
  + code-documentation      (change.scope=interface or public surface changed)
  → finalize                (after completion)

change.type=ai-feature
  → framework-selection     (always first)
  + rag-development         (retrieval pipeline present)
  + rag-failure-diagnosis   (retrieval-backed answer quality failure reported)
  + rag-regression-testing  (retrieval-backed behavior changed after failure)
  + observability           (ai.complexity≥pipeline)
  + evaluation              (quality measurement needed)
  + human-in-the-loop       (action.risk=irreversible)
  + agent-orchestration     (ai.complexity=multi-agent)
  + harness-engineering     (agent runtime design, guardrails, or sandboxing involved)

change.type=explicit
  → /version /security-audit /principle-audit /ai-token-optimize
    /coverage /test-runner /finalize /self-recovery

Composition

finalize = test-runner + coverage + docs-sync + delivery-workflow(commit)

Dependency Order

Apply required skills before dependent skills. Detect requires: in each SKILL.md frontmatter and resolve in dependency order before execution.

Skills

delivery-workflow

Enforce the full delivery cycle for every implementation task. Covers coding, testing, coverage, commit rules, and auto-commit on completion. This skill MUST be followed for all code changes.

Sequence

implement → run tests → fix failures → repeat until pass → add/update tests for changed behavior → rerun → commit by purpose

Rules

every code change requires test run
new behavior requires new tests
changed behavior requires updated tests
commit only when all tests pass
never mix purposes in one commit
auto-commit when work is complete

Commit Format

format: <type>: <imperative summary, ≤72 chars>
types: feat=new feature | fix=bug fix | refactor=no behavior change | test=test change | docs=doc change | chore=build/config/deps
forbidden: vague summaries (update, fix issues, changes)

Done all tests pass + tests exist for changed behavior + commits separated by purpose

hexagonal-development

Detect and enforce layered architecture boundaries. Use when the project separates concerns into layers and changes touch layer boundaries.

Sequence

detect project layer structure → apply change flow inside-out

Change Flow

domain/business rule → inbound contract → outbound contract → infrastructure adapter → presentation mapping

Rules

detect existing layer naming, do not rename/restructure unless requested
domain layer: zero deps on infrastructure or presentation
infrastructure types must not leak into domain or contract layers
explicit mapping between infrastructure models and domain models
presentation layer: thin, delegates to use-cases
deps always point inward: outer→inner, never reverse

interface-first-development

Define contracts before implementations. Use when adding or changing abstractions, service boundaries, or cross-layer dependencies.

Sequence

detect project's abstraction mechanism → define/update contract first → implement after contract stable → wire via DI/config → update external boundaries only for surface changes → test contract behavior

Rules

infrastructure types must not appear in contract signatures
function signatures: minimal, intention-revealing
prefer extending existing contracts over ad-hoc cross-layer deps
one contract = one responsibility

test-runner

Detect the project's test runner and execute the full test suite with coverage reporting. Fail fast and fix on failure.

Sequence

detect test framework from project → run full suite → on failure: classify(code bug|test bug) → fix → rerun → repeat until pass → summarize

Output Format

Tests:total=N passed=N failed=N skipped=N Coverage:N% Duration:Xs

Rules

tests must be independently runnable, no order dependency
mock only external I/O boundaries

coverage

Analyze test coverage and drive it to 80%+ by writing missing tests. Detects the project's test runner automatically.

Sequence

detect coverage tool → measure → identify uncovered code → write tests → remeasure → repeat until ≥80%

Priority

core business logic → service/use-case → infrastructure/integration → entry points

Test Strategy

happy path + boundary values + error paths + all conditional branches

Exclusions

entry-point bootstrap | auto-generated files | config/env files | type-definition-only files

Done coverage ≥80% + all tests pass

finalize

Post-implementation pipeline — run tests, enforce 80% coverage, sync docs, then commit atomically. Run after every feature or fix.

Sequence

test(detect→run→fix→pass) → coverage(measure→write tests→≥80%) → docs-sync(detect drift→update) → commit(by purpose, type: summary)

Output Format

Tests:total=N passed=N failed=N Coverage:N% Docs:[files] Commits:[hash:msg]

Done all tests pass + coverage ≥80% + docs synced + commits created

self-recovery

Recover failed, stalled, looping, or unsafe AI-run work by monitoring execution, stopping stale attempts, diagnosing root cause, applying the smallest safe fix, verifying, and reporting evidence.

Sequence

define success/failure/rollback → run with a timer → monitor evidence → stop stale or unsafe work → classify cause → fix smallest local cause → verify → restart only when safe → report evidence

Rules

define success and failure before starting work
use adaptive check intervals based on the task's expected progress
stop stale, duplicate, crashed, or unsafe execution before changing state
never repeat the same failed attempt without a new cause, fix, or diagnostic
classify the root cause before applying a fix
prefer the smallest reversible local fix
restart only after verification or after choosing a safe degraded path

Done failed or stale work stopped + cause classified + smallest safe fix applied or blocker reported + verification evidence collected + final state reported

docs-sync

Detect documentation drift from recent code changes and synchronize docs to match the current codebase.

Sequence

scan project docs → identify changed code → map changes to affected docs → update docs → validate structure

Change-to-Doc Mapping

new public API/module → API docs, architecture overview
public interface change → API docs, usage examples
config change → config guide, env examples
schema change → data model docs
dependency change → install guide
architecture change → architecture docs

Rules

code is source of truth: docs follow code
add sections for new topics
remove docs for deleted features
fix broken internal links
sync TOC with actual headings
remove duplicate content

code-documentation

Document public code surfaces and API behavior using the project's native documentation conventions while avoiding redundant comments and private implementation noise.

Sequence

detect documentation conventions → identify public surfaces → document intent, inputs, outputs, errors, and side effects → document externally visible contracts → remove or avoid redundant comments → verify style consistency

Rules

document public surfaces, caller expectations, errors, and side effects
do not restate obvious code literally
do not expose private implementation details unless they explain non-obvious behavior
do not replace existing correct comments
preserve the surrounding project's documentation style

Done public surfaces documented + external contracts clear + stale comments updated + redundant comments avoided + project documentation style preserved

security-audit

Comprehensive security review — secrets exposure, dependency vulnerabilities, code injection risks, and infrastructure config.

Scan Targets

secrets: VCS history + source files for hardcoded credentials + verify secret/key/log files excluded from VCS
deps: run project's vulnerability scanner
code: injection(query,command,template) + path traversal + missing input validation at boundaries + error exposure(stack traces,internal paths) + credential logging
infra: unnecessary port exposure + default/weak credentials + secrets not via env vars

Severity HIGH(secret exposure, injection) → fix immediately MEDIUM(vulnerable dep, error exposure) → fix this cycle LOW(potential risk, best-practice violation) → next cycle

Action

fix HIGH issues directly when possible

version

Manage semantic versioning — auto-detect version bump type from recent commits, update version files, write CHANGELOG, and create a git tag.

Bump Rules

BREAKING CHANGE→MAJOR(X+1.0.0) | feat→MINOR(X.Y+1.0) | fix,refactor,perf→PATCH(X.Y.Z+1) | test,docs,chore→no change

Sequence

detect current version(tags→version files→CHANGELOG) → determine bump(arg or auto from commits) → update all version files → write CHANGELOG(Added/Changed/Fixed/Removed) → commit + tag

Usage

/version (auto) | /version patch | /version minor | /version major

ai-token-optimize

Optimize AI-consumed code and data for token efficiency — reduce token usage while preserving semantic fidelity. Applies to prompts, tool outputs, and inter-agent messages.

Scope

include: AI-consumed data (prompts, tool returns, agent messages, AI-parsed structures, LLM context)
exclude: human-facing data (docs, tests, config, schema)

Techniques

legend: define all abbreviations once at prompt start; reuse without re-definition — prefer abbreviations already present in model training data (domain-standard codes require no definition)
compact k:v: verbose structured labels → short key:value pairs, pipe or comma-separated
tabular: repeated-structure collections → typed header with column list once, then value rows only
numeric notation: express numbers as {digit_count:value} to prevent tokenizer fragmentation on large or precise values
structural tags: lightweight semantic markup for boundaries instead of deep nesting or verbose prose
placement: identity and critical directives at top, supporting data in middle, output schema at bottom — mitigates attention degradation in long contexts

Sequence

scan AI-consumed targets → identify repetition and verbosity → apply techniques → verify AI can parse output → run tests → commit per module

Rules

apply only to AI-consumed data — never to human-facing content
verify after each technique: AI must parse output correctly
legend abbreviations must be unambiguous within their scope
tabular format requires consistent column order across all rows
placement order: legend → identity/directives → data → output schema

Done labels compressed + collections tabular + numerics notation-formatted + critical info at top + output schema at bottom + tests pass

ai-token-efficiency-research

Use research evidence to choose token-efficiency and cost-reduction options for AI systems.

Scope

reducing unnecessary input material → reducing retained runtime state → reusing stable context → retaining and retrieving durable context → resource, quality, stability, and safety trade-offs → process-level value analysis

Activation

Trigger this skill when the user asks how to reduce token cost while maintaining quality, which efficiency strategy fits a workload, for source-backed evidence in token efficiency, for a research-backed skill or reference document, or why a token-saving option changes quality, time, or operating cost.

Skip when the request has no research depth, official pricing or documentation is the primary source of truth, the user asks for a quick non-research answer, or the task has no token-efficiency question.

Research Map

identify avoidable material → reduce retained runtime state → reuse stable context across repeated work → retain and retrieve durable context → locate process steps with poor value for their cost

Workflow

identify workload shape → map to research area → prefer primary sources, evaluation reports, and maintained evidence → verify all quantitative claims before reporting them → compare applicability, expected value, quality risk, operational complexity, and failure modes → recommend a path and measurement plan

Evidence Requirements

numeric claims must be sourced, current, and tied to the reported conditions
preprint, peer-reviewed, project-site, evaluation, and secondary-summary evidence must be clearly distinguishable
do not generalize one result to all settings
savings claims must account for all relevant usage
efficiency methods must include quality and operational caveats
stale or unverified research claims must be labeled as unverified or omitted

Output

Recommendation: <selected option>
Fit: <why it applies>
Evidence: <sourced support or measure-first note>
Limit: <main caveat>
Validation: <first check>
Sources: <source identifiers>

Done the answer maps the workload to the right option + research claims are sourced and scoped + trade-offs are explicit + recommendation includes a measurement path + unsupported numbers are not reported

principle-audit

Audit the codebase for violations of the project's core principles — detect unintended system-imposed constraints that contradict the project's stated goals.

Sequence

find project principles(docs, config) → scan for violations → classify(allowed vs violation) → report + fix

Violation Types

unintended constraints: system blocks behavior without business rule
layer boundary: deps in forbidden direction, logic in wrong layer
consistency: same-purpose logic in conflicting ways, naming/error-handling mismatch
hardcoded assumptions: values that should be configurable, env-specific assumptions

Distinguish

allowed: status as info | config-based threshold | domain rule rejection
violation: status blocks execution | hardcoded threshold | infra-layer business decision

Severity CRITICAL(direct violation, immediate impact) → fix now WARNING(violation, currently inactive) → fix this cycle INFO(potential risk) → comment/TODO

Report

per violation: file:line + description + related principle + fix recommendation fix CRITICAL immediately → run tests

paper-research

Proactively use academic literature when it can improve technical answers, even when the user does not explicitly ask for papers or research.

Use academic evidence when it clearly improves a technical answer, even if the user did not ask for papers. Keep automatically triggered citations lightweight: the main answer comes first, and research support follows only as much as it helps the user decide, verify, or go deeper.

Activation

Trigger this skill for:

algorithm or data-structure internals where original or survey literature exists
system design trade-offs with empirical or comparative research
performance analysis, benchmark interpretation, or claims about speed, accuracy, scale, or quality
named techniques, models, protocols, storage designs, indexing methods, compression methods, ranking methods, or distributed algorithms
comparisons such as "A vs B" when prior work can clarify differences
requests for trend, state of the art, recent work, prior work, references, evidence, or source attribution
academic identifiers, conference names, paper titles, author names, or uploaded academic documents

Skip this skill for casual conversation, product documentation, trivial syntax, setup questions, ordinary code work without methodological depth, or explicit requests for a short answer without sources.

Modes

Mode	Signal	Outcome
Automatic support	technical question where literature would help	answer normally, then add a short research note
Search	broad topic, trend, or evidence request	find a small set of relevant papers and synthesize
Single-paper analysis	paper identifier, title, URL, or academic document	summarize metadata, method, results, limits, and source
Survey or comparison	multiple methods, "A vs B", state of the art	compare representative papers and explain trade-offs

Workflow

classify need → search or inspect sources → prefer primary evidence → distinguish peer-reviewed work from preprints → extract problem, contribution, method, evaluation, results, limits, and implications → synthesize what the evidence changes → cite stable identifiers or venue/year metadata

Source Priority

peer-reviewed conference, journal, or workshop papers → well-cited preprints with clear methodology → official proceedings, author pages, and reproducibility artifacts → benchmark leaderboards or implementation indexes as supporting evidence only → secondary summaries only when primary sources are unavailable or used for navigation

Output

For automatic support, keep the research note short:

참고: <paper or venue/year> reports a related finding: <one practical implication>.

For a single paper:

# <title>
Authors / venue or status / year

## One-line Summary
## Core Contributions
## Method
## Results
## Limitations
## Practical Takeaway
## Sources

For a survey or comparison:

## Related Work: <topic>

### 1. <paper>
- Core idea:
- Evidence:
- Limit:
- Source:

Rules

do not make remembered numeric claims without source verification
do not over-weight a preprint when peer-reviewed evidence contradicts it
do not let citations overwhelm the answer when research was not explicitly requested
keep summaries copyright-safe and concise
measured results, author claims, and practical interpretation remain distinguishable
report when no strong academic evidence was found

Done the answer uses the right level of academic evidence + source status is clear + claims are attributed + limitations are stated + automatic citations remain lightweight

framework-selection

Choose the right tool, library, or architecture for the task — minimal complexity for the requirement.

Sequence

classify problem complexity → detect what the project already uses → match to the lowest sufficient tier → record rationale

Complexity Tiers

single call       → no framework
pipeline          → composable steps, minimal orchestration
stateful workflow → explicit state, resumability
multi-agent       → routing, delegation, parallelism

Rules

detect existing project tools before proposing new ones
start at the lowest tier that satisfies the stated requirement
escalate only when the lower tier cannot meet a requirement
switching tiers mid-project requires explicit justification

Done chosen approach matches complexity tier + existing conventions respected + rationale recorded

rag-development

Implement Retrieval-Augmented Generation pipelines — ingestion, chunking, embedding, retrieval, and generation.

Sequence

ingest → chunk → embed → store → retrieve → rerank → generate

Rules

preserve source metadata (origin, section, date) through all stages
chunk boundaries must respect semantic units
embedding model must be identical at index time and query time
retrieval store must match the data scale
retrieve candidates before filtering by relevance — never pass unfiltered results to generation
generation output must attribute retrieved sources

Done all stages present + metadata preserved + embedding model consistent + sources attributed in output

rag-failure-diagnosis

Diagnose failures in retrieval-augmented answers by assigning the first failing subsystem: retrieval, generation, server behavior, guardrails, or a combination.

Sequence

capture question and answer → inspect retrieved evidence → compare evidence to question → compare answer to evidence → inspect fallback or guardrail behavior → assign first failing subsystem → recommend smallest next fix

Verdicts

retrieval failure | generation failure | server or guardrail failure | combined failure | insufficient evidence to decide

Rules

judge retrieval before judging generation
if evidence is missing, wrong, or indirect, classify retrieval first
if evidence is correct but the answer ignores or changes it, classify generation
if fallback or guardrail behavior overrides useful evidence, classify server or guardrail behavior
name the failing subsystem instead of using vague quality labels

Done clear subsystem verdict + evidence-backed cause + next fix scoped to retrieval, generation, server behavior, guardrails, or data

rag-regression-testing

Verify retrieval-augmented generation fixes with same-case, nearby-case, and unrelated-case tests after changing data, retrieval, ranking, prompting, fallback, or guardrail behavior.

Sequence

apply targeted fix → refresh affected retrieval state if needed → run same-case test → run nearby-case test → run unrelated-case test → inspect retrieval and answer for each → report pass or fail

Required Cases

same case: exact failed scenario | nearby case: close variant | unrelated case: distinct question or domain

Rules

inspect retrieved evidence and final answer for every test
same-case retrieval must include directly relevant evidence
nearby-case retrieval must prove the fix generalizes beyond one wording
unrelated-case retrieval must avoid the previous false-positive pattern
final answers must stay within retrieved evidence
do not call the fix complete when only the exact failed case passes

Done same-case, nearby-case, and unrelated-case tests run + retrieval inspected + final answers inspected + remaining risks reported

observability

Instrument AI workflows with tracing, logging, and monitoring to enable debugging, auditing, and performance analysis.

Sequence

identify instrumentation boundaries → add spans → attach metadata → verify no gaps

Boundaries

entry points      → user input, external triggers
model calls       → inputs, outputs, latency, token counts
tool calls        → name, inputs, outputs, errors
inter-agent msgs  → sender, receiver, content, timing
state transitions → before/after snapshot

Rules

every AI model call must produce a trace entry
token counts must be extracted from the model API response metadata, not estimated or hardcoded — never use response length or character count as a token proxy
propagate trace context across service boundaries
attach a correlation ID to every trace
structured output only — no free-form strings
no secrets, PII, or credentials in traces

Done all model calls traced + tool calls recorded + correlation IDs present + no secrets in traces

evaluation

Build evaluation pipelines for AI outputs — create datasets, write evaluators, and measure quality systematically.

Sequence

collect dataset → write evaluators → run baseline → change system → rerun → compare delta → iterate

Dataset Types

final response  → input → expected output
step-level      → input → expected intermediate step
trajectory      → input → expected sequence of steps

Evaluator Types

code evaluator  → deterministic check (exact match, schema, format)
LLM-as-judge    → semantic check (correctness, tone, safety)
human evaluator → gold standard for ambiguous criteria

Rules

prefer code evaluators for measurable criteria
every evaluator must return a score and a reason
judge model must differ from the model being evaluated
evaluators must produce consistent output for the same input
record a baseline before any system change

Done dataset covers representative inputs + each evaluator returns score+reason + baseline recorded + regression detectable

human-in-the-loop

Insert human approval, review, or correction checkpoints into AI workflows — interrupt, wait, resume safely.

Sequence

reach checkpoint → serialize state → emit review request → halt → human acts → restore state → resume

Interrupt Triggers

irreversible actions    → destructive, external, financial
low confidence          → below an explicit threshold
high-cost ambiguity     → wrong assumption cannot be easily undone
compliance gate         → policy or sign-off required
limit approaching       → resource or cost ceiling about to be exceeded

Escalation Tiers

soft  → surface suggestion, continue unless rejected
hard  → halt until explicit approval
stop  → halt and require human restart

Rules

state at interrupt must be fully serializable
resume path must be idempotent — no duplicated side effects
review payload must include: pending action, context, options
rejected actions must be recorded with reason
never auto-resume on timeout without an explicit policy

Done all irreversible actions have an interrupt point + state serializable + resume idempotent + rejections recorded

agent-orchestration

Coordinate multiple agents — routing, delegation, parallelism, state sharing, and result aggregation.

Sequence

decompose task → route or delegate → execute → aggregate results → handle failures

Patterns

sequential   → chain agent outputs as inputs
parallel     → dispatch independent tasks, aggregate results
routing      → classify input, dispatch to specialist
delegation   → orchestrator plans, sub-agents execute
map-reduce   → split input, process independently, reduce

Rules

orchestrator owns the plan; sub-agents own execution
sub-agents must not communicate directly — only via orchestrator
every inter-agent message must be serializable
sub-agent failures must be handled explicitly — no silent drops
delegation chain depth must be bounded
shared state must flow through a single authoritative store
every message must carry a correlation ID traceable to the root request

Done agents communicate only via orchestrator + messages serializable + failures handled + depth bounded + correlation IDs present

harness-engineering

Design the runtime environment around AI agents — constraints, guardrails, verification, sandboxing, feedback loops, and recovery mechanisms that make agents reliable at scale.

Sequence

define agent boundaries → design guardrails → configure sandboxing → build verification pipeline → implement feedback loops → add recovery mechanisms → set up session continuity → validate end-to-end

Layers

context management     → what the agent sees (repo-local, versioned artifacts only)
permission boundary    → what the agent can access (files, tools, commands)
execution sandbox      → where the agent runs (isolated, rollback-capable)
validation pipeline    → how outputs are verified (deterministic + semantic)
architectural guard    → structural rules the agent must not violate
feedback loop          → how failures inform next attempts
recovery mechanism     → how the system handles and corrects errors
session continuity     → how state bridges across context resets

Architectural Constraints

deps flow: types → config → repository → service → runtime → presentation
structural tests validate layer compliance automatically
agent cannot bypass constraints — enforcement is mechanical, not instructional

Context Strategy

repository-centered  → all agent-accessible knowledge lives in versioned artifacts
map over manual      → concise pointers, not exhaustive instructions
scoped context       → load only what the current task requires
knowledge promotion  → external knowledge → repo artifact before agent use
adaptive compaction  → progressively reduce older observations to preserve recent context

Verification Tiers

deterministic   → static analysis, structural tests, schema validation
semantic        → model-based correctness, intent alignment, output quality
composite       → deterministic first (fast, cheap), semantic second (nuanced)
generator-evaluator → separate generation from evaluation in adversarial loop

Risk-Based Routing

low risk    → automated validation only, no human gate
medium risk → automated validation + async human review
high risk   → halt until explicit human approval before execution

Session Continuity

progress artifact    → structured file tracking completed work and next steps
initializer session  → first run sets up environment and initial state
incremental session  → subsequent runs read progress, advance, update artifact
context reset        → clear context between sessions to prevent drift
automatic compaction → compress history when context limit approaches

Agent Topology

single agent         → one agent handles full task (prefer when sufficient)
plan-execute split   → one agent plans, another executes (reduce reasoning load)
generator-evaluator  → one agent produces, another critiques in adversarial loop
specialist routing   → classify input, dispatch to domain-specific agent

Trace-Driven Improvement

capture  → log every tool call, decision, and intermediate output
analyze  → cluster failure patterns across runs
detect   → identify doom loops (repeated edits to same target without progress)
convert  → turn recurring failure patterns into new constraints or middleware

Harness Evolution

each harness component encodes an assumption about model limitations
when models improve → re-test assumptions → strip what is no longer needed
prefer removing complexity over adding it — simplification beats sophistication
harness must remain model-agnostic — swappable without structural changes

Rules

enforce constraints mechanically — never rely solely on prompt instructions for safety
every agent action must be auditable — log inputs, outputs, tool calls, and decisions
sandbox must support rollback — no irreversible side effects without explicit approval
permission boundaries must follow least-privilege — grant minimum access required per task
context is a scarce resource — provide maps, not manuals; pointers, not payloads
architectural boundaries must be enforced by structural tests, not by convention
verification must combine deterministic and semantic checks — neither alone is sufficient
feedback from failures must be converted into reusable constraints — not one-off fixes
recovery must be automatic for known failure classes, escalated for unknown ones
agent-generated artifacts must pass the same quality gates as human-generated ones
periodic maintenance agents must detect drift in docs, architecture, and constraints
long-running tasks must persist progress in structured artifacts across sessions
detect doom loops via trace analysis — intervene when repeated attempts show no progress
route actions through risk tiers — high-risk actions require human approval
prefer single-agent topology — escalate to multi-agent only when single agent is insufficient
harness must survive model changes — no coupling to specific model capabilities
evaluator must be adversarial — optimistic self-review is insufficient for quality

Anti-Patterns

prompt-only safety       → relying on instructions instead of mechanical enforcement
unbounded context        → dumping all docs into context instead of targeted selection
trust-by-default         → granting full access instead of least-privilege boundaries
manual-only review       → human review as sole quality gate instead of automated verification
fix-and-forget           → patching individual failures instead of creating reusable constraints
optimistic self-eval     → agent evaluating own work without adversarial challenge
model-coupled harness    → harness assumptions hardcoded to specific model behavior
session amnesia          → no structured handoff between context windows
complexity accumulation  → adding harness components without re-testing necessity

Done permission boundaries enforced + execution sandboxed with rollback + verification pipeline active (deterministic + semantic + generator-evaluator) + architectural constraints mechanically enforced + feedback loops convert failures to constraints + all agent outputs pass same quality gates as human outputs + audit trail present for all agent actions + session continuity via progress artifacts + doom loop detection active + risk-based routing configured + harness components documented

Language

Default: English. If research or evidence shows another format — structured notation, symbolic representation, or any non-natural-language encoding — produces better AI comprehension, instruction-following, or token efficiency, use that instead. Language choice is subordinate to AI optimality.

Format

Markdown headings for structure + XML tags (<constraints>, <criteria>, <instruction>) for semantic boundaries. This hybrid format is validated for cross-model compatibility. Format is not fixed — replace it if a demonstrably better AI-optimized format exists.

Rules

- detect project conventions before acting - test before commit - code is source of truth for docs - one commit = one purpose, format: type: summary - deps point inward only - coverage target ≥80%, priority: core→service→integration→entry

Contributing

If you are an AI encountering this repository for the first time, read CONTRIBUTING.md. It contains an audit procedure: validate all skills against current research, check cross-model compatibility, and submit a PR for any improvements found.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

1.8.0

May 21, 2026

1.4.0

Mar 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ai_skill_interface-1.8.0.tar.gz (59.2 kB view details)

Uploaded May 21, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

ai_skill_interface-1.8.0-py3-none-any.whl (43.2 kB view details)

Uploaded May 21, 2026 Python 3

File details

Details for the file ai_skill_interface-1.8.0.tar.gz.

File metadata

Download URL: ai_skill_interface-1.8.0.tar.gz
Upload date: May 21, 2026
Size: 59.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai_skill_interface-1.8.0.tar.gz
Algorithm	Hash digest
SHA256	`9e8cc1370f8e787c2412c092fa8062ed9fa385f3dca6d986c94ec26552174636`
MD5	`f7bcea23ac1983e2e0a1c547d92af768`
BLAKE2b-256	`71ab4d602513f706a93a8e26a2e83c2871b00bc199203dcd941b22b7bb34e237`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ai_skill_interface-1.8.0.tar.gz:

Publisher: publish.yml on jaemyeong-hwnag/common-ai-skill

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_skill_interface-1.8.0.tar.gz
- Subject digest: 9e8cc1370f8e787c2412c092fa8062ed9fa385f3dca6d986c94ec26552174636
- Sigstore transparency entry: 1590691121
- Sigstore integration time: May 21, 2026
Source repository:
- Permalink: jaemyeong-hwnag/common-ai-skill@7326e2ae399dd94f28d14916bb6e9ae0b58aed72
- Branch / Tag: refs/tags/v1.8.0
- Owner: https://github.com/jaemyeong-hwnag
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7326e2ae399dd94f28d14916bb6e9ae0b58aed72
- Trigger Event: release

File details

Details for the file ai_skill_interface-1.8.0-py3-none-any.whl.

File metadata

Download URL: ai_skill_interface-1.8.0-py3-none-any.whl
Upload date: May 21, 2026
Size: 43.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai_skill_interface-1.8.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7fc642c71c69356a4661911a805c0d27c6944fd05955744e5b567eb7a87edfcb`
MD5	`d5bd221a55af65d3a3f6e40e76eca025`
BLAKE2b-256	`aff0818fe82713f84045367a873233133fcfe809540c9d4d1de5e7b0520b9693`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ai_skill_interface-1.8.0-py3-none-any.whl:

Publisher: publish.yml on jaemyeong-hwnag/common-ai-skill

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_skill_interface-1.8.0-py3-none-any.whl
- Subject digest: 7fc642c71c69356a4661911a805c0d27c6944fd05955744e5b567eb7a87edfcb
- Sigstore transparency entry: 1590691133
- Sigstore integration time: May 21, 2026
Source repository:
- Permalink: jaemyeong-hwnag/common-ai-skill@7326e2ae399dd94f28d14916bb6e9ae0b58aed72
- Branch / Tag: refs/tags/v1.8.0
- Owner: https://github.com/jaemyeong-hwnag
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7326e2ae399dd94f28d14916bb6e9ae0b58aed72
- Trigger Event: release

ai-skill-interface 1.8.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

ai-skill-interface

Directive

Positioning

Installation

How Skills Are Loaded

Context Signals

Auto-Selection

Composition

Dependency Order

Skills

delivery-workflow

hexagonal-development

interface-first-development

test-runner

coverage

finalize

self-recovery

docs-sync

code-documentation

security-audit

version

ai-token-optimize

ai-token-efficiency-research

principle-audit

paper-research

framework-selection

rag-development

rag-failure-diagnosis

rag-regression-testing

observability

evaluation

human-in-the-loop

agent-orchestration

harness-engineering

Language

Format

Rules

Contributing

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance