naoh / book-to-skill: turn a PDF/MD/DOCX book into a Claude Code skill. Skill mode (no API) + provider-agnostic API mode.

naoh — book-to-skill

Turn a technical book or document (.pdf, .md/.markdown, .docx) into a Claude Code skill: a front-loaded SKILL.md plus on-demand chapter summaries, a glossary, a patterns file, and a cheatsheet.

Two modes, one core

Extraction + structure detection are shared. What differs is who writes the summaries:

| Mode | API key? | Who synthesizes | How to trigger |
| --- | --- | --- | --- |
| Skill mode (default) | No | The Claude Code agent, by following SKILL.md | install the skill, then /book-to-skill <file> [slug] |
| API mode | Yes | A configurable LLM provider, in our Python code | book-to-skill build <file> [slug] --provider <name> |

Output (same in both modes)

~/.claude/skills/<slug>/
├── SKILL.md            # core mental models + chapter index + topic index (front-loaded)
├── chapters/chNN-*.md  # one dense summary per chapter (loaded on demand)
├── glossary.md         # key terms, alphabetical, with chapter refs
├── patterns.md         # techniques / algorithms / design patterns
└── cheatsheet.md       # decision tables + quick rules
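
As a quick sanity check on the layout above, the following sketch reports which expected entries are absent from a generated skill directory. The function name `missing_skill_parts` and the `REQUIRED_FILES` constant are hypothetical helpers for illustration, not part of the package.

```python
from pathlib import Path

# Fixed top-level files named in the layout above.
REQUIRED_FILES = ("SKILL.md", "glossary.md", "patterns.md", "cheatsheet.md")


def missing_skill_parts(skill_dir: Path) -> list[str]:
    """Return the expected entries that are absent from skill_dir (hypothetical helper)."""
    missing = [name for name in REQUIRED_FILES if not (skill_dir / name).is_file()]
    # chapters/ should hold at least one chNN-*.md summary.
    chapters = skill_dir / "chapters"
    if not chapters.is_dir() or not any(chapters.glob("ch[0-9][0-9]-*.md")):
        missing.append("chapters/chNN-*.md")
    return missing
```

For example, `missing_skill_parts(Path.home() / ".claude" / "skills" / "my-slug")` returns an empty list when every part of the layout is present.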

Supported input formats

Only three, by design — anything else is rejected with a clear message:

  • PDF (.pdf) — pdftotext -layout (poppler) → pypdf → pdfminer.six (first non-empty wins).
  • Markdown (.md, .markdown) — read as UTF-8 (BOM-safe); headings drive structure detection.
  • DOCX (.docx) — python-docx, with a zero-dependency stdlib zipfile XML fallback.
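
To illustrate what a zero-dependency DOCX fallback can look like, here is a minimal sketch that reads paragraphs straight from `word/document.xml` with the standard library. The function name `docx_paragraphs` is hypothetical, and the project's actual fallback may handle more cases (tables, headings, footnotes) than this does.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path: str) -> list[str]:
    """Return the non-empty paragraphs of a .docx using only the standard library."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across <w:t> runs; join them back together.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

A .docx file is a ZIP archive, so no third-party parser is needed to reach the body text; what this approach gives up is style information such as heading levels.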

Routing is by file suffix, with a magic-byte sniff fallback (%PDF, ZIP→docx) when the suffix is missing or wrong. If a backend is missing, the tool prints the exact install commands and exits non-zero (it never fails silently).

Install

pip install naoh            # from PyPI

The PyPI distribution is naoh; it installs the book_to_skill import package and the book-to-skill CLI command.

From a clone (for development):

pip install -e ".[dev]"     # with dev/test extras
# or, for runtime only:
pip install -e .

Python 3.11+ required. PDF live extraction works best with poppler (pdftotext); without it the tool falls back to pypdf / pdfminer.six.
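
The "first non-empty wins" fallback can be pictured with the sketch below. The backend wrappers use the public entry points of poppler's `pdftotext`, `pypdf`, and `pdfminer.six`, but the function names and the way a missing backend is handled here (returning an empty string, then raising) are assumptions; the tool itself prints install commands and exits non-zero instead.

```python
import shutil
import subprocess
from collections.abc import Callable, Iterable


def via_pdftotext(path: str) -> str:
    """Run poppler's pdftotext if it is on PATH; '-' sends the text to stdout."""
    if shutil.which("pdftotext") is None:
        return ""
    done = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True, text=True, check=False,
    )
    return done.stdout if done.returncode == 0 else ""


def via_pypdf(path: str) -> str:
    try:
        from pypdf import PdfReader  # imported lazily so the sketch runs without it
    except ImportError:
        return ""
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)


def via_pdfminer(path: str) -> str:
    try:
        from pdfminer.high_level import extract_text
    except ImportError:
        return ""
    return extract_text(path)


def first_non_empty(path: str, backends: Iterable[Callable[[str], str]]) -> str:
    """Try each backend in order and return the first result that contains text."""
    for backend in backends:
        text = backend(path)
        if text.strip():
            return text
    raise RuntimeError("no PDF backend produced any text")
```

A call such as `first_non_empty("book.pdf", [via_pdftotext, via_pypdf, via_pdfminer])` mirrors the order given for PDF input.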

Usage

# 1) Extract only — writes full_text.txt + metadata.json, prints their paths (used by skill mode)
book-to-skill extract path/to/book.pdf

# 2) Analyze — extraction + structure report, no files written
book-to-skill analyze path/to/book.pdf

# 3) Build (API mode) — generate the whole skill with a provider
book-to-skill build path/to/book.pdf my-slug --provider groq --model openai/gpt-oss-20b

Useful flags for build: --style {technical,prose,auto}, --base-url (for local/custom), --analyze-only, --skills-root <dir>, --yes (skip the confirmation prompt).

Skill mode (no API key)

Install the skill, then invoke it from Claude Code:

/book-to-skill ~/Documents/some-book.pdf clean-architecture

The agent runs scripts/extract.py, reads the extracted text, and writes the five files itself.

Providers (API mode)

One OpenAI-compatible adapter (switched by base_url) plus dedicated Anthropic and Gemini adapters. Selection precedence: --provider flag → BOOK_TO_SKILL_PROVIDER env → error.

| Provider | --provider | Base URL | Env var | Default model |
| --- | --- | --- | --- | --- |
| OpenAI | openai | https://api.openai.com/v1 | OPENAI_API_KEY | gpt-4o-mini |
| OpenRouter | openrouter | https://openrouter.ai/api/v1 | OPENROUTER_API_KEY | openai/gpt-4o-mini |
| Groq | groq | https://api.groq.com/openai/v1 | GROQ_API_KEY | openai/gpt-oss-20b |
| xAI Grok | grok | https://api.x.ai/v1 | XAI_API_KEY | grok-2-latest |
| DeepSeek | deepseek | https://api.deepseek.com/v1 | DEEPSEEK_API_KEY | deepseek-chat |
| Qwen / DashScope | qwen | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | DASHSCOPE_API_KEY | qwen-plus |
| Ollama (local) | ollama | http://localhost:11434/v1 | (none) | llama3.2 |
| Local / custom | local | --base-url | LOCAL_API_KEY (optional) | --model |
| Anthropic (Claude) | anthropic | https://api.anthropic.com/v1/messages | ANTHROPIC_API_KEY | claude-sonnet-4-6 |
| Google Gemini | gemini | https://generativelanguage.googleapis.com/v1beta | GEMINI_API_KEY | gemini-2.5-flash |

Gemini also exposes an OpenAI-compatible endpoint (https://generativelanguage.googleapis.com/v1beta/openai/); this project ships a dedicated native adapter but you can also drive Gemini through local with that base URL.

Pricing/model IDs drift. Cost estimates use a config-driven table in src/book_to_skill/config.py (MODEL_PRICES). Verify current pricing at https://docs.claude.com/en/docs/about-claude/pricing and each provider's pricing page.

Keys are read from environment variables only and are never logged or committed. Copy .env.example to .env and fill in what you use.

Development

ruff check .            # lint
ruff format --check .   # format check
pytest -q --cov=book_to_skill --cov-report=term-missing

All tests are offline: synthetic fixtures are built at runtime (including a stdlib-only DOCX and a pypdf-built PDF), provider HTTP is mocked, and the end-to-end test uses a hidden stub provider. CI runs ruff + pytest on Python 3.11 and 3.12.

License

MIT — see LICENSE.
