naoh / book-to-skill: turn a PDF/MD/DOCX book into a Claude Code skill. Skill mode (no API) + provider-agnostic API mode.
naoh — book-to-skill
Turn a technical book or document (.pdf, .md/.markdown, .docx) into a
Claude Code skill: a front-loaded SKILL.md
plus on-demand chapter summaries, a glossary, a patterns file, and a cheatsheet.
Two modes, one core
Extraction + structure detection are shared. What differs is who writes the summaries:
| Mode | API key? | Who synthesizes | How to trigger |
|---|---|---|---|
| Skill mode (default) | No | The Claude Code agent, by following SKILL.md | install the skill, then /book-to-skill <file> [slug] |
| API mode | Yes | A configurable LLM provider, in our Python code | book-to-skill build <file> [slug] --provider <name> |
Output (same in both modes)
~/.claude/skills/<slug>/
├── SKILL.md # core mental models + chapter index + topic index (front-loaded)
├── chapters/chNN-*.md # one dense summary per chapter (loaded on demand)
├── glossary.md # key terms, alphabetical, with chapter refs
├── patterns.md # techniques / algorithms / design patterns
└── cheatsheet.md # decision tables + quick rules
Supported input formats
Only three, by design — anything else is rejected with a clear message:
- PDF (.pdf): pdftotext -layout (poppler) → pypdf → pdfminer.six (first non-empty wins).
- Markdown (.md, .markdown): read as UTF-8 (BOM-safe); headings drive structure detection.
- DOCX (.docx): python-docx, with a zero-dependency stdlib zipfile XML fallback.
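The "first non-empty wins" rule for PDF backends can be sketched as a small fallback chain. This is a minimal illustration, not the project's actual code: `first_non_empty` and the backend callables are hypothetical names, and each backend is assumed to be a function that takes a path and returns text.

```python
from collections.abc import Callable, Iterable


def first_non_empty(
    path: str,
    backends: Iterable[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each backend in order and return (backend_name, text) for the
    first one that yields non-whitespace text. Hypothetical helper."""
    errors: list[str] = []
    for name, extract in backends:
        try:
            text = extract(path)
        except Exception as exc:  # a missing or failing backend moves us on
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty output")
    # Nothing worked: fail loudly instead of returning an empty string.
    raise RuntimeError("no backend produced text: " + "; ".join(errors))
```

With real backends, the list would hold the pdftotext, pypdf, and pdfminer.six wrappers in that order; collecting the per-backend errors makes the final failure message actionable.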
Routing is by file suffix, with a magic-byte sniff fallback (%PDF, ZIP→docx) when the suffix is
missing or wrong. If a backend is missing, the tool prints the exact install commands and exits
non-zero (it never fails silently).
Install
pip install naoh # from PyPI
The PyPI distribution is naoh; it installs the book_to_skill import package and the
book-to-skill CLI command.
From a clone (for development):
pip install -e ".[dev]" # with dev/test extras
# or, for runtime only:
pip install -e .
Python 3.11+ required. PDF live extraction works best with poppler (pdftotext); without it the
tool falls back to pypdf / pdfminer.six.
Usage
# 1) Extract only — writes full_text.txt + metadata.json, prints their paths (used by skill mode)
book-to-skill extract path/to/book.pdf
# 2) Analyze — extraction + structure report, no files written
book-to-skill analyze path/to/book.pdf
# 3) Build (API mode) — generate the whole skill with a provider
book-to-skill build path/to/book.pdf my-slug --provider groq --model openai/gpt-oss-20b
Useful flags for build: --style {technical,prose,auto}, --base-url (for local/custom),
--analyze-only, --skills-root <dir>, --yes (skip the confirmation prompt).
Skill mode (no API key)
Install the skill, then invoke it from Claude Code:
/book-to-skill ~/Documents/some-book.pdf clean-architecture
The agent runs scripts/extract.py, reads the extracted text, and writes the five files itself.
Providers (API mode)
One OpenAI-compatible adapter (switched by base_url) plus dedicated Anthropic and Gemini adapters.
Selection precedence: --provider flag → BOOK_TO_SKILL_PROVIDER env → error.
| Provider | --provider | Base URL | Env var | Default model |
|---|---|---|---|---|
| OpenAI | openai | https://api.openai.com/v1 | OPENAI_API_KEY | gpt-4o-mini |
| OpenRouter | openrouter | https://openrouter.ai/api/v1 | OPENROUTER_API_KEY | openai/gpt-4o-mini |
| Groq | groq | https://api.groq.com/openai/v1 | GROQ_API_KEY | openai/gpt-oss-20b |
| xAI Grok | grok | https://api.x.ai/v1 | XAI_API_KEY | grok-2-latest |
| DeepSeek | deepseek | https://api.deepseek.com/v1 | DEEPSEEK_API_KEY | deepseek-chat |
| Qwen / DashScope | qwen | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | DASHSCOPE_API_KEY | qwen-plus |
| Ollama (local) | ollama | http://localhost:11434/v1 | (none) | llama3.2 |
| Local / custom | local | --base-url | LOCAL_API_KEY (optional) | --model |
| Anthropic (Claude) | anthropic | https://api.anthropic.com/v1/messages | ANTHROPIC_API_KEY | claude-sonnet-4-6 |
| Google Gemini | gemini | https://generativelanguage.googleapis.com/v1beta | GEMINI_API_KEY | gemini-2.5-flash |
Gemini also exposes an OpenAI-compatible endpoint
(https://generativelanguage.googleapis.com/v1beta/openai/); this project ships a dedicated native
adapter but you can also drive Gemini through local with that base URL.
Pricing and model IDs drift over time. Cost estimates use a config-driven table in
src/book_to_skill/config.py (MODEL_PRICES). Verify current pricing at https://docs.claude.com/en/docs/about-claude/pricing and each provider's pricing page.
Keys are read from environment variables only and are never logged or committed. Copy
.env.example to .env and fill in what you use.
Development
ruff check . # lint
ruff format --check . # format check
pytest -q --cov=book_to_skill --cov-report=term-missing
All tests are offline: synthetic fixtures are built at runtime (including a stdlib-only DOCX and a
pypdf-built PDF), provider HTTP is mocked, and the end-to-end test uses a hidden stub provider.
CI runs ruff + pytest on Python 3.11 and 3.12.
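As an illustration of a stdlib-only DOCX fixture and the zipfile XML fallback, the sketch below builds a minimal archive in memory and reads its text back. `make_docx` and `docx_text` are hypothetical helpers, and the archive contains only word/document.xml; a file meant to open in Word or python-docx would also need [Content_Types].xml and the relationship parts.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from xml.sax.saxutils import escape

# WordprocessingML namespace used by the w:p / w:r / w:t elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_docx(paragraphs: list[str]) -> bytes:
    """Build a minimal DOCX-like ZIP holding one paragraph per input string."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs
    )
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


def docx_text(data: bytes) -> str:
    """Extract paragraph text using only zipfile and ElementTree."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is the concatenation of its w:t runs.
        lines.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return "\n".join(lines)
```

Building fixtures this way keeps the test suite free of binary files and third-party dependencies.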
License
MIT — see LICENSE.