
naoh / book-to-skill: turn a PDF/MD/DOCX book into a Claude Code skill. Skill mode (no API) + provider-agnostic API mode.


naoh — book-to-skill


Turn a technical book or document (.pdf, .md/.markdown, .docx) into a Claude Code skill: a front-loaded SKILL.md plus on-demand chapter summaries, a glossary, a patterns file, and a cheatsheet.

naoh is an enhanced reimplementation of the public reference virgiliojr94/book-to-skill. The package and CLI keep the name book-to-skill; the project/brand is naoh.

Two modes, one core

Extraction + structure detection are shared. What differs is who writes the summaries:

| Mode | API key? | Who synthesizes | How to trigger |
| --- | --- | --- | --- |
| Skill mode (default) | No | The Claude Code agent, by following SKILL.md | install the skill, then /book-to-skill <file> [slug] |
| API mode | Yes | A configurable LLM provider, in our Python code | book-to-skill build <file> [slug] --provider <name> |

Output (same in both modes)

~/.claude/skills/<slug>/
├── SKILL.md            # core mental models + chapter index + topic index (front-loaded)
├── chapters/chNN-*.md  # one dense summary per chapter (loaded on demand)
├── glossary.md         # key terms, alphabetical, with chapter refs
├── patterns.md         # techniques / algorithms / design patterns
└── cheatsheet.md       # decision tables + quick rules
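
As a rough illustration of the layout above, the following sketch checks a generated skill directory for the expected entries. The helper name `missing_outputs` is hypothetical and not part of the package, and the two-digit reading of `NN` is an assumption.

```python
from pathlib import Path

# Top-level files named in the layout above.
EXPECTED_FILES = ("SKILL.md", "glossary.md", "patterns.md", "cheatsheet.md")


def missing_outputs(skill_dir: Path) -> list[str]:
    """Return the expected entries that are absent from skill_dir (hypothetical helper)."""
    missing = [name for name in EXPECTED_FILES if not (skill_dir / name).is_file()]
    chapters_dir = skill_dir / "chapters"
    # Assumption: NN in chNN-*.md means exactly two digits.
    chapters = list(chapters_dir.glob("ch[0-9][0-9]-*.md")) if chapters_dir.is_dir() else []
    if not chapters:
        missing.append("chapters/chNN-*.md")
    return missing
```

An empty list means all five kinds of output are present, for example `missing_outputs(Path.home() / ".claude" / "skills" / "my-slug")`.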

Supported input formats

Only three, by design — anything else is rejected with a clear message:

  • PDF (.pdf): pdftotext -layout (poppler) → pypdf → pdfminer.six (first non-empty wins).
  • Markdown (.md, .markdown): read as UTF-8 (BOM-safe); headings drive structure detection.
  • DOCX (.docx): python-docx, with a zero-dependency stdlib zipfile XML fallback.
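
To make the stdlib DOCX fallback concrete, here is a minimal sketch of reading paragraph text with only zipfile and ElementTree. It is not the project's code: the function name `docx_text_stdlib` is hypothetical, and it ignores tables, headers, and footnotes that a fuller reader would handle.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text_stdlib(path: str) -> str:
    """Extract paragraph text from a .docx without third-party packages (hypothetical helper)."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's visible text is spread across its w:t elements.
        paragraphs.append("".join(node.text or "" for node in para.iter(f"{W_NS}t")))
    return "\n".join(paragraphs)
```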

Routing is by file suffix, with a magic-byte sniff fallback (%PDF, ZIP→docx) when the suffix is missing or wrong. If a backend is missing, the tool prints the exact install commands and exits non-zero (it never fails silently).
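
The routing described above might look roughly like the sketch below. The names `SUFFIXES` and `detect_format` are hypothetical, and the sketch makes two simplifying assumptions: it sniffs only when the suffix is unrecognized (how the tool treats a recognized but wrong suffix is not shown here), and it treats any ZIP container as a DOCX candidate.

```python
from pathlib import Path

# Suffix → format, for the three supported inputs.
SUFFIXES = {".pdf": "pdf", ".md": "markdown", ".markdown": "markdown", ".docx": "docx"}


def detect_format(path: Path) -> str:
    """Route by suffix first; fall back to magic bytes (hypothetical helper)."""
    kind = SUFFIXES.get(path.suffix.lower())
    if kind:
        return kind
    with path.open("rb") as fh:
        head = fh.read(4)
    if head == b"%PDF":
        return "pdf"
    if head == b"PK\x03\x04":  # ZIP local-file header; assumed to be a DOCX here
        return "docx"
    raise ValueError(f"unsupported input: {path} (expected .pdf, .md/.markdown, or .docx)")
```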

Install

pip install naoh            # from PyPI

The PyPI distribution is naoh; it installs the book_to_skill import package and the book-to-skill CLI command.

From a clone (for development):

pip install -e ".[dev]"     # with dev/test extras
# or, for runtime only:
pip install -e .

Python 3.11+ is required. PDF extraction works best with poppler (pdftotext); without it, the tool falls back to pypdf, then pdfminer.six.
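
A "first non-empty wins" fallback chain can be sketched as follows. This is an illustration under assumptions, not the package's implementation: the function names are hypothetical, and the sketch raises an exception when every backend fails, whereas the tool itself prints install commands and exits non-zero. The three backend calls use the public pdftotext CLI, `pypdf.PdfReader`, and `pdfminer.high_level.extract_text`.

```python
import subprocess
from collections.abc import Callable, Iterable


def via_pdftotext(path: str) -> str:
    # poppler's CLI; "-" sends the extracted text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def via_pypdf(path: str) -> str:
    from pypdf import PdfReader  # lazy import: a missing package is just a failed backend

    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)


def via_pdfminer(path: str) -> str:
    from pdfminer.high_level import extract_text  # lazy import, same reason

    return extract_text(path)


def first_non_empty(path: str, backends: Iterable[Callable[[str], str]]) -> str:
    """Return the first backend result that contains non-whitespace text."""
    for backend in backends:
        try:
            text = backend(path)
        except Exception:
            # Missing binary, missing package, or a parse error: try the next backend.
            continue
        if text.strip():
            return text
    raise RuntimeError(f"no backend produced text for {path}")
```

Usage would then be `first_non_empty("book.pdf", [via_pdftotext, via_pypdf, via_pdfminer])`.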

Usage

# 1) Extract only — writes full_text.txt + metadata.json, prints their paths (used by skill mode)
book-to-skill extract path/to/book.pdf

# 2) Analyze — extraction + structure report, no files written
book-to-skill analyze path/to/book.pdf

# 3) Build (API mode) — generate the whole skill with a provider
book-to-skill build path/to/book.pdf my-slug --provider groq --model openai/gpt-oss-20b

Useful flags for build: --style {technical,prose,auto}, --base-url (for local/custom), --analyze-only, --skills-root <dir>, --yes (skip the confirmation prompt).

Skill mode (no API key)

Install the skill, then invoke it from Claude Code:

/book-to-skill ~/Documents/some-book.pdf clean-architecture

The agent runs scripts/extract.py, reads the extracted text, and writes the five files itself.

Providers (API mode)

One OpenAI-compatible adapter (switched by base_url) plus dedicated Anthropic and Gemini adapters. Selection precedence: --provider flag → BOOK_TO_SKILL_PROVIDER env → error.

| Provider | --provider | Base URL | Env var | Default model |
| --- | --- | --- | --- | --- |
| OpenAI | openai | https://api.openai.com/v1 | OPENAI_API_KEY | gpt-4o-mini |
| OpenRouter | openrouter | https://openrouter.ai/api/v1 | OPENROUTER_API_KEY | openai/gpt-4o-mini |
| Groq | groq | https://api.groq.com/openai/v1 | GROQ_API_KEY | openai/gpt-oss-20b |
| xAI Grok | grok | https://api.x.ai/v1 | XAI_API_KEY | grok-2-latest |
| DeepSeek | deepseek | https://api.deepseek.com/v1 | DEEPSEEK_API_KEY | deepseek-chat |
| Qwen / DashScope | qwen | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | DASHSCOPE_API_KEY | qwen-plus |
| Ollama (local) | ollama | http://localhost:11434/v1 | (none) | llama3.2 |
| Local / custom | local | --base-url | LOCAL_API_KEY (optional) | --model |
| Anthropic (Claude) | anthropic | https://api.anthropic.com/v1/messages | ANTHROPIC_API_KEY | claude-sonnet-4-6 |
| Google Gemini | gemini | https://generativelanguage.googleapis.com/v1beta | GEMINI_API_KEY | gemini-2.5-flash |

Gemini also exposes an OpenAI-compatible endpoint (https://generativelanguage.googleapis.com/v1beta/openai/). This project ships a dedicated native adapter, but you can also drive Gemini through the local provider by passing that URL as --base-url.

Pricing/model IDs drift. Cost estimates use a config-driven table in src/book_to_skill/config.py (MODEL_PRICES). Verify current pricing at https://docs.claude.com/en/docs/about-claude/pricing and each provider's pricing page.

Keys are read from environment variables only and are never logged or committed. Copy .env.example to .env and fill in what you use.

Development

ruff check .            # lint
ruff format --check .   # format check
pytest -q --cov=book_to_skill --cov-report=term-missing

All tests are offline: synthetic fixtures are built at runtime (including a stdlib-only DOCX and a pypdf-built PDF), provider HTTP is mocked, and the end-to-end test uses a hidden stub provider. CI runs ruff + pytest on Python 3.11 and 3.12.
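
For a sense of what a stdlib-only DOCX fixture built at runtime can look like, here is a minimal sketch. The helper name `write_minimal_docx` is hypothetical and the project's real fixtures may differ. The file it writes contains only word/document.xml, which is enough for a zipfile-based reader but is not a complete package that a word processor would open.

```python
import zipfile
from pathlib import Path
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def write_minimal_docx(path: Path, paragraphs: list[str]) -> None:
    """Write a bare-bones .docx with one run per paragraph (hypothetical test helper)."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("word/document.xml", document)
```

In a pytest test this would typically be called with a path under the built-in `tmp_path` fixture.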

License

MIT — see LICENSE.
