A lightweight, extensible tool for managing eBook metadata
ebk
ebk is a powerful eBook metadata management tool with a SQLAlchemy + SQLite database backend. It provides a comprehensive fluent API for programmatic use, a rich Typer-based CLI (with colorized output courtesy of Rich), full-text search with FTS5 indexing, automatic text extraction and chunking for semantic search, hash-based file deduplication, and optional AI-powered features including knowledge graphs and semantic search.
Table of Contents
- Features
- Installation
- Quick Start
- Configuration
- CLI Usage
- Python API
- Integrations
- Architecture
- Development
- Contributing
- License
- Documentation
- Stay Updated
- Support
Features
- SQLAlchemy + SQLite Backend: Robust database with normalized schema, proper relationships, and FTS5 full-text search
- Fluent Python API: Comprehensive programmatic interface with method chaining and query builders
- Typer + Rich CLI: A colorized, easy-to-use command-line interface
- Automatic Text Extraction: Extract and index text from PDFs, EPUBs, and plaintext files
- PyMuPDF (primary) with pypdf fallback for PDFs
- ebooklib with HTML parsing for EPUBs
- Automatic chunking (500-word overlapping chunks) for semantic search
- Hash-based Deduplication: SHA256-based file deduplication
- Same file (same hash) = skipped
- Same book, different format = added as additional format
- Hash-prefixed directory storage for scalability
- Advanced Search: Powerful search with field-specific queries and boolean logic
- Field searches: title:Python, author:Knuth, tag:programming
- Boolean operators: AND (implicit), OR, NOT/- prefix
- Comparison filters: rating:>=4, rating:3-5
- Exact filters: language:en, format:pdf, favorite:true
- Phrase searches: "machine learning"
- Fast FTS5-powered full-text search across titles, descriptions, and extracted text
- Import from Multiple Sources:
- Calibre libraries (reads metadata.opf files)
- Individual ebook files with auto-metadata extraction
- Batch import with progress tracking
- Cover Extraction: Automatic cover extraction and thumbnail generation
- PDFs: First page rendered as image
- EPUBs: Cover from metadata or naming patterns
- AI-Powered Features (optional):
- LLM Provider Abstraction: Support for multiple LLM backends (Ollama, OpenAI-compatible APIs)
- Metadata Enrichment: Auto-generate tags, categories, and enhanced descriptions using LLMs
- Local & Remote LLM: Connect to local Ollama or remote GPU servers
- Knowledge Graph: NetworkX-based concept extraction and relationship mapping
- Semantic Search: Vector embeddings for similarity search (with TF-IDF fallback)
- Reading Companion: Track reading sessions with timestamps
- Question Generator: Generate active recall questions
- Web Server Interface:
- FastAPI-based REST API for library management
- URL-based navigation with filters, pagination, and sorting
- Clickable covers and file formats to open books
- Book details modal with comprehensive metadata display
- Flexible Exports:
- HTML Export: Self-contained interactive catalog with pagination (50 books/page)
- Client-side search and filtering
- URL state tracking for bookmarkable pages
- Optional file copying with the --copy flag (includes covers)
- Export to ZIP archives
- Hugo-compatible Markdown with multiple organization options
- Jinja2 template support for customizable export formats
- Integrations (optional):
- Streamlit Dashboard: Interactive web interface
- MCP Server: AI assistant integration
- Visualizations: Network graphs for analysis
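The feature list mentions 500-word overlapping chunks for semantic search but does not state the overlap size. The following is a minimal sketch of that kind of chunking, assuming an overlap of 50 words; the function name `chunk_words` and the overlap value are illustrative assumptions, not ebk's actual implementation.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into word chunks where consecutive chunks share `overlap` words.

    chunk_size=500 matches the README; overlap=50 is an assumed value.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary fully visible in at least one chunk, at the cost of indexing some words twice.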
Installation
Basic Installation
pip install ebk
From Source
git clone https://github.com/queelius/ebk.git
cd ebk
pip install .
With Optional Features
# With Streamlit dashboard
pip install ebk[streamlit]
# With visualization tools
pip install ebk[viz]
# With all optional features
pip install ebk[all]
# For development
pip install ebk[dev]
Note: Requires Python 3.10+
Quick Start
1. Initialize Configuration
# Create default configuration file at ~/.config/ebk/config.json
ebk config init
# View current configuration
ebk config show
# Set default library path
ebk config set library.default_path ~/my-library
2. Create and Populate Library
# Initialize a new library
ebk init ~/my-library
# Import a single ebook with auto-metadata extraction
ebk import book.pdf ~/my-library
# Import from Calibre library
ebk import-calibre ~/Calibre/Library --output ~/my-library
# Search using full-text search
ebk search "python programming" ~/my-library
# List books with filtering
ebk list ~/my-library --author "Knuth" --limit 20
# Get statistics
ebk stats ~/my-library
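The Features section describes imports as deduplicated by SHA256 hash and stored in hash-prefixed directories. A rough sketch of that idea follows, using only the standard library. The directory layout (first two hex characters as the prefix) and the names `sha256_of_file`, `storage_path`, and `store_if_new` are assumptions made for illustration; ebk's real layout may differ.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large ebooks are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def storage_path(root: Path, file_hash: str, suffix: str) -> Path:
    """Hypothetical layout: <root>/<first two hex chars>/<hash><suffix>."""
    return root / file_hash[:2] / f"{file_hash}{suffix}"


def store_if_new(root: Path, source: Path) -> tuple[Path, bool]:
    """Copy `source` into the store unless a file with the same hash exists.

    Returns (stored path, True if the file was newly added).
    """
    target = storage_path(root, sha256_of_file(source), source.suffix.lower())
    if target.exists():
        return target, False  # same hash: treated as a duplicate and skipped
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(source.read_bytes())
    return target, True
```

Spreading files across prefix directories keeps any single directory from growing to tens of thousands of entries as the library grows.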
3. Launch Web Interface
# Start web server (uses config defaults)
ebk serve ~/my-library
# Custom port and host
ebk serve ~/my-library --port 8080 --host 127.0.0.1
# Auto-open browser
ebk config set server.auto_open_browser true
ebk serve ~/my-library
4. AI-Powered Metadata Enrichment
# Configure LLM provider
ebk config set llm.provider ollama
ebk config set llm.model llama3.2
ebk config set llm.host localhost
# Enrich library metadata using LLM
ebk enrich ~/my-library
# Enrich with all features
ebk enrich ~/my-library --generate-tags --categorize --enhance-descriptions
# Use remote GPU server
ebk enrich ~/my-library --host 192.168.1.100
Configuration
ebk uses a centralized configuration system stored at ~/.config/ebk/config.json. This configuration file manages settings for LLM providers, web server, CLI defaults, and library preferences.
Configuration File Structure
{
  "llm": {
    "provider": "ollama",
    "model": "llama3.2",
    "host": "localhost",
    "port": 11434,
    "api_key": null,
    "temperature": 0.7,
    "max_tokens": null
  },
  "server": {
    "host": "0.0.0.0",
    "port": 8000,
    "auto_open_browser": false,
    "page_size": 50
  },
  "cli": {
    "verbose": false,
    "color": true,
    "page_size": 50
  },
  "library": {
    "default_path": null
  }
}
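For scripts that read this file directly, one plausible approach is to merge whatever the file contains over a set of defaults, so a partial file still yields a complete configuration. The sketch below assumes that behavior; `DEFAULTS` (abbreviated to two sections), `deep_merge`, and `load_config` are hypothetical names and not part of ebk's API.

```python
import json
from copy import deepcopy
from pathlib import Path

# Defaults copied from the structure shown above (abbreviated to two sections).
DEFAULTS = {
    "llm": {"provider": "ollama", "model": "llama3.2", "host": "localhost", "port": 11434},
    "server": {"host": "0.0.0.0", "port": 8000, "auto_open_browser": False, "page_size": 50},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with values from `override` applied recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_config(path: Path) -> dict:
    """Read a JSON config file; fall back to the defaults when it is missing."""
    if not path.exists():
        return deepcopy(DEFAULTS)
    with path.open(encoding="utf-8") as fh:
        return deep_merge(DEFAULTS, json.load(fh))
```

Calling `load_config(Path("~/.config/ebk/config.json").expanduser())` would then return a dictionary in which unspecified keys keep their default values.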
Configuration Management
# Initialize configuration (creates default config file)
ebk config init
# View current configuration
ebk config show
# Edit configuration in your default editor
ebk config edit
# Set specific values
ebk config set llm.provider ollama
ebk config set llm.model mistral
ebk config set server.port 8080
ebk config set library.default_path ~/my-library
# Get specific value
ebk config get llm.model
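The `config set` and `config get` commands address settings with dotted keys such as `llm.model`. As an illustration of how dotted keys can map onto the nested JSON structure, here is a small sketch; `get_by_path` and `set_by_path` are hypothetical helpers, and ebk's own lookup logic may differ.

```python
from typing import Any


def get_by_path(config: dict, dotted_key: str) -> Any:
    """Look up 'llm.model' style keys in a nested dict; raises KeyError if absent."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict):
            raise KeyError(dotted_key)
        node = node[part]
    return node


def set_by_path(config: dict, dotted_key: str, value: Any) -> None:
    """Set a nested value, creating intermediate dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
```

For example, `set_by_path(cfg, "server.port", 8080)` followed by `get_by_path(cfg, "server.port")` reads back the value that was just written.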
LLM Provider Configuration
Configure LLM providers for metadata enrichment:
# Local Ollama (default)
ebk config set llm.provider ollama
ebk config set llm.host localhost
ebk config set llm.port 11434
ebk config set llm.model llama3.2
# Remote GPU server
ebk config set llm.host 192.168.1.100
# OpenAI-compatible API (future)
ebk config set llm.provider openai
ebk config set llm.api_key sk-...
ebk config set llm.model gpt-4
CLI Overrides
All commands support CLI arguments that override configuration defaults:
# These override config settings
ebk serve ~/library --port 9000 --host 127.0.0.1
ebk enrich ~/library --host 192.168.1.50 --model mistral
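The precedence described here (an explicit CLI argument first, then the config file, then a built-in default) can be sketched in a few lines. The helper `resolve_option` is a hypothetical name used for illustration and is not taken from ebk's source.

```python
from typing import Any, Optional


def resolve_option(cli_value: Optional[Any], config: dict, section: str, key: str, default: Any) -> Any:
    """Pick the CLI value if given, else the config value, else the default."""
    if cli_value is not None:
        return cli_value
    return config.get(section, {}).get(key, default)


config = {"server": {"host": "0.0.0.0", "port": 8000}}

# As if invoked with `--port 9000` and no `--host`.
port = resolve_option(9000, config, "server", "port", 8000)
host = resolve_option(None, config, "server", "host", "127.0.0.1")
print(host, port)
```

Using `None` as the "not provided" marker lets a CLI framework pass through unset options without an extra flag for each one.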
CLI Usage
ebk uses Typer with Rich for a beautiful, colorized CLI experience.
General CLI Structure
ebk --help # See all available commands
ebk <command> --help # See specific command usage
ebk --verbose <command> # Enable verbose output
Database Commands
Core library management with SQLAlchemy + SQLite backend:
# Initialize library
ebk init ~/my-library
# Import books
ebk import book.pdf ~/my-library
ebk import ~/books/*.epub ~/my-library
ebk import-calibre ~/Calibre/Library --output ~/my-library
# Search with advanced syntax
ebk search "machine learning" ~/my-library # Plain full-text search
ebk search "title:Python rating:>=4" ~/my-library # Field-specific with filters
ebk search "author:Knuth format:pdf" ~/my-library # Multiple criteria
ebk search "tag:programming NOT java" ~/my-library # Boolean operators
ebk search '"deep learning" language:en' ~/my-library # Phrase search with filter
# List and filter
ebk list ~/my-library
ebk list ~/my-library --author "Knuth" --language en --limit 20
ebk list ~/my-library --format pdf --rating 4
# Statistics
ebk stats ~/my-library
ebk stats ~/my-library --format json
# Manage reading status
ebk rate ~/my-library <book-id> 5
ebk favorite ~/my-library <book-id>
ebk tag ~/my-library <book-id> --add "must-read" "technical"
# Remove books
ebk purge ~/my-library --rating 1 --confirm
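To make the search syntax above more concrete, the following sketch splits a query string into field filters, free-text terms, and excluded terms. It is an illustrative parser under stated assumptions, not ebk's: `ParsedQuery`, `parse_query`, and `KNOWN_FIELDS` are hypothetical names, and OR grouping and evaluation of comparisons such as `>=4` are deliberately left out.

```python
import shlex
from dataclasses import dataclass, field

# Field names taken from the examples in this README.
KNOWN_FIELDS = {"title", "author", "tag", "rating", "language", "format", "favorite"}


@dataclass
class ParsedQuery:
    fields: list = field(default_factory=list)    # (name, value) pairs
    terms: list = field(default_factory=list)     # free-text words or phrases
    excluded: list = field(default_factory=list)  # terms after NOT or a '-' prefix


def parse_query(query: str) -> ParsedQuery:
    """Split a query into field filters, free-text terms, and excluded terms.

    OR is not given special handling; comparison values stay raw strings.
    shlex.split raises ValueError on unbalanced quotes.
    """
    parsed = ParsedQuery()
    negate_next = False
    for token in shlex.split(query):  # keeps "quoted phrases" together
        if token == "NOT":
            negate_next = True
            continue
        if token == "AND":
            continue  # AND is implicit between terms
        if token.startswith("-") and len(token) > 1:
            token, negate_next = token[1:], True
        name, sep, value = token.partition(":")
        if negate_next:
            parsed.excluded.append(token)
        elif sep and name in KNOWN_FIELDS and value:
            parsed.fields.append((name, value))
        else:
            parsed.terms.append(token)
        negate_next = False
    return parsed
```

A real implementation would go on to translate `fields` into SQL filters and `terms` into an FTS5 match expression; that step is outside this sketch.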
Web Server
Launch FastAPI-based web interface:
# Start server (uses config defaults)
ebk serve ~/my-library
# Custom host and port
ebk serve ~/my-library --host 127.0.0.1 --port 8080
# Auto-open browser
ebk serve ~/my-library --auto-open
# Configure defaults in config
ebk config set server.port 8080
ebk config set server.auto_open_browser true
AI-Powered Features
Enrich metadata using LLMs:
# Basic enrichment (uses config settings)
ebk enrich ~/my-library
# Full enrichment
ebk enrich ~/my-library \
--generate-tags \
--categorize \
--enhance-descriptions \
--assess-difficulty
# Enrich specific book
ebk enrich ~/my-library --book-id 42
# Use remote GPU server
ebk enrich ~/my-library --host 192.168.1.100 --model mistral
# Dry run (preview changes without saving)
ebk enrich ~/my-library --dry-run
Configuration Management
Manage global configuration:
# Initialize configuration
ebk config init
# View configuration
ebk config show
ebk config show --section llm
# Edit in default editor
ebk config edit
# Set values
ebk config set llm.model llama3.2
ebk config set server.port 8080
ebk config set library.default_path ~/books
# Get values
ebk config get llm.model
Export and Advanced Features
# Export library
ebk export html ~/my-library ~/library.html # Self-contained HTML with pagination
ebk export html ~/my-library ~/site/lib.html --copy --base-url /library # Copy files + covers
ebk export zip ~/my-library ~/backup.zip
ebk export json ~/my-library ~/metadata.json
# Virtual libraries (filtered views)
ebk vlib create ~/my-library "python-books" --subject Python
ebk vlib list ~/my-library
# Notes and annotations
ebk note add ~/my-library <book-id> "Great chapter on algorithms"
ebk note list ~/my-library <book-id>
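The HTML export is described as paginated at 50 books per page. The arithmetic behind that kind of pagination is sketched below in Python for clarity; the exported catalog itself filters and paginates client-side, and `paginate` is a hypothetical helper rather than part of ebk.

```python
import math


def paginate(items: list, page: int, page_size: int = 50) -> dict:
    """Return one 1-indexed page of `items` plus the total page count."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total_pages = max(1, math.ceil(len(items) / page_size))
    page = min(max(page, 1), total_pages)  # clamp out-of-range page numbers
    start = (page - 1) * page_size
    return {"page": page, "total_pages": total_pages, "items": items[start:start + page_size]}
```

Clamping the page number means a stale bookmarked URL that points past the last page still resolves to a valid page.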
Documentation
Comprehensive documentation is available at: https://queelius.github.io/ebk/
Documentation Contents
- Getting Started
- User Guide
- Advanced Topics
- Development
Python API
ebk provides a comprehensive SQLAlchemy-based API for programmatic library management:
from pathlib import Path
from ebk.library_db import Library

# Open or create a library (expanduser() resolves "~", which pathlib leaves as-is)
lib = Library.open(Path("~/my-library").expanduser())

# Import books with automatic metadata extraction
book = lib.add_book(
    Path("book.pdf"),
    metadata={"title": "My Book", "creators": ["Author Name"]},
    extract_text=True,
    extract_cover=True
)

# Fluent query interface
results = (lib.query()
    .filter_by_language("en")
    .filter_by_author("Knuth")
    .filter_by_subject("Algorithms")
    .order_by("title", desc=False)
    .limit(20)
    .all())

# Full-text search (FTS5)
results = lib.search("machine learning", limit=50)

# Get book by ID
book = lib.get_book(42)
print(f"{book.title} by {', '.join([a.name for a in book.authors])}")

# Update reading status
lib.update_reading_status(book.id, "reading", progress=50, rating=4)

# Add tags
lib.add_tags(book.id, ["must-read", "technical"])

# Get statistics
stats = lib.stats()
print(f"Total books: {stats['total_books']}")
print(f"Total authors: {stats['total_authors']}")
print(f"Languages: {', '.join(stats['languages'])}")

# Query with filters
from ebk.db.models import Book, Author
from sqlalchemy import and_

books = lib.session.query(Book).join(Book.authors).filter(
    and_(
        Author.name.like("%Knuth%"),
        Book.language == "en"
    )
).all()

# Always close when done
lib.close()

# Or use context manager
with Library.open(Path("~/my-library").expanduser()) as lib:
    results = lib.search("Python programming")
    for book in results:
        print(book.title)
AI-Powered Metadata Enrichment
import asyncio

from ebk.ai.llm_providers.ollama import OllamaProvider
from ebk.ai.metadata_enrichment import MetadataEnrichmentService


async def main():
    # Initialize provider (local or remote)
    provider = OllamaProvider.remote(
        host="192.168.1.100",
        model="llama3.2"
    )
    service = MetadataEnrichmentService(provider)

    async with provider:
        # Generate tags
        tags = await service.generate_tags(
            title="Introduction to Algorithms",
            authors=["Cormen", "Leiserson"],
            description="Comprehensive algorithms textbook"
        )

        # Categorize
        categories = await service.categorize(
            title="Introduction to Algorithms",
            subjects=["Algorithms", "Data Structures"]
        )

        # Enhance description
        description = await service.enhance_description(
            title="Introduction to Algorithms",
            text_sample="Chapter 1: The Role of Algorithms..."
        )


# "async with" and "await" are only valid inside a coroutine, so run one
asyncio.run(main())
See the CLAUDE.md file for architectural details and the API documentation for a complete reference.
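The enrichment example relies on two ideas: a provider abstraction that hides which LLM backend is in use, and an async context manager that scopes the provider's connection. The sketch below illustrates both with a stand-in backend that needs no network access. Every name in it (`LLMProvider`, `FakeProvider`, `complete`, `generate_tags`) is hypothetical and chosen for illustration; it is not ebk's provider interface.

```python
import asyncio
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical minimal interface an LLM backend might expose."""

    async def complete(self, prompt: str) -> str: ...


class FakeProvider:
    """Stand-in backend that returns a canned reply; no network access."""

    def __init__(self) -> None:
        self.is_open = False

    async def __aenter__(self) -> "FakeProvider":
        self.is_open = True   # a real provider might open an HTTP session here
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.is_open = False  # ... and close it here

    async def complete(self, prompt: str) -> str:
        return "algorithms, data structures, textbook"


async def generate_tags(provider: LLMProvider, title: str) -> list[str]:
    """Ask the provider for comma-separated tags and split the reply."""
    reply = await provider.complete(f"Suggest tags for the book: {title}")
    return [tag.strip() for tag in reply.split(",") if tag.strip()]


async def main() -> list[str]:
    async with FakeProvider() as provider:
        return await generate_tags(provider, "Introduction to Algorithms")


tags = asyncio.run(main())
```

Because `generate_tags` depends only on the `complete` method, a stub like this can replace a real backend in tests without changing the calling code.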
Contributing
Contributions are welcome! Here’s how to get involved:
- Fork the Repo
- Create a Branch for your feature or fix
- Commit & Push your changes
- Open a Pull Request describing the changes
We appreciate code contributions, bug reports, and doc improvements alike.
License
Distributed under the MIT License.
Integrations
ebk follows a modular architecture where the core library remains lightweight, with optional integrations available:
Streamlit Dashboard
pip install ebk[streamlit]
streamlit run ebk/integrations/streamlit/app.py
MCP Server (AI Assistants)
pip install ebk[mcp]
# Configure your AI assistant to use the MCP server
Visualizations
pip install ebk[viz]
# Visualization tools will be available as a separate script
# Documentation coming soon in integrations/viz/
See the Integrations Guide for detailed setup instructions.
Architecture
ebk is designed with a clean, layered architecture:
- Core Library (ebk.library): Fluent API for all operations
- CLI (ebk.cli): Typer-based commands using the fluent API
- Import/Export (ebk.imports, ebk.exports): Modular format support
- Integrations (integrations/): Optional add-ons (web UI, AI, viz)
This design ensures the core remains lightweight while supporting powerful extensions.
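To illustrate what "fluent API" means in this layering, here is a small in-memory query builder in which each method returns the builder itself so calls can be chained. The method names mirror those in the Python API example, but the `BookQuery` class and its dictionary-based book records are assumptions for this sketch; ebk's real query builder works against SQLAlchemy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BookQuery:
    """Hypothetical in-memory query builder; each method returns self for chaining."""

    books: list
    _filters: list = field(default_factory=list)
    _sort_key: Optional[str] = None
    _descending: bool = False
    _limit: Optional[int] = None

    def filter_by_language(self, language: str) -> "BookQuery":
        self._filters.append(lambda b: b.get("language") == language)
        return self

    def filter_by_author(self, name: str) -> "BookQuery":
        self._filters.append(lambda b: any(name in a for a in b.get("authors", [])))
        return self

    def order_by(self, key: str, desc: bool = False) -> "BookQuery":
        self._sort_key, self._descending = key, desc
        return self

    def limit(self, n: int) -> "BookQuery":
        self._limit = n
        return self

    def all(self) -> list:
        """Apply filters, then ordering, then the limit."""
        rows = [b for b in self.books if all(f(b) for f in self._filters)]
        if self._sort_key is not None:
            rows.sort(key=lambda b: b[self._sort_key], reverse=self._descending)
        return rows[: self._limit]
```

Nothing is evaluated until `all()` is called, which is what lets the CLI and other layers compose filters incrementally before running a query.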
Development
# Clone the repository
git clone https://github.com/queelius/ebk.git
cd ebk
# Create virtual environment
make venv
# Install in development mode
make setup
# Run tests
make test
# Check coverage
make coverage
Stay Updated
- GitHub: https://github.com/queelius/ebk
- Website: https://metafunctor.com
Support
- Issues: Open an Issue on GitHub
- Contact: lex@metafunctor.com
Happy eBook managing! 📚✨
File details
Details for the file ebk-0.4.4.tar.gz.
File metadata
- Download URL: ebk-0.4.4.tar.gz
- Upload date:
- Size: 364.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49c5560155cab2d72ce58f53d3d57ba3515f2a604b1efa96faa227394ae142a1 |
| MD5 | e762a06165e19e4aa4b16c802b17f50f |
| BLAKE2b-256 | 46fe30196fecc872006259d9dd46a55169cf2f9a0aafd2fdde304899968eb631 |
File details
Details for the file ebk-0.4.4-py3-none-any.whl.
File metadata
- Download URL: ebk-0.4.4-py3-none-any.whl
- Upload date:
- Size: 281.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | addc8a96c8cedf381583398acbc70bec5f8c93c9999bf105d8830946f9c4398d |
| MD5 | 9c5d1dae7ab5f37317c46815748ed800 |
| BLAKE2b-256 | 8d7c83b6fddb2c079592199832fb061b50106e888ad8718e5e90ee3c1bde6e37 |