EPUB TOC

A Python tool for extracting table of contents from EPUB files with hierarchical structure support.

Features

Support for multiple extraction methods (NCX, epub_meta, OPF)
Automatic selection of the best extraction method
Hierarchical TOC structure preservation
Russian and English language support
JSON output format
Detailed logging
EPUB file analysis reports
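
To illustrate what hierarchical extraction from an NCX file involves, here is a minimal sketch using only the standard library. It is not the package's actual implementation: the helper names parse_nav_points and parse_ncx and the sample document are assumptions made for illustration. The output keys (title, href, level, children) mirror the structure shown under Output Format.

```python
import xml.etree.ElementTree as ET

# Namespace used by NCX documents (DAISY/NISO Z39.86-2005).
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def parse_nav_points(parent, level=0):
    """Recursively convert the direct navPoint children of `parent` into dicts."""
    entries = []
    for nav_point in parent.findall("ncx:navPoint", NCX_NS):
        label = nav_point.find("ncx:navLabel/ncx:text", NCX_NS)
        content = nav_point.find("ncx:content", NCX_NS)
        entries.append({
            "title": (label.text or "").strip() if label is not None else "",
            "href": content.get("src", "") if content is not None else "",
            "level": level,
            # Nested navPoint elements become children one level deeper.
            "children": parse_nav_points(nav_point, level + 1),
        })
    return entries


def parse_ncx(ncx_xml):
    """Parse an NCX document given as a string; return a nested TOC list."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find("ncx:navMap", NCX_NS)
    return parse_nav_points(nav_map) if nav_map is not None else []


# Hypothetical sample NCX, kept small for illustration.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="ch1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.html"/>
      <navPoint id="s11" playOrder="2">
        <navLabel><text>Section 1.1</text></navLabel>
        <content src="chapter1.html#section1"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>"""

toc = parse_ncx(SAMPLE_NCX)
```

In a real EPUB the NCX file would first have to be located through the OPF manifest and read from the ZIP container; that step is omitted here.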

Installation

pip install epub_toc

Usage

As a module

from epub_toc import EPUBTOCParser

# Create parser
parser = EPUBTOCParser('path/to/book.epub')

# Extract TOC
toc = parser.extract_toc()

# Print to console
parser.print_toc()

# Save to JSON
parser.save_toc_to_json('output.json')

From command line

epub-toc path/to/book.epub

EPUB File Analysis

To analyze all EPUB files in the tests/data/epub_samples directory, run:

python tests/integration/test_epub_analysis.py

Analysis results are saved in the reports/ directory:

epub_analysis_YYYYMMDD_HHMMSS.json - detailed report in JSON format
epub_analysis_YYYYMMDD_HHMMSS.txt - brief report in text format
toc/*.json - extracted TOCs for each EPUB file
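
Because each run writes a new timestamped report, it can be handy to pick the most recent one. The following is a minimal sketch that relies only on the epub_analysis_YYYYMMDD_HHMMSS.json naming shown above; the helper names report_timestamp, latest_report, and latest_report_in are hypothetical and not part of the package.

```python
import re
from datetime import datetime
from pathlib import Path

# Matches the JSON report names described above and captures the timestamp.
REPORT_PATTERN = re.compile(r"^epub_analysis_(\d{8}_\d{6})\.json$")


def report_timestamp(name):
    """Return the datetime encoded in a report file name, or None."""
    match = REPORT_PATTERN.match(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    except ValueError:
        # Digits matched the pattern but do not form a valid date or time.
        return None


def latest_report(names):
    """Return the name with the newest timestamp, or None if none match."""
    dated = [(report_timestamp(name), name) for name in names]
    dated = [(ts, name) for ts, name in dated if ts is not None]
    return max(dated)[1] if dated else None


def latest_report_in(directory="reports"):
    """Look for the newest JSON report in a directory (default: reports/)."""
    return latest_report(p.name for p in Path(directory).glob("epub_analysis_*.json"))
```

Calling latest_report_in() after an analysis run should return the file name of the newest JSON report, or None if the directory holds no matching files.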

Report structure:

The JSON report contains:
- Overall statistics for all files
- Success rate of each extraction method
- Detailed results for each file
- Links to extracted TOC files

The text report includes:
- Brief statistics
- Information about each file
- Paths to extracted TOCs

TOC files:
- Are saved in the toc/ subdirectory
- Are named book_name_toc.json
- Contain the complete TOC in JSON format

Output Format

TOC is saved in JSON format with the following structure:

{
  "metadata": {
    "title": "Book Title",
    "authors": ["Author 1", "Author 2"],
    "publisher": "Publisher Name",
    "publication_date": "2024-01-01",
    "language": "en",
    "description": "Book description",
    "cover_image_path": "path/to/cover.jpg",
    "isbn": "978-3-16-148410-0",
    "rights": "Copyright information",
    "series": "Series Name",
    "series_index": 1,
    "identifiers": {
      "isbn13": "978-3-16-148410-0",
      "uuid": "550e8400-e29b-41d4-a716-446655440000"
    },
    "subjects": ["Fiction", "Adventure"],
    "file_size": 1234567,
    "file_name": "book.epub"
  },
  "toc": [
    {
      "title": "Chapter 1",
      "href": "chapter1.html",
      "level": 0,
      "children": [
        {
          "title": "Section 1.1",
          "href": "chapter1.html#section1",
          "level": 1,
          "children": []
        }
      ]
    }
  ]
}

All metadata fields are optional and will be omitted if not available in the EPUB file.
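
As an illustration of consuming this format, the sketch below walks the nested toc list depth first and renders it as an indented outline. The helper names iter_toc and format_toc are hypothetical and not part of the package; the sample data is a trimmed copy of the structure above.

```python
import json


def iter_toc(entries):
    """Yield (level, title, href) for every TOC entry, depth first."""
    for entry in entries:
        yield entry["level"], entry["title"], entry["href"]
        # Entries without a "children" key are treated as leaves.
        yield from iter_toc(entry.get("children", []))


def format_toc(entries, indent="  "):
    """Render the TOC as an outline, indenting each entry by its level."""
    return "\n".join(
        f"{indent * level}{title} -> {href}"
        for level, title, href in iter_toc(entries)
    )


# Trimmed sample in the documented format; metadata fields are optional.
sample = json.loads("""
{
  "metadata": {"title": "Book Title"},
  "toc": [
    {
      "title": "Chapter 1",
      "href": "chapter1.html",
      "level": 0,
      "children": [
        {
          "title": "Section 1.1",
          "href": "chapter1.html#section1",
          "level": 1,
          "children": []
        }
      ]
    }
  ]
}
""")

outline = format_toc(sample["toc"])
print(outline)
```

To process a file written by save_toc_to_json, replace the inline sample with json.load on the opened file. Since all metadata fields may be absent, read them with sample["metadata"].get(...) rather than direct indexing.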

Testing

The module has been successfully tested on various EPUB files:

Russian books (NCX method)
English books (epub_meta method)
Files with different TOC structures
Files of different sizes (from 400 KB to 8 MB)

Requirements

Python 3.7+
epub_meta>=0.0.7
lxml>=4.9.3
beautifulsoup4>=4.12.2

Contributing

We welcome contributions! If you'd like to help:

Fork the repository
Create a branch for your changes
Make changes and add tests
Ensure all tests pass
Create a Pull Request

See CONTRIBUTING.md for details.

Security

If you discover a security vulnerability, please DO NOT create a public issue. Instead, send a report following the instructions in SECURITY.md.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Roadmap

Additional EPUB format support
Improved complex hierarchical structure handling
Integration with popular e-readers
Web service API
Additional language support


This version

1.0.0 (released Dec 26, 2024)


Source Distribution

epub_toc-1.0.0.tar.gz (14.8 kB)

Uploaded Dec 26, 2024 Source

File details

Details for the file epub_toc-1.0.0.tar.gz.

File metadata

Download URL: epub_toc-1.0.0.tar.gz
Upload date: Dec 26, 2024
Size: 14.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.8.10

File hashes

Hashes for epub_toc-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`a39a3109b8c4f6120e0c11ec794cc7f72018f0c957a6d09ed77daaf0b3cf0f17`
MD5	`d6c9bc5e7364326198d633866c92d196`
BLAKE2b-256	`2ef8bfcc5667925c9d39d0320757bece1b5a10402c7a8ed0db41a26ca1e9b23a`
