Skip to main content

Comic book archive multi format metadata read/write/transform tool and image extractor.

Project description

Comicbox

A comic book archive metadata reader and writer.

Features

📚Comic Formats

Comicbox reads CBZ, CBR, CBT, and optionally PDF. Comicbox archives and writes CBZ archives and PDF metadata.

🏷️ Metadata Formats

Comicbox reads and writes:

Usefulness

Comicbox's primary purpose is a library for use by Codex comic reader. The API isn't well documented, but you can infer what it does pretty easily here: comicbox.comic_archive as the primary interface.

The command line can perform most of comicbox's functions including reading and writing metadata recursively, converting between metadata formats and extracting pages.

Limitations and Alternatives

Comicbox does not use popular metadata database APIs or have a GUI!

Comictagger probably the most useful comicbook tagger. It does most of what Comicbox does but also automatically tags comics with the ComicVine API and has a desktop UI.

📦 Installation

pip install comicbox

Comicbox supports PDFs as an extra when installed like:

pip install comicbox[pdf]

Dependencies

Base

Comicbox generally works without any binary dependencies but requires unrar be on the path to convert CBR into CBZ or extract files from CBRs.

PDF

The pymupdf dependency has wheels that install a local version of libmupdf. But for some platforms (e.g. Linux on ARM, Windows) it may require libstdc++ and c/c++ build tools installed to compile a libmupdf. More detail on this is available in the pymupdf docs.

Installing Comicbox on ARM (AARCH64) with Python 3.13

Pymupdf has no pre-built wheels for AARCH64 so pip must build it and the build fails on Python 3.13 without this environment variable set:

PYMUPDF_SETUP_PY_LIMITED_API=0 pip install comicbox

You will also have to have the build-essential and python3-dev or equivalent packages installed on on your Linux.

⌨️ Usage

Related Projects

Comicbox makes use of two of my other small projects:

comicfn2dict which parses metadata in comic filenames into python dicts. This library is also used by Comictagger.

pdffile which presents a ZipFile like interface for PDF files.

Console

Type

comicbox -h

see the CLI help.

Examples

comicbox test.cbz -m "{Tags: a,b,c, story_arcs: {d:1,e:'',f:3}" -m "Publisher: SmallComics" -w cr

Will write those tags to comicinfo.xml in the archive.

Be sure to add spaces after colons so they parse as valid YAML key value pairs. This is easy to forget.

But it's probably better to use the --print action to see what it's going to do before you actually write to the archive:

comicbox test.cbz -m "{Tags: a,b,c, story_arcs: {d:1,e:'',f:3}" -m "Publisher: SmallComics" -p

A recursive example:

comicbox --recurse -m "publisher: 'SC Comics'" -w cr ./SmallComicsComics/

Will recursively change the publisher to "SC Comics" for every comic found in under the SmallComicsComics directory.

Escaping YAML

the -m command line argument accepts the YAML language for tags. Certain characters like \,:;_()$%^@ are part of the YAML language. To successful include them as data in your tags, look up "Escaping YAML" documentation online

Deleting Metadata

To delete metadata from the cli you're best off exporting the current metadata, editing the file and then re-importing it with the delete previous metadata option:

# export the current metadata
comicbox --export cix "My Overtagged Comic.cbz"
# Adjust the metadata in an editor.
nvim comicinfo.xml
# Check that importing the metadata will look how you like
comicbox --import comicinfo.xml -p "My Overtagged Comic.cbz"
# Delete all previous metadata from the comic (careful!)
comicbox --delete-all-tags "My Overtagged Comic.cbz"
# Import the metadata into the file and write it.
comicbox --import comicinfo.xml --write cix "My Overtagged Comic.cbz"

Quirks

The comicbox.yaml format represents the ComicInfo.xml Web tag as sub an identifiers.<NID>.url tag. But fear not, you don't have to remember this. The CLI accepts heterogeneous tag types with the -m option, so you can type:

comicbox -p -m "Web: https://foo.com" mycomic.cbz

and the identifier tag should appear in comicbox.yaml as:

identifiers:
  foo.com:
    nss: ""
    url: https://foo.com
Identifiers

Comicbox aggregates IDS, GTINS and URLS from other formats into a common Identifiers structure.

Reprints

Comicbox aggregates Alternate Names, Aliases and IsVersionOf from other formats into a common Reprints list.

URNs

Because the Notes field is commonly abused in ComicInfo.xml to represent fields ComicInfo does not (yet?) support comicbox parses the notes field heavily looking for embedded data. Comicbox also writes identifiers into the Notes field using an Uniform Resource Name format.

Comicbox also looks for identifiers in Tag fields of formats that don't have their own Identifiers field.

Prettified Fields

Comicbox liberally accepts all kinds of values that may be enums in other formats, like AgeRating, Formats and Creidit Roles. In a weak attempt to standardize these values comicbox will Title case values submitted to these fields. When writing to standard formats, comicbox attempts to transforms these values into enums supported by the output format.

Packages

Comicbox actually installs three different packages:

  • comicbox The main API and CLI script.
  • comicfn2dict A separate library for parsing comic filenames into dicts it also includes a CLI script.
  • pdffile A utility library for reading and writing PDF files with an API like Python's ZipFile

⚙️ Config

comicbox accepts command line arguments but also an optional config file and environment variables.

The variables have defaults specified in a default yaml

The environment variables are the variable name prefixed with COMICBOX_. (e.g. COMICBOX_COMICINFOXML=0)

Log Level

change logging level:

LOGLEVEL=ERROR comicbox -p <path>

🛠 Development

You may access most development tasks from the makefile. Run make to see documentation.

🤔 Motivation

I didn't like Comictagger's API, so I built this for myself as an educational exercise and to use as a library for Codex comic reader.

📋 Schemas

Comicbox supports reading and writing several comic book metadata schemas.

Filename Schema

Comicbox includes a pretty good comic archive filename parser. It can extract a number of common fields from comic archive filenames.

The filename parser is available as a separate library: comicfn2dict

Location Name
Archive The archive filename
Import/Export comicbox-filename.txt

ComicInfo Schema v2.1 Draft (Comic Rack)

This schema used by the defunct Comic Rack reader is the de facto standard for comic book metadata on the internet. The Anansi Project now maintains the ComicInfo Schema and has compatibly and conservatively extended it.

ComicInfo StoryArcs

Comicbox also supports an unofficial, undocumented Mylar extension to ComicInfo.xml that encodes multiple Story Arcs and Story Arc Numbers as CSV values.

Location Name
Archive comicinfo.xml
Import/Export comicinfo.xml

MetronInfo Schema v1.0

The MetronInfo Schema is a new XML schema for comic book metadata, which hopes to improve some of the deficiencies that exist with the ComicInfo.xml schema.

Location Name
Archive metroninfo.xml
Import/Export metroninfo.xml

Metron MangaVolume

The MangaVolume tag is interpreted not as an arbitrary string, but as a range of integers delineated by a "-". e.g "1-3".

ComicBookInfo Schema v1.0 (Comic Book Lover)

The schema used by the defunct Comic Book Lover app. It supports a few useful tags that ComicInfo.xml does not, but it probably only survives because Comictagger supports writing it.

I have interpreted the ComicBookInfo example json into a ComicBookInfo JSON Schema.

Location Name
Archive Zip & Rar Comments
Import/Export comic-book-info.json

ComicBookInfo Role primary attribute

Comicbox discards the attribute.

PDF XMP Schema

The PDF metadata standard. Written directly to the pdf itself or exported as an xml file.

Adobe PDF Namespace Adobe PDF Standard § 14.3.3 Document Information Dictionary

PDF metadata is only read or written from and to PDF files.

Location Name
Archive PDF internal
Import/Export pdf-metadata.xml

Reading Embedded Metadata from keywords

Comicbox will read most any metadata standard it supports from the keywords field. If that fails it will consider the keywords field as a comma delimited "Tags" field.

Writing ComicInfo.xml to keywords

By default Comicbox will write ComicInfo XML to the keywords field (e.g. -w pdf)

Codex supports this because it uses Comicbox. Other comic readers do not support PDF embedded ComicInfo.xml, but since they already have ComicInfo.xml parsers it's possible that they might someday.

If Comicbox JSON is included in the write formats (e.g. -w pdf,json) Comicbox will write comicbox.json to the keywords field instead. It is unlikely that any other comic reader other than Codex will ever support this.

CoMet Schema v1.1 (Comic Viewer)

An old and extremely rare comic metadata standard from the defunct Comic Viewer comic book reader.

I have interpreted the CoMet Specification into a CoMet XSD.

Location Name
Archive comet.xml
Import/Export comet.xml

ComicTagger Schema

The most useful general comic book metadata writer is ComicTagger. It supports the ComicVine API, is extensible to other APIs, and features a nice desktop GUI. Internally, Comictagger keeps a metadata object to work with the schemas it supports. This schema allows the import and export of that schema.

Comictaggger genericmetadata.py

This schema is possibly only useful to developers using the API to import and export python dicts, but the capability to import an export this format json format as json exists. The author of ComicTagger offers no promises as to the stability of this API and I am very lazy, so the chances of this drifting out of date are anyone's guess. It was included because it was easy to do.

Location Name
Archive comictagger.json
Import/Export comictagger.json

Comicbox 2.0 Schema

The comicbox internal data structure which acts as a superset of the above schemas to allow interpolating.

Comicbox 2.0 JSON Schema

Comicbox JSON Format

Location Name
Archive comicbox.json
Import/Export comicbox.json

Comicbox YAML Format

YAML is a superset of JSON, so the JSON schema applies here.

Location Name
Archive comicbox.yaml
Import/Export comicbox.yaml

Comicbox CLI Format

The Comicbox CLI uses "flow style" YAML, which is an all on one line format to enter metadata on the command line.

Specifying metadata on the command line like this is additive.

Location Name
Comicbox CLI -m --metadata
Archive comicbox-cli.yaml
Import/Export comicbox-cli.yaml

Environment variables

There is a special environment variable DEBUG_TRANSFORM that will print verbose schema transform information

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

comicbox-2.0.0a0.tar.gz (70.9 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

comicbox-2.0.0a0-py3-none-any.whl (142.7 kB view details)

Uploaded Python 3

File details

Details for the file comicbox-2.0.0a0.tar.gz.

File metadata

  • Download URL: comicbox-2.0.0a0.tar.gz
  • Upload date:
  • Size: 70.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.14

File hashes

Hashes for comicbox-2.0.0a0.tar.gz
Algorithm Hash digest
SHA256 44748d6de87a3e26b8cffa0b38b945d4e306eec8f458824b6c0544bc1c5c047e
MD5 a5933a14c8e666e5c6a9e0d1451a0ad7
BLAKE2b-256 54fa141107e9eee61d7c666f1765240b51432e77c63bd5ef353ecb221522a063

See more details on using hashes here.

File details

Details for the file comicbox-2.0.0a0-py3-none-any.whl.

File metadata

  • Download URL: comicbox-2.0.0a0-py3-none-any.whl
  • Upload date:
  • Size: 142.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.14

File hashes

Hashes for comicbox-2.0.0a0-py3-none-any.whl
Algorithm Hash digest
SHA256 95c6b55b4ea6f388d15df1d4a77c921e1383c1b4906267701881c4bc4ba0b9e8
MD5 2a99c4b336565aa0f9a8f264ff24c973
BLAKE2b-256 958c75806ab38c13f6158e29056ad2b94097362b490d5cdbb56a9e25a0d24819

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page