cipx

A Python tool for extracting CIP (Cataloguing in Publication) book information from PDFs, images, archives, and EPUB files.

Installation

Requires Python 3.11 or later.

pip install cipx

Quick start

from cipx import CIPX, extract

cipx = CIPX()

# Dispatch automatically to the most suitable extraction entry point based on the file suffix
result = cipx.extract("book.pdf")
print(result.success)
print(result.bookinfo.title)
print(result.bookinfo.isbn)

# The top-level function can also be called directly
result = extract("book.epub")
print(result.bookinfo.isbn)

If you already know the file type, you can call the corresponding method directly:

from cipx import CIPX

cipx = CIPX()

image_result = cipx.from_image("cover.jpg")
pdf_result = cipx.from_pdf("book.pdf")
epub_result = cipx.from_epub("book.epub")
archive_result = cipx.from_archive("book.uvz")

Results

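An extraction returns an ExtractResult object. Commonly used fields include:

  • success: whether the extraction succeeded
  • bookinfo: bibliographic information, containing title, author, publisher, pubdate, isbn, and cip
  • meta: information about the source file
  • locate: detection and localization information (for images, PDFs, and archives)
  • ocr: the OCR recognition result
  • elapsed: processing time
  • error: the error message on failure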
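
The fields above suggest a simple pattern: check success first, then read either bookinfo or error. The following is a minimal sketch of that pattern, not part of cipx itself. The summarize helper is hypothetical, and the SimpleNamespace objects are stand-ins with placeholder values (including a made-up error message) so the snippet runs without cipx installed. In real use you would pass the object returned by cipx.extract(...).

```python
from types import SimpleNamespace


def summarize(result) -> str:
    """Return a one-line summary of an ExtractResult-like object (hypothetical helper)."""
    if not result.success:
        # On failure, `error` carries the reason.
        return f"failed: {result.error}"
    info = result.bookinfo
    return f"{info.title} / {info.author} / ISBN {info.isbn}"


# Stand-in objects with placeholder values; they only mimic the field names listed above.
ok = SimpleNamespace(
    success=True,
    bookinfo=SimpleNamespace(
        title="Example Title", author="Example Author", isbn="9780306406157"
    ),
    error=None,
)
bad = SimpleNamespace(success=False, bookinfo=None, error="no CIP page found")

print(summarize(ok))
print(summarize(bad))
```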

bookinfo also provides convenience attributes such as isbn_valid, isbn13, and isbn10.
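
For readers unfamiliar with ISBN check digits, the sketch below shows the standard ISBN-10 and ISBN-13 checksum rules and the usual ISBN-10 to ISBN-13 conversion. It is an independent illustration of what attributes like isbn_valid, isbn13, and isbn10 plausibly rely on. How cipx actually implements them is not stated here, and all function names below are hypothetical.

```python
def _clean(isbn: str) -> str:
    """Strip hyphens and spaces and normalize a trailing 'x' to 'X'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: weights 10..1, sum must be divisible by 11 ('X' means 10)."""
    s = _clean(isbn)
    if len(s) != 10 or not s.isascii() or not s[:9].isdigit():
        return False
    if not (s[9].isdigit() or s[9] == "X"):
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def _isbn13_weighted_sum(digits: str) -> int:
    """Alternate weights 1 and 3, starting with 1 on the first digit."""
    return sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(digits))


def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: weighted sum over all 13 digits must be divisible by 10."""
    s = _clean(isbn)
    if len(s) != 13 or not s.isascii() or not s.isdigit():
        return False
    return _isbn13_weighted_sum(s) % 10 == 0


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix '978' to the first nine digits and recompute the check digit."""
    core = "978" + _clean(isbn10)[:9]
    check = (10 - _isbn13_weighted_sum(core) % 10) % 10
    return core + str(check)
```

For example, is_valid_isbn10("0-306-40615-2") holds, and isbn10_to_isbn13("0306406152") gives "9780306406157".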

Configuration

Global configuration

from cipx.config import configure

configure(log_level="DEBUG", strict=3)

Custom Settings

from cipx import CIPX
from cipx.config import Settings

settings = Settings(
    log_level="DEBUG",
    strict=4,
    detector={"conf_threshold": 0.5},
    ocr={"ocr_model": "medium"},
)

cipx = CIPX(config=settings)

Supported formats

  • Images: .jpg, .jpeg, .png
  • PDF: .pdf
  • EPUB: .epub
  • Archives: .zip, .rar, .uvz
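
To make suffix-based dispatch concrete, here is a minimal sketch that maps the suffixes above to the from_image, from_pdf, from_epub, and from_archive methods shown earlier. It illustrates the idea only and is not how cipx is necessarily implemented. ENTRY_BY_SUFFIX, pick_entry, and extract_with are hypothetical names, and lower-casing the suffix is an assumption.

```python
from pathlib import Path
from typing import Optional

# Suffix -> method name, built from the formats and methods listed in this README.
ENTRY_BY_SUFFIX = {
    ".jpg": "from_image",
    ".jpeg": "from_image",
    ".png": "from_image",
    ".pdf": "from_pdf",
    ".epub": "from_epub",
    ".zip": "from_archive",
    ".rar": "from_archive",
    ".uvz": "from_archive",
}


def pick_entry(path: str) -> Optional[str]:
    """Return the method name for a path, or None if the suffix is unsupported."""
    # Assumption: suffix matching is case-insensitive.
    return ENTRY_BY_SUFFIX.get(Path(path).suffix.lower())


def extract_with(client, path: str):
    """Call the matching from_* method on a CIPX-like client object."""
    name = pick_entry(path)
    if name is None:
        raise ValueError(f"unsupported file type: {path}")
    return getattr(client, name)(path)
```

With cipx installed, extract_with(CIPX(), "book.pdf") would call from_pdf under these assumptions. In practice cipx.extract already performs this kind of dispatch for you.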
