A Python tool for extracting CIP book information from PDFs, images, archives, and EPUBs
cipx
A Python tool for extracting CIP (Cataloguing in Publication) book information from PDFs, images, archives, and EPUBs.
Installation
Requires Python 3.11 or later.
pip install cipx
Quick start
from cipx import CIPX, extract
cipx = CIPX()
# Dispatch to the most suitable extraction entry point based on the file suffix
result = cipx.extract("book.pdf")
print(result.success)
print(result.bookinfo.title)
print(result.bookinfo.isbn)
# The top-level function can also be called directly
result = extract("book.epub")
print(result.bookinfo.isbn)
If you already know the file type, you can call the corresponding method directly:
from cipx import CIPX
cipx = CIPX()
image_result = cipx.from_image("cover.jpg")
pdf_result = cipx.from_pdf("book.pdf")
epub_result = cipx.from_epub("book.epub")
archive_result = cipx.from_archive("book.uvz")
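The suffix-based dispatch that `extract` performs can be pictured with a small sketch. This is not cipx's actual implementation: `pick_entry` and the `SUFFIX_TO_METHOD` table are hypothetical names, and the table contains only the suffixes named in this document (the image list is open-ended, so a real table would likely be longer).

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the CIPX method named in this README.
# Only suffixes mentioned in the document are listed; cipx may accept more.
SUFFIX_TO_METHOD = {
    ".jpg": "from_image",
    ".jpeg": "from_image",
    ".png": "from_image",
    ".pdf": "from_pdf",
    ".epub": "from_epub",
    ".zip": "from_archive",
    ".rar": "from_archive",
    ".uvz": "from_archive",
}


def pick_entry(path: str) -> str:
    """Return the name of the method that would handle `path` (illustrative only)."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the last suffix
    try:
        return SUFFIX_TO_METHOD[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or '(no suffix)'}") from None


print(pick_entry("book.PDF"))  # from_pdf
print(pick_entry("scan.uvz"))  # from_archive
```

With a real `CIPX` instance, `getattr(cipx, pick_entry(path))(path)` would then call the selected method; in practice `cipx.extract(path)` already does the routing for you.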
Return value
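Extraction returns an ExtractResult object. Commonly used fields include:
- success: whether extraction succeeded
- bookinfo: bibliographic information, containing title, author, publisher, pubdate, isbn, and cip
- meta: source file information
- locate: detection and localization information (for images, PDFs, and archives)
- ocr: the OCR recognition result
- elapsed: processing time
- error: the error message when extraction fails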
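A defensive way to consume these fields is to check `success` before touching `bookinfo`. The helper below is a sketch that relies only on the field names listed above. `summarize` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for real results, so the field types (for example, `error` being a printable message and `bookinfo` being empty on failure) are assumptions.

```python
from types import SimpleNamespace


def summarize(result) -> str:
    """Build a one-line summary from any object exposing the documented fields."""
    if not result.success:
        # Assumption: `error` carries a human-readable message on failure.
        return f"failed: {result.error}"
    info = result.bookinfo
    return f"{info.title} / {info.author} / ISBN {info.isbn}"


# Stand-in objects shaped like the documented result; not real cipx output.
ok = SimpleNamespace(
    success=True,
    error=None,
    bookinfo=SimpleNamespace(
        title="Example Title", author="Example Author", isbn="9780306406157"
    ),
)
failed = SimpleNamespace(success=False, error="no CIP page found", bookinfo=None)

print(summarize(ok))
print(summarize(failed))
```

Because the helper only reads attributes, it should work unchanged on a real `ExtractResult` as long as the field names match the list above.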
bookinfo also provides convenience properties such as isbn_valid, isbn13, and isbn10.
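How cipx computes these properties is not documented here, but the underlying ISBN arithmetic is standard. The sketch below shows the usual check-digit rules and the ISBN-10 to ISBN-13 conversion; all function names are hypothetical and are not part of the cipx API.

```python
def _clean(isbn: str) -> str:
    """Drop hyphens and spaces and upper-case a trailing 'x'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_is_valid(isbn: str) -> bool:
    """ISBN-10: the sum of digits weighted 10..1 must be divisible by 11."""
    s = _clean(isbn)
    if len(s) != 10 or not s.isascii():
        return False
    if not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def isbn13_check_digit(first12: str) -> str:
    """ISBN-13: weights alternate 1, 3; the check digit rounds the sum up to a multiple of 10."""
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn13_is_valid(isbn: str) -> bool:
    s = _clean(isbn)
    if len(s) != 13 or not (s.isascii() and s.isdigit()):
        return False
    return isbn13_check_digit(s[:12]) == s[12]


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the old check digit, and recompute the new one."""
    s = _clean(isbn10)
    if not isbn10_is_valid(s):
        raise ValueError(f"not a valid ISBN-10: {isbn10!r}")
    body = "978" + s[:9]
    return body + isbn13_check_digit(body)


print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
print(isbn13_is_valid("978-0-306-40615-7"))  # True
```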
Configuration
Global configuration
from cipx.config import configure
configure(log_level="DEBUG", strict=3)
Custom Settings
from cipx import CIPX
from cipx.config import Settings
settings = Settings(
    log_level="DEBUG",
    strict=4,
    detector={"conf_threshold": 0.5},
    ocr={"ocr_model": "medium"},
)
cipx = CIPX(config=settings)
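This document does not define `conf_threshold`. In object detectors, a setting with this name conventionally means the minimum confidence a detection needs in order to be kept. Under that assumption, the filtering step looks roughly like the sketch below. `Detection` and `keep_confident` are hypothetical names, and whether cipx compares with `>=` or `>` is unknown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: a label plus a confidence in [0, 1]."""
    label: str
    confidence: float


def keep_confident(detections, conf_threshold=0.5):
    """Keep detections whose confidence is at least `conf_threshold` (assumed semantics)."""
    return [d for d in detections if d.confidence >= conf_threshold]


candidates = [
    Detection("cip_block", 0.92),
    Detection("cip_block", 0.41),
    Detection("cip_block", 0.50),
]
kept = keep_confident(candidates, conf_threshold=0.5)
print([d.confidence for d in kept])  # [0.92, 0.5]
```

If the setting works this way, raising the threshold trades missed CIP regions for fewer false positives, and lowering it does the opposite.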
Supported formats
- Images: .jpg, .jpeg, .png, and others
- PDF: .pdf
- EPUB: .epub
- Archives: .zip, .rar, .uvz
Download files
Source Distribution
cipx-0.0.3.tar.gz (10.0 MB)
Built Distribution
cipx-0.0.3-py3-none-any.whl (10.0 MB)
File details
Details for the file cipx-0.0.3.tar.gz.
File metadata
- Download URL: cipx-0.0.3.tar.gz
- Upload date:
- Size: 10.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.26
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 28351da96758658d11ade681a215afe5535e1da15dec60c166ce662e92172dd0 |
| MD5 | 4504bd49945df48723eb2ad5f122eda8 |
| BLAKE2b-256 | a4a7ecfa0ce3e02bff6729faffe4bd8cf2e5d9fbaef815dd676553318eed09d2 |
File details
Details for the file cipx-0.0.3-py3-none-any.whl.
File metadata
- Download URL: cipx-0.0.3-py3-none-any.whl
- Upload date:
- Size: 10.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.26
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0a1458a1aac77811f503dfb2bffba68b74bfb4c7ae68ae612f6fc03ec5d973da |
| MD5 | 77150405a902929275768a0b4f7fcca4 |
| BLAKE2b-256 | a6c26fc9a25037cc9e098a404fd7792257049e0e9e7f8f965fc0e705dfa7ae5e |
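A downloaded file can be checked against the SHA256 digests listed above using only the standard library. This is a minimal sketch: `sha256_of` and `matches_published_hash` are hypothetical helper names, and the lookup assumes the file keeps its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed above for the two 0.0.3 distribution files.
EXPECTED_SHA256 = {
    "cipx-0.0.3.tar.gz": "28351da96758658d11ade681a215afe5535e1da15dec60c166ce662e92172dd0",
    "cipx-0.0.3-py3-none-any.whl": "0a1458a1aac77811f503dfb2bffba68b74bfb4c7ae68ae612f6fc03ec5d973da",
}


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Hash a file in chunks so large downloads are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path) -> bool:
    """True only if the file name is known and its SHA256 equals the listed digest."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

For example, `matches_published_hash("cipx-0.0.3.tar.gz")` should return `True` for an intact download of the source distribution and `False` for a corrupted or renamed file.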