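GLM (Zhipu AI) large-model Python client library: text generation, OCR, visual understanding, image generation, speech synthesis, video generation, text embeddings, and more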

GLM Client

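A Python client library for Zhipu AI's GLM large models, covering multimodal capabilities: text generation, OCR, visual understanding, image generation, speech synthesis, video generation, text embeddings, and speech recognition.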

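Features

OCR: extract text from images and PDFs, from local files or URLs, powered by a free model
Text generation: streaming and non-streaming chat completions, with thinking mode and web search
Visual understanding: image analysis, multi-image comparison, batch analysis
Image generation: text-to-image based on CogView
Speech synthesis: multi-voice TTS, including Word-document-to-speech
Video generation: generate videos from text or images
Text embeddings: vector generation and similarity computation
Speech recognition: audio-to-text transcription
Automatic retries: retry with exponential backoff
Dual SDK support: uses zai-sdk when available, compatible with the legacy zhipuai SDK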
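The README does not document the retry mechanism beyond "exponential backoff". As an illustration only, a generic helper of that kind might look like the sketch below; `retry_with_backoff`, its defaults, and the jitter are assumptions, not glm-client's internal code.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying failures with exponentially growing delays.

    Hypothetical helper for illustration; not glm-client's internal code.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: re-raise the last error
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% random jitter
            # so that many clients do not retry in lockstep.
            delay = min(base_delay * 2 ** attempt, max_delay)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Example: a call that fails twice and then succeeds.
# Delays are recorded instead of actually sleeping.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


recorded_delays: list = []
result = retry_with_backoff(flaky_call, sleep=recorded_delays.append)
print(result, [round(d, 2) for d in recorded_delays])
```

In practice you would narrow `retry_on` to transient errors (rate limits, timeouts, connection resets) rather than retrying every exception.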

Quick Start

Installation

```bash
# Basic installation (image OCR)
pip install glm-client

# Installation with PDF support
pip install "glm-client[pdf]"
```

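Configuration

Get an API key at https://open.bigmodel.cn/

```bash
# Option 1: environment variable
export ZAI_API_KEY="your_api_key_here"

# Option 2: .env file (">" overwrites an existing .env; use ">>" to append)
echo 'ZAI_API_KEY=your_api_key_here' > .env
```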
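The FAQ notes that either `ZAI_API_KEY` or `ZHIPUAI_API_KEY` is accepted, but this README does not specify the lookup order. The sketch below assumes environment variables take precedence over `.env`, and `ZAI_API_KEY` over `ZHIPUAI_API_KEY`; `parse_dotenv` and `resolve_api_key` are hypothetical helpers for checking your setup before creating a `GLMClient`.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Variable names come from this README; the precedence order is an assumption.
KEY_NAMES = ("ZAI_API_KEY", "ZHIPUAI_API_KEY")


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines (no variable expansion, no 'export')."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. ZAI_API_KEY="abc"
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_api_key(env: Mapping[str, str], dotenv_text: str = "") -> Optional[str]:
    """Return the first non-empty key: environment first, then .env contents."""
    for source in (env, parse_dotenv(dotenv_text)):
        for name in KEY_NAMES:
            value = source.get(name)
            if value:
                return value
    return None


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text(encoding="utf-8") if env_file.is_file() else ""
    print("API key found" if resolve_api_key(os.environ, text) else "No API key set")
```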

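Get Started in 30 Seconds

```python
from glm_client import GLMClient, chat_completion

client = GLMClient()

# Streaming chat: print chunks as they arrive
response = chat_completion("Hello, please introduce yourself", client=client, stream=True)
for chunk in response:
    print(chunk, end="", flush=True)
print()
```

Or use the command line:

```bash
glm-client chat "Hello, please introduce yourself"
```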

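Recommended Free Models

Recommended models for each capability, based on benchmark testing:

| Capability | Recommended model | Price | Context length |
| --- | --- | --- | --- |
| Text chat / summarization | `glm-4-flash` | Free (permanently) | 128K |
| OCR | `glm-4v-flash` | Free (permanently) | - |
| Visual understanding | `glm-4v-flash` | Free (permanently) | - |
| Image generation | `cogview-3-flash` | Paid | - |
| Speech synthesis | `glm-tts` | Paid | - |
| Video generation | `cogvideox-2` | Paid | - |
| Text embeddings | `embedding-3` | Paid | 8K |
| Speech recognition | `glm-asr` | Paid | - |

New users receive 20 million free tokens on registration; the free models are recommended for development and testing.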

Features in Detail

OCR

OCR built on the GLM-4V-Flash vision model. It extracts text from images and PDF files at no cost.

Basic Usage

```python
from glm_client import GLMClient, ocr

client = GLMClient()

# Recognize a local image
text = ocr("invoice.jpg", client=client)
print(text)

# Recognize an image from a URL
text = ocr("https://example.com/screenshot.png", client=client)

# Recognize a PDF file (requires pip install "glm-client[pdf]")
text = ocr("contract.pdf", client=client)

# Recognize a PDF from a URL
text = ocr("https://example.com/document.pdf", client=client)
```

Batch Recognition

```python
from glm_client import GLMClient, ocr_batch

client = GLMClient()

files = ["page1.jpg", "page2.png", "page3.jpg"]
results = ocr_batch(
    files,
    client=client,
    verbose=True,  # show progress
)

# Results come back in the same order as the inputs
for path, text in zip(files, results):
    print(f"=== {path} ===")
    print(text)
```
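This README does not say whether `ocr_batch` processes files concurrently or what happens when one file fails. If you need both, you could wrap `ocr()` yourself, as in the sketch below. `run_batch` is hypothetical, and it assumes `ocr()` is safe to call from several threads, which this README does not confirm.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, Tuple

Result = Tuple[str, Optional[str], Optional[Exception]]  # (source, text, error)


def run_batch(fn: Callable[[str], str], sources: Sequence[str], max_workers: int = 4) -> List[Result]:
    """Apply fn to every source concurrently; results keep the input order."""

    def safe_call(src: str) -> Result:
        try:
            return src, fn(src), None
        except Exception as exc:  # record the failure and keep going
            return src, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as its inputs.
        return list(pool.map(safe_call, sources))


# Usage with glm-client (thread safety of ocr() is an assumption):
# from functools import partial
# from glm_client import GLMClient, ocr
# for src, text, err in run_batch(partial(ocr, client=GLMClient()), ["page1.jpg", "page2.png"]):
#     print(src, "FAILED: " + str(err) if err else text)
```

Keep `max_workers` small: each worker makes its own API call, so higher values are more likely to hit rate limits.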

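Custom Parameters

```python
from glm_client import GLMClient, ocr

client = GLMClient()

# Custom OCR prompt (for specific scenarios)
text = ocr(
    "receipt.jpg",
    client=client,
    prompt="Extract the amount, date, and item names from this receipt and output them as JSON",
)

# Adjust PDF rendering resolution (default 200; higher values can improve recognition)
text = ocr(
    "document.pdf",
    client=client,
    dpi=300,
    max_tokens=8192,  # raised from the default 4096 to allow longer output
)

# Use a different vision model
text = ocr("photo.jpg", client=client, model="glm-4.6v-flash")
```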
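PDF page dimensions are expressed in points (1/72 inch), so rendering at a given `dpi` scales each side by `dpi / 72`. The hypothetical helper below estimates the image size produced for an A4 page (595 × 842 pt); raising `dpi` from the default 200 to 300 multiplies the pixel count by (300/200)² = 2.25, which is worth weighing against upload size and processing time.

```python
def rendered_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Approximate pixel size of a PDF page rendered at the given DPI.

    The renderer's exact rounding may differ by a pixel.
    """
    scale = dpi / 72  # PDF user space: 72 points per inch
    return round(width_pt * scale), round(height_pt * scale)


A4_POINTS = (595, 842)  # A4 (210 x 297 mm) in PDF points

for dpi in (200, 300):
    w, h = rendered_size(*A4_POINTS, dpi)
    print(f"dpi={dpi}: {w} x {h} px ({w * h / 1e6:.1f} megapixels)")
```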

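API Reference

ocr(source, client, ...): recognizes a single file

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `source` | `str \| Path` | required | Image/PDF file path or URL |
| `client` | `GLMClient` | required | GLM client instance |
| `model` | `str` | `glm-4v-flash` | Vision model name |
| `prompt` | `str` | built-in OCR prompt | Custom recognition prompt |
| `max_tokens` | `int` | `4096` | Maximum number of output tokens |
| `temperature` | `float` | `0.1` | Sampling temperature |
| `dpi` | `int` | `200` | PDF rendering resolution |
| `page_separator` | `str` | separator containing the page number | Template for the separator between PDF pages |
| `verbose` | `bool` | `False` | Whether to show progress |

ocr_batch(sources, client, ...): recognizes multiple files in one call

Parameters are similar to ocr(); sources is a list of file paths/URLs, and the return value is a list of results corresponding to those sources.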
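The exact default `page_separator` is not shown in this README, only that it includes the page number. The sketch below assumes a template with a `{page}` placeholder (both the placeholder name and `DEFAULT_SEPARATOR` are hypothetical) to show how numbered separators let you split multi-page output back into pages afterwards.

```python
import re
from typing import List

# Hypothetical template; "{page}" is replaced with the 1-based page number.
DEFAULT_SEPARATOR = "\n\n--- Page {page} ---\n\n"


def join_pages(page_texts: List[str], separator: str = DEFAULT_SEPARATOR) -> str:
    """Combine per-page OCR text, putting a numbered separator before each page."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        parts.append(separator.format(page=number))
        parts.append(text.strip())
    return "".join(parts).strip()


def split_pages(combined: str, pattern: str = r"--- Page \d+ ---") -> List[str]:
    """Split combined text back into pages; empty pages are dropped."""
    return [chunk.strip() for chunk in re.split(pattern, combined) if chunk.strip()]


text = join_pages(["Invoice No. 001", "Total: 100.00"])
print(text)
print(split_pages(text))
```

If a page's own text happens to contain the separator pattern, the split will be wrong, so pick a template unlikely to appear in your documents.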

Supported File Types

| Type | Formats | Notes |
| --- | --- | --- |
| Image | jpg, jpeg, png, gif, webp, bmp | Local file or URL |
| PDF | pdf | Requires `pymupdf` (`pip install "glm-client[pdf]"`) |
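To reject unsupported inputs before making any API call, you can check sources against the table above. `classify_source` below is a hypothetical pre-check based only on file extensions; glm-client's own detection (for example, whether it inspects the content type of URLs) may differ.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extensions from the "Supported File Types" table.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}
PDF_EXTS = {".pdf"}


def classify_source(source: str) -> tuple[str, str]:
    """Return (kind, location): kind is 'image' or 'pdf', location is 'url' or 'local'."""
    parsed = urlparse(source)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, use only the path so query strings such as ?download=1 are ignored.
    suffix = Path(parsed.path if is_url else source).suffix.lower()
    if suffix in IMAGE_EXTS:
        kind = "image"
    elif suffix in PDF_EXTS:
        kind = "pdf"
    else:
        raise ValueError(f"Unsupported file type: {source!r}")
    return kind, "url" if is_url else "local"


for src in ["invoice.JPG", "https://example.com/document.pdf?download=1"]:
    print(src, classify_source(src))
```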

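Document Summarization

Uses glm-4-flash, the strongest free text model, to generate a structured summary of a document in a consistent JSON format, suitable for file management, search, and classification.

Basic Usage

```python
from glm_client import GLMClient, summarize

client = GLMClient()

# Summarize text content
result = summarize("Full contract text...", client=client)

print(result.title)          # document title
print(result.document_type)  # e.g. "合同" (contract), "方案" (proposal), "报告" (report)
print(result.summary)        # core summary (at most 200 characters)
print(result.keywords)       # ["keyword 1", "keyword 2", ...]
print(result.key_info)       # {"dates": [...], "parties": [...], "amounts": [...]}

# Export as a dict or a JSON string
data = result.to_dict()
json_text = result.to_json()
```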

Passing a File Directly

```python
from glm_client import GLMClient, summarize_file

client = GLMClient()

# OCR and summarization in one step (images and PDFs)
result = summarize_file("contract.pdf", client=client, verbose=True)
print(result.to_json())
```

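Output Structure

summarize() and summarize_file() return a DocumentSummary object with the following fields:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| `title` | `str` | Document title | `"XX公司技术服务合同"` (XX Company technical services contract) |
| `document_type` | `str` | Document type | `"合同"` (contract) |
| `category` | `str` | Business category | `"法务"` (legal) |
| `summary` | `str` | Core summary (at most 200 characters) | `"甲方委托乙方提供..."` (Party A engages Party B to provide...) |
| `keywords` | `list[str]` | Keywords (at most 5) | `["技术服务", "合同"]` (technical services, contract) |
| `key_info` | `dict` | Key information | `{"dates": [...], "parties": [...]}` |
| `language` | `str` | Primary language | `"zh"` |
| `confidence` | `str` | Confidence level | `"high"` |

Possible document_type values: 合同 (contract), 协议 (agreement), 方案 (proposal), 报告 (report), 通知 (notice), 简历 (résumé), 发票 (invoice), 收据 (receipt), 证书 (certificate), 规章制度 (rules and regulations), 会议纪要 (meeting minutes), 邮件 (email), 论文 (paper), 说明书 (manual), 其他 (other).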
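If you store summaries, it can help to normalize them on your side. The sketch below is a hypothetical local dataclass (`SummaryRecord`, not glm-client's `DocumentSummary`) with the fields from the table above; it maps unexpected `document_type` values to `"其他"` (other) and enforces the five-keyword limit. It assumes `result.to_dict()` returns a dict keyed by those field names.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

# document_type values listed above; they are Chinese strings.
DOCUMENT_TYPES = {
    "合同", "协议", "方案", "报告", "通知", "简历", "发票", "收据",
    "证书", "规章制度", "会议纪要", "邮件", "论文", "说明书", "其他",
}
MAX_KEYWORDS = 5  # "at most 5" per the field table


@dataclass
class SummaryRecord:
    """Hypothetical local record mirroring DocumentSummary's documented fields."""

    title: str = ""
    document_type: str = "其他"
    category: str = ""
    summary: str = ""
    keywords: List[str] = field(default_factory=list)
    key_info: Dict[str, Any] = field(default_factory=dict)
    language: str = ""
    confidence: str = ""

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SummaryRecord":
        doc_type = data.get("document_type", "其他")
        return cls(
            title=str(data.get("title", "")),
            # Map unexpected values to "其他" (other) so downstream filters stay simple.
            document_type=doc_type if doc_type in DOCUMENT_TYPES else "其他",
            category=str(data.get("category", "")),
            summary=str(data.get("summary", "")),
            keywords=[str(k) for k in data.get("keywords", [])][:MAX_KEYWORDS],
            key_info=dict(data.get("key_info", {})),
            language=str(data.get("language", "")),
            confidence=str(data.get("confidence", "")),
        )

    def to_json(self) -> str:
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
        return json.dumps(asdict(self), ensure_ascii=False)


# Usage (assumes result.to_dict() uses the field names from the table):
# record = SummaryRecord.from_dict(result.to_dict())
# db.insert(record.to_json())
```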

Integration Example: File Management System

```python
from glm_client import GLMClient, summarize_file

client = GLMClient()


def index_document(file_path: str) -> dict:
    """Summarize a file and build an index record to store in a database."""
    result = summarize_file(file_path, client=client)
    return {
        "file_path": file_path,
        "title": result.title,
        "doc_type": result.document_type,
        "summary": result.summary,
        "keywords": ", ".join(result.keywords),
        # map(str, ...) guards against non-string entries in key_info
        "parties": ", ".join(map(str, result.key_info.get("parties", []))),
        "dates": ", ".join(map(str, result.key_info.get("dates", []))),
        "metadata_json": result.to_json(),
    }


# Batch indexing
for doc in ["contract.pdf", "proposal.docx_scan.jpg", "report.pdf"]:
    entry = index_document(doc)
    # db.insert(entry)  # store in your database
    print(entry["title"], entry["keywords"])
```

Text Generation

```bash
# Basic chat
glm-client chat "Explain quantum computing"

# Interactive mode
glm-client chat -i

# Enable thinking mode (requires a model that supports thinking)
glm-client chat "Solve a difficult math problem" --thinking

# Enable web search
glm-client chat "Today's news" --web-search

# Custom parameters
glm-client chat "Write a poem" --temperature 0.9 --no-stream
```

```python
from glm_client import GLMClient, chat_completion

client = GLMClient()

# Non-streaming
response = chat_completion("Hello", client=client, stream=False)
print(response)

# Streaming
for chunk in chat_completion("Hello", client=client, stream=True):
    print(chunk, end="")

# Multi-turn conversation: pass earlier turns via history
history = [
    {"role": "user", "content": "My name is Xiaoming"},
    {"role": "assistant", "content": "Hello, Xiaoming!"},
]
response = chat_completion("What is my name?", client=client, history=history, stream=False)
print(response)
```
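The `history` parameter takes a list of `{"role", "content"}` dicts. Presumably (this README does not confirm it) `chat_completion` appends the new prompt as a final user message, producing an OpenAI-style `messages` list. The hypothetical helpers below show that assembly and how to carry a conversation forward between turns.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(
    prompt: str,
    history: Optional[List[Message]] = None,
    system: Optional[str] = None,
) -> List[Message]:
    """Assemble messages: optional system prompt, earlier turns, then the new user turn."""
    messages: List[Message] = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.extend(history or [])
    messages.append({"role": "user", "content": prompt})
    return messages


def record_turn(history: List[Message], prompt: str, reply: str) -> List[Message]:
    """Return a new history with this user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": reply},
    ]


# Usage sketch (assumes a non-streaming reply is a plain string):
# history = []
# for prompt in ["My name is Xiaoming", "What is my name?"]:
#     reply = chat_completion(prompt, client=client, history=history, stream=False)
#     history = record_turn(history, prompt, reply)
print(build_messages("What is my name?", history=record_turn([], "My name is Xiaoming", "Hello, Xiaoming!")))
```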

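Visual Understanding

```bash
glm-client vision "Describe this image" --image photo.jpg
glm-client vision "What is in the picture?" --image-url "https://example.com/image.jpg"
```

```python
from glm_client import GLMClient, compare_images, vision_completion

client = GLMClient()

result = vision_completion("Describe the image", "photo.jpg", client=client)
print(result)

# Compare multiple images
result = compare_images("Compare the differences between these two images", ["img1.jpg", "img2.jpg"], client=client)
```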
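`vision_completion` accepts a local path here, while the CLI also accepts `--image-url`. A common way for clients to send a local image to a vision API is to base64-encode it, often as a data URI; this README does not say which form glm-client uses. The hypothetical `to_image_reference` below shows the data-URI conversion with only the standard library.

```python
import base64
import mimetypes
from pathlib import Path


def to_image_reference(source: str) -> str:
    """Return URLs unchanged; convert a local image file into a base64 data URI."""
    if source.startswith(("http://", "https://", "data:")):
        return source
    path = Path(source)
    mime, _ = mimetypes.guess_type(path.name)  # e.g. "image/png" from the extension
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {source}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


print(to_image_reference("https://example.com/image.jpg"))
```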

Image Generation

```bash
glm-client image "A cat sitting on a table"
glm-client image "Sunset" --output sunset.png --size 1024x1024
```

```python
from glm_client import GLMClient, generate_and_save, generate_image

client = GLMClient()

urls = generate_image("A cat", client=client)
paths = generate_and_save("A cat", client=client, output="output/")
```

Speech Synthesis

```bash
glm-client tts "Hello, world!"
glm-client tts --word document.docx --output audio.wav
glm-client tts "Test" --voice chen --format mp3
```

```python
from glm_client import GLMClient, text_to_speech, word_to_speech

client = GLMClient()

path = text_to_speech("Hello", client=client, voice="female")
path = word_to_speech("doc.docx", client=client)
```

Available voices: female (彤彤 Tongtong, default), chen (小陈 Xiaochen), cuicui (锤锤 Chuichui), jam, kazi, douji, luodo
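`word_to_speech` reads a Word document before synthesis; the README does not say how it extracts text or whether it splits long documents. The sketch below shows one standard-library approach: a `.docx` file is a ZIP archive whose body lives in `word/document.xml`, with text stored in `<w:t>` runs inside `<w:p>` paragraphs. `docx_paragraphs`, `chunk_text`, and the 500-character chunk size are assumptions for illustration, and the extraction is simplified (tabs, line breaks, and text boxes get no special handling).

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import List

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path: str) -> List[str]:
    """Extract non-empty paragraph text from a .docx file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across <w:t> runs; join them back together.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text.strip())
    return paragraphs


def chunk_text(paragraphs: List[str], max_chars: int = 500) -> List[str]:
    """Group paragraphs into chunks of at most max_chars (an assumed limit)."""
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Hard-split a single paragraph that is longer than max_chars.
        while len(para) > max_chars:
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        chunks.append(current)
    return chunks


# Usage sketch:
# for chunk in chunk_text(docx_paragraphs("doc.docx")):
#     text_to_speech(chunk, client=client)
print(chunk_text(["First paragraph.", "Second paragraph."], max_chars=20))
```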

Video Generation

```bash
glm-client video "A cat playing with a ball"
glm-client video "Bring the picture to life" --image-url "https://example.com/image.jpg"
glm-client video "Sunset on the beach" --output sunset.mp4 --fps 60
```

```python
from glm_client import GLMClient, generate_and_save_video, generate_video

client = GLMClient()

url = generate_video("A cat playing with a ball", client=client)
path = generate_and_save_video("A cat playing with a ball", client=client, output="video.mp4")
```
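Video generation usually takes longer than a single request should wait, so services of this kind typically work asynchronously: submit a task, then poll its status. The README does not say how `generate_video` waits internally. The sketch below is a generic polling loop with a timeout; `check_status` is a hypothetical callable returning `(state, result_url)`, and the state names `"SUCCESS"`/`"FAIL"` are assumptions.

```python
import time
from typing import Callable, Optional, Tuple

Status = Tuple[str, Optional[str]]  # (state, result_url); state names are assumed


def poll_until_done(
    check_status: Callable[[], Status],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll check_status() until success, failure, or timeout."""
    deadline = clock() + timeout
    while True:
        state, result = check_status()
        if state == "SUCCESS" and result:
            return result
        if state == "FAIL":
            raise RuntimeError("Video generation task failed")
        if clock() >= deadline:
            raise TimeoutError(f"Task not finished after {timeout:.0f}s (last state: {state})")
        sleep(interval)


# Example with a fake status source: one pending check, then success.
fake_statuses = iter([("PROCESSING", None), ("SUCCESS", "https://example.com/video.mp4")])
print(poll_until_done(fake_statuses.__next__, sleep=lambda s: None))
```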

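Text Embeddings

```bash
glm-client embed "artificial intelligence"
glm-client embed "machine learning" --compare "deep learning"
```

```python
from glm_client import GLMClient, compute_similarity, create_embedding

client = GLMClient()

vector = create_embedding("artificial intelligence", client=client)
score = compute_similarity("machine learning", "deep learning", client=client)
print(f"Similarity: {score:.4f}")
```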
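The README does not say which metric `compute_similarity` uses; cosine similarity is the usual choice for embedding vectors, so the sketch below assumes it. Computing it locally lets you rank many stored vectors from `create_embedding` (assumed to be a list of floats) against a query without an extra API call per pair.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], corpus: List[Tuple[str, Sequence[float]]]
) -> List[Tuple[str, float]]:
    """Return (label, score) pairs sorted from most to least similar to the query."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in corpus]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Usage sketch with glm-client:
# corpus = [(t, create_embedding(t, client=client)) for t in ["deep learning", "cooking"]]
# print(rank_by_similarity(create_embedding("machine learning", client=client), corpus))
print(rank_by_similarity([1.0, 0.0], [("b", [0.0, 1.0]), ("a", [1.0, 0.1])]))
```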

Speech Recognition

```bash
glm-client asr --file audio.wav
glm-client asr -f recording.mp3 --language zh
```

```python
from glm_client import GLMClient, transcribe_audio

client = GLMClient()

text = transcribe_audio("audio.wav", client=client)
print(text)
```

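CLI Command Reference

| Command | Description | Example |
| --- | --- | --- |
| `chat` | Text generation | `glm-client chat "Hello"` |
| `vision` | Image analysis | `glm-client vision "Describe" -i img.jpg` |
| `image` | Image generation | `glm-client image "A cat"` |
| `tts` | Speech synthesis | `glm-client tts "Hello"` |
| `video` | Video generation | `glm-client video "Sunset"` |
| `embed` | Text embeddings | `glm-client embed "Text"` |
| `asr` | Speech recognition | `glm-client asr -f audio.wav` |
| `config` | Configuration management | `glm-client config --list-models` |

OCR note: OCR is currently available only through the Python API (ocr() / ocr_batch()); CLI support is coming soon.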

Using as a Library

```python
from glm_client import (
    GLMClient,
    chat_completion,
    ocr,
    ocr_batch,
    summarize,
    summarize_file,
    vision_completion,
    generate_image,
    text_to_speech,
    generate_video,
    create_embedding,
    transcribe_audio,
)

client = GLMClient()

# Text
response = chat_completion("Hello", client=client, stream=False)

# OCR
text = ocr("invoice.jpg", client=client)
text = ocr("contract.pdf", client=client)

# Document summarization (structured JSON output)
result = summarize("Document text...", client=client)
print(result.title, result.keywords)

# Summarize a file directly (OCR + summarization in one step)
result = summarize_file("contract.pdf", client=client)

# Vision
result = vision_completion("Describe the image", "photo.jpg", client=client)

# Image generation
urls = generate_image("Sunset", client=client)

# Embeddings
vector = create_embedding("Test text", client=client)
```

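Development

```bash
# Install development dependencies
uv sync

# Run tests
uv run pytest tests/

# Run integration tests (requires a real API key)
uv run pytest -m integration

# Run benchmarks
uv run python scripts/benchmark_summarize_ocr.py

# Build the package
uv build
```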

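FAQ

Q: Authentication fails? Make sure the ZAI_API_KEY or ZHIPUAI_API_KEY environment variable is set.

Q: Rate limited? The client retries automatically with exponential backoff; if errors persist, wait a while and try again.

Q: Model not found? Run glm-client config --list-models to list all available models.

License

MIT License; see LICENSE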


This version: 0.3.0 (released Apr 23, 2026)

Download files

Source Distribution

glm_client-0.3.0.tar.gz (131.8 kB view details)

Uploaded Apr 23, 2026 Source

Built Distribution

glm_client-0.3.0-py3-none-any.whl (45.1 kB view details)

Uploaded Apr 23, 2026 Python 3

File details

Details for the file glm_client-0.3.0.tar.gz.

File metadata

Download URL: glm_client-0.3.0.tar.gz
Upload date: Apr 23, 2026
Size: 131.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for glm_client-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`1b3819f768e0a66dc7e6daec218ef895e12e62be71c290a5c31ae8deb7df105e`
MD5	`3d1627460e986f477fb410330a642735`
BLAKE2b-256	`1e205122f2b1fe1ec44881425afbba642302edbfe24f1bcce25686dc2fbd2a78`

File details

Details for the file glm_client-0.3.0-py3-none-any.whl.

File metadata

Download URL: glm_client-0.3.0-py3-none-any.whl
Upload date: Apr 23, 2026
Size: 45.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for glm_client-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`568e0e1b4090c63d88e86ed5c37cbae3951ee85ec3dc786742d602661bc9ad46`
MD5	`1b96d93d41f46a1d002f36cf9868c4ea`
BLAKE2b-256	`303855250890322c826e6c325e08d732c33af75e77b754fba341fcf6cbe95a74`
