larkfunc

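A lightweight Python utility library covering common tasks such as file I/O, text processing, JSON parsing, network response handling, and LLM calls, to help you build data-processing pipelines quickly.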

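Features

File I/O (larkfunc.io) - read text files, save JSON, manage directories
Text processing (larkfunc.text) - clean LLM responses, extract file names, parse LinkedIn IDs
Network requests (larkfunc.network) - decode HTTP response content
JSON handling (larkfunc.json_utils) - safe JSON parsing that does not crash on malformed input
LLM calls (larkfunc.llm) - an LLM call wrapper with automatic retries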

Installation and environment setup

Requirements

Python >= 3.7

Install from PyPI (once published)

pip install larkfunc

Local development install

git clone https://github.com/lark/larkfunc.git
cd larkfunc
pip install -e .

Usage examples

Quick start

from larkfunc import read_txt_data, parse_json_safely, clean_llm_response

# Read a text file
content = read_txt_data("data/sample.txt")

# Parse JSON safely
data = parse_json_safely('{"name": "lark", "version": 1}')

# Clean LLM output that is wrapped in code-fence markers
raw = '```json\n{"result": "ok"}\n```'
clean = clean_llm_response(raw)  # '{"result": "ok"}'
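
To make the fence-stripping step concrete, here is a minimal standard-library sketch. `strip_code_fence` is a hypothetical helper written for illustration only; it is not larkfunc's implementation, and `clean_llm_response` may well handle more cases than this.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime to keep this example readable

# Optional language tag after the opening fence, then the body, then the closing fence.
_FENCED = re.compile(
    r"^\s*" + FENCE + r"[\w-]*[ \t]*\n(.*?)\n?\s*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fence(text: str) -> str:
    """Return the text inside a single surrounding code fence, or the text itself."""
    match = _FENCED.match(text)
    return (match.group(1) if match else text).strip()


raw = FENCE + 'json\n{"result": "ok"}\n' + FENCE
print(strip_code_fence(raw))  # {"result": "ok"}
```

Text without a surrounding fence is returned unchanged apart from whitespace trimming, so the helper is safe to apply to every response.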

File I/O

from larkfunc.io import read_txt_to_list, save_json_to_file, ensure_directory_exists

# Read a text file line by line
lines = read_txt_to_list("urls.txt")

# Make sure the output directory exists
ensure_directory_exists("output/reports")

# Save a JSON file
save_json_to_file({"key": "value"}, "output/result.json")
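
For comparison, the same three operations can be sketched with the standard library alone. The names `ensure_dir`, `save_json`, and `read_lines` are hypothetical, and the details (skipping blank lines, returning `False` on any write or serialization error) are assumptions rather than documented larkfunc behavior.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def ensure_dir(path: str) -> None:
    """Create the directory (and any missing parents) if it does not exist."""
    Path(path).mkdir(parents=True, exist_ok=True)


def save_json(data: Dict[str, Any], output_path: str) -> bool:
    """Write data as UTF-8 JSON; return True on success, False on failure."""
    try:
        target = Path(output_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        text = json.dumps(data, ensure_ascii=False, indent=2)
        target.write_text(text, encoding="utf-8")
        return True
    except (OSError, TypeError, ValueError):
        return False


def read_lines(filepath: str) -> List[str]:
    """Return the stripped, non-empty lines of a text file (assumed behavior)."""
    with open(filepath, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

Returning a boolean instead of raising lets a batch job record the failure and move on to the next item.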

Text processing

from larkfunc.text import extract_linkedin_id, extract_filename_without_extension

# Extract the LinkedIn user ID
uid = extract_linkedin_id("https://www.linkedin.com/in/johndoe/")
print(uid)  # 'johndoe'

# Extract the file name without its extension
name = extract_filename_without_extension("/data/reports/summary.pdf")
print(name)  # 'summary'
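
Both extractions can be approximated with `urllib.parse` and `pathlib`. The sketch below uses hypothetical names (`linkedin_id_from_url`, `stem_of`) and assumes that an empty string is returned when the URL has no `/in/` segment; larkfunc's own behavior for that case is not documented here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def linkedin_id_from_url(url: str) -> str:
    """Return the path segment that follows /in/, or '' if there is none."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    if "in" in segments:
        index = segments.index("in")
        if index + 1 < len(segments):
            return segments[index + 1]
    return ""


def stem_of(filepath: str) -> str:
    """Return the file name without its final extension."""
    return PurePosixPath(filepath).stem


print(linkedin_id_from_url("https://www.linkedin.com/in/johndoe/"))  # johndoe
print(stem_of("/data/reports/summary.pdf"))  # summary
```

Parsing the URL first, rather than splitting the raw string, keeps query strings and fragments out of the result.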

LLM calls

from larkfunc.llm import call_llm_with_retry

def my_chat(prompt: str) -> str:
    # Replace with your own LLM API call
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

result = call_llm_with_retry(
    content="This is a resume...",
    prompt_template="\nExtract the key information and return it as JSON:",
    chat_func=my_chat,
    max_retries=3
)
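
The retry pattern behind such a wrapper can be sketched as follows. This is an illustration under stated assumptions, not larkfunc's code: the hypothetical `call_with_retry` builds the prompt as `content + prompt_template`, treats an empty reply as a failure, and returns an empty string once all attempts are used up. The `delay_seconds` parameter is also an addition for this sketch.

```python
import time
from typing import Callable


def call_with_retry(
    content: str,
    prompt_template: str,
    chat_func: Callable[[str], str],
    max_retries: int = 3,
    delay_seconds: float = 0.0,
) -> str:
    """Call chat_func up to max_retries times; return '' if every attempt fails."""
    prompt = content + prompt_template  # assumed composition order
    for attempt in range(1, max_retries + 1):
        try:
            reply = chat_func(prompt)
            if reply:  # an empty reply counts as a failed attempt
                return reply
        except Exception as error:  # broad on purpose: any provider error triggers a retry
            print(f"attempt {attempt}/{max_retries} failed: {error}")
        if attempt < max_retries:
            time.sleep(delay_seconds)
    return ""
```

Because the chat function is passed in as a plain callable, the wrapper stays independent of any particular LLM provider and is easy to exercise with a fake function in tests.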

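Built-in help lookup

If you forget how a function is used, there is no need to look up the documentation; call help_doc() directly in your code.

import larkfunc

# Show an overview of all available functions (grouped by submodule)
larkfunc.help_doc()

# Show the full documentation of one function (parameters, return value, examples, notes)
larkfunc.help_doc("read_txt_data")
larkfunc.help_doc("parse_json_safely")
larkfunc.help_doc("call_llm_with_retry")

# Python's built-in help() also works
help(larkfunc.clean_llm_response)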

API reference

`larkfunc.io` - File I/O module

Function	Description	Parameters	Returns
`read_txt_data(filepath)`	Read the entire contents of a text file	`filepath: str`	`Optional[str]`
`read_txt_to_list(filepath)`	Read a text file line by line	`filepath: str`	`List[str]`
`save_json_to_file(data, output_path)`	Save a dict as a JSON file	`data: dict, output_path: str`	`bool`
`ensure_directory_exists(path)`	Make sure a directory exists	`path: str`	`None`

`larkfunc.text` - Text processing module

Function	Description	Parameters	Returns
`clean_llm_response(response)`	Remove code-fence markers from an LLM response	`response: str`	`str`
`extract_filename_without_extension(filepath)`	Extract the file name without its extension	`filepath: str`	`str`
`extract_linkedin_id(url)`	Extract the user ID from a LinkedIn URL	`url: str`	`str`

`larkfunc.network` - Network request module

Function	Description	Parameters	Returns
`decode_response_content(response)`	Decode an HTTP response into a string	`response: Response`	`Optional[str]`
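
One common way to decode a response body robustly is to try the declared encoding first and then a short list of fallbacks. The sketch below works on raw bytes so that it runs without any HTTP library; `decode_bytes` and its fallback list are assumptions for illustration, not larkfunc's implementation. With `requests`, the inputs would typically be `response.content` and `response.encoding`.

```python
from typing import Optional, Sequence


def decode_bytes(
    raw: bytes,
    declared_encoding: Optional[str] = None,
    fallbacks: Sequence[str] = ("utf-8", "gbk"),
) -> Optional[str]:
    """Try the declared encoding first, then each fallback; None if all fail."""
    candidates = ([declared_encoding] if declared_encoding else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding: try the next candidate
    return None
```

Returning `None` rather than raising matches the `Optional[str]` return type in the table above and leaves the caller to decide how to handle undecodable content.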

`larkfunc.json_utils` - JSON handling module

Function	Description	Parameters	Returns
`parse_json_safely(json_str)`	Parse a JSON string safely	`json_str: str`	`dict`
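
"Safe" parsing here means that malformed input yields a default value instead of an exception. A minimal sketch of that idea, using a hypothetical `parse_json_or_empty` and assuming that non-object JSON (such as a list) is also mapped to an empty dict:

```python
import json
from typing import Any, Dict


def parse_json_or_empty(json_str: str) -> Dict[str, Any]:
    """Parse a JSON object; return {} for invalid input or non-object JSON."""
    try:
        parsed = json.loads(json_str)
    except (json.JSONDecodeError, TypeError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


print(parse_json_or_empty('{"name": "lark", "version": 1}'))  # {'name': 'lark', 'version': 1}
print(parse_json_or_empty("not json"))  # {}
```

Callers can then test the result with a plain truthiness check before using it.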

`larkfunc.llm` - LLM call module

Function	Description	Parameters	Returns
`call_llm_with_retry(content, prompt_template, chat_func, max_retries)`	LLM call with retries	`content: str, prompt_template: str, chat_func: Callable, max_retries: int`	`str`

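Hands-on demo: batch-processing Excel with an LLM

larkfunc.demos.llm_excel_processor shows how to combine the larkfunc modules into a complete pipeline: read Excel → call an LLM to extract information → save JSON → merge into an output Excel file.

Use case

Use a large language model to batch-extract bachelor's, master's, and doctoral graduation years from the education-history text in an Excel file, and write the results to a structured Excel file.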

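Quick usage

1) Install dependencies

pip install larkfunc pandas openpyxl

2) Prepare the input Excel file

The input file must contain an education column holding the education-history text, for example: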

name	education	profile_url
Alice	B.S. MIT 2018, M.S. Stanford 2020	https://linkedin.com/in/alice
Bob	Ph.D. CMU 2022	https://linkedin.com/in/bob

3) Programmatic use (recommended)

from larkfunc.demos.llm_excel_processor import run_pipeline

def my_chat(prompt: str) -> str:
    # Replace with a real LLM API (OpenAI / Tongyi Qianwen / custom, etc.)
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

run_pipeline(
    input_excel="mlir_all.xlsx",
    output_excel="mlir_data.xlsx",
    output_json_dir="mlir_json",
    chat_func=my_chat,
    sleep_interval=0.5,
)

4) Command-line use

# Use the default configuration
python -m larkfunc.demos.llm_excel_processor

# Custom input and output paths
python -m larkfunc.demos.llm_excel_processor --input data.xlsx --output result.xlsx

# Adjust the interval between LLM requests
python -m larkfunc.demos.llm_excel_processor --sleep 1.0
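
A command-line entry point that accepts these three flags could be defined with `argparse` roughly as below. Only the flag names come from the commands above; the defaults, help texts, and the `build_parser` function are assumptions for illustration and may differ from the demo's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --input, --output and --sleep flags."""
    parser = argparse.ArgumentParser(prog="llm_excel_processor")
    parser.add_argument("--input", default="mlir_all.xlsx",
                        help="path to the input Excel file (assumed default)")
    parser.add_argument("--output", default="mlir_data.xlsx",
                        help="path to the merged output Excel file (assumed default)")
    parser.add_argument("--sleep", type=float, default=0.5,
                        help="seconds to wait between LLM requests")
    return parser


args = build_parser().parse_args(["--input", "data.xlsx", "--sleep", "1.0"])
print(args.input, args.output, args.sleep)  # data.xlsx mlir_data.xlsx 1.0
```

Declaring `type=float` on `--sleep` means a value such as `1.0` arrives as a number, ready to pass to `time.sleep`.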

5) Import the configuration class (optional)

from larkfunc.demos.llm_excel_processor import Config

# Inspect the default configuration
print(Config.MODEL)           # qwen2.5-coder-32b-instruct
print(Config.SLEEP_INTERVAL)  # 0.5
print(Config.OUTPUT_JSON_DIR) # mlir_json

Expected output

[INFO] 已加载 100 行数据，来源: mlir_all.xlsx
[INFO] 使用模型: qwen2.5-coder-32b-instruct
[INFO] 正在处理第 0 行: Alice
  [OK] 已保存至 mlir_json/row_0.json
[INFO] 正在处理第 1 行: Bob
  [OK] 已保存至 mlir_json/row_1.json
...
[SUMMARY] 处理完成: 成功=98, 跳过=0, 失败=2
[OK] 合并后的 Excel 已保存至: mlir_data.xlsx，共 98 条记录
[INFO] 全部流程完成。

Output Excel structure:

name	education	bs_graduate_year	ms_graduate_year	phd_graduate_year
Alice	B.S. MIT 2018...	2018	2020
Bob	Ph.D. CMU 2022			2022

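How it works

Step	larkfunc function used	Description
Create the output directory	`ensure_directory_exists()`	Automatically creates the directory for intermediate JSON files
Clean the LLM response	`clean_llm_response()`	Removes ```json code-fence markers
Parse JSON	`parse_json_safely()`	Parses safely; returns an empty dict on failure instead of crashing
Save intermediate results	`save_json_to_file()`	Saves one JSON file per row, which makes resuming possible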

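Resuming after interruption

Each row's result is saved separately as output_json_dir/row_{idx}.json. If the program is interrupted, rerunning it automatically skips the rows that have already been processed: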

[SKIP] 第 0 行已处理，跳过: mlir_json/row_0.json
[SKIP] 第 1 行已处理，跳过: mlir_json/row_1.json
[INFO] 正在处理第 2 行: Charlie  ← resumes where the previous run stopped
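
The skip-if-exists logic described above can be sketched in a few lines. `process_rows` and its `extract` callback are hypothetical stand-ins for the demo's internals; only the `row_{idx}.json` naming scheme is taken from the text.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def process_rows(
    rows: List[str],
    output_json_dir: str,
    extract: Callable[[str], Dict[str, object]],
) -> Tuple[int, int]:
    """Process each row once; return (processed, skipped) counts."""
    out_dir = Path(output_json_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    processed = skipped = 0
    for idx, text in enumerate(rows):
        target = out_dir / f"row_{idx}.json"
        if target.exists():  # a previous run already handled this row
            skipped += 1
            continue
        result = extract(text)  # if this raises, no file is written for the row
        target.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
        processed += 1
    return processed, skipped
```

Because the file is written only after extraction succeeds, a row that fails leaves no checkpoint behind and is retried on the next run.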

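FAQ

Q: At runtime I get NotImplementedError: 请将 my_chat 替换... ("Please replace my_chat...")

Pass an implemented chat_func argument when calling run_pipeline() (or replace my_chat in main(), the command-line entry point).

Q: I get ValueError: 输入 Excel 文件缺少 'education' 列 ("the input Excel file is missing the 'education' column")

Check that the input Excel file contains a column named education (the name is case-sensitive).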
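
A quick pre-flight check can catch this before any LLM call is made. The helper below is a hypothetical sketch, not part of larkfunc; with pandas, the column names would come from `pandas.read_excel(path).columns`.

```python
from typing import Iterable, Sequence


def require_columns(
    columns: Iterable[str],
    required: Sequence[str] = ("education",),
) -> None:
    """Raise ValueError if any required column name is absent (case-sensitive)."""
    present = set(columns)
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError("input Excel file is missing column(s): " + ", ".join(missing))


require_columns(["name", "education", "profile_url"])  # passes silently
```

A header spelled `Education` fails this check, which mirrors the case-sensitivity noted above.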

Q: How do I customize the column order of the output Excel file?

Pass a custom list through the column_order parameter of run_pipeline():

run_pipeline(
    ...,
    column_order=["name", "bs_graduate_year", "ms_graduate_year"],
)

Dependencies

Package	Version	Purpose
`requests`	any version	HTTP request support (`network` module)
Python	>= 3.7	Runtime

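Contributing and license

Contributing

Fork this repository
Create a feature branch: git checkout -b feature/your-feature
Commit your changes: git commit -m "Add your feature"
Push the branch: git push origin feature/your-feature
Open a Pull Request

License

This project is open-sourced under the MIT License.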

Release history

0.1.2 - Apr 22, 2026 (this version)
0.1.1 - Apr 15, 2026
0.1.0 - Apr 15, 2026

