A text segmentation tool using the Jina AI API
Project description
Jina Segmenter
An intelligent text segmentation tool built on the Jina AI API. It splits long text into appropriately sized chunks while preserving semantic coherence.
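Features
- Intelligent text segmentation that preserves semantic coherence
- Automatic calculation and optimization of chunk sizes
- Configurable maximum chunk size
- Returns the token count of each chunk
- Simple, easy-to-use API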
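The README does not say how chunk sizes are computed. One common way to enforce a maximum chunk size is to greedily merge consecutive segments until adding the next one would exceed the token budget. The sketch below shows that technique; `pack_segments` is a hypothetical helper, not jina-segmenter's actual algorithm, and whitespace splitting stands in for a real tokenizer.

```python
def pack_segments(segments, max_chunk_size, count_tokens=lambda s: len(s.split())):
    """Greedily merge consecutive segments into chunks of at most max_chunk_size tokens.

    Assumption: whitespace splitting approximates token counts. A single segment
    larger than max_chunk_size is kept as its own (oversized) chunk rather than cut.
    """
    chunks = []
    current, current_tokens = [], 0
    for seg in segments:
        n = count_tokens(seg)
        # Close the current chunk if this segment would push it over the budget
        if current and current_tokens + n > max_chunk_size:
            chunks.append({"text": " ".join(current), "tokens": current_tokens})
            current, current_tokens = [], 0
        current.append(seg)
        current_tokens += n
    # Flush whatever is left
    if current:
        chunks.append({"text": " ".join(current), "tokens": current_tokens})
    return chunks
```

The output uses the same `text`/`tokens` dictionary shape as the usage example in this README, so downstream code can be exercised against either.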
Installation
pip install jina-segmenter
Usage
First, set your Jina AI API key:
import os
os.environ['JINA_API_KEY'] = 'your_jina_api_key'
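Hardcoding the key in source makes it easy to commit by accident. A safer pattern is to export `JINA_API_KEY` in your shell and fail early if it is missing. The helper below is a hypothetical convenience function, not part of jina-segmenter; it only assumes, as this README indicates, that the package reads the key from the `JINA_API_KEY` environment variable.

```python
import os

def require_jina_api_key(env_var: str = "JINA_API_KEY") -> str:
    """Return the Jina API key from the environment, or fail with a clear message.

    Hypothetical helper: it only checks the variable that jina-segmenter is
    documented to read; it does not validate the key against Jina's service.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell before calling segment_text()"
        )
    return key
```

For example, run `export JINA_API_KEY=your_jina_api_key` in the shell, then call `require_jina_api_key()` once at startup before segmenting.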
Then you can segment text:
from jina_segmenter import segment_text
text = "Your long text..."
chunks = segment_text(text)  # default maximum chunk size: 1500 tokens
# Inspect the resulting chunks
for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i} (tokens: {chunk['tokens']}):")
    print(chunk['text'])
    print("-" * 30)
You can also set a custom maximum chunk size:
chunks = segment_text(text, max_chunk_size=1000)
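After segmenting, it can be useful to check the result against the limit you passed, for example before sending chunks to a model with a fixed context window. The sketch below relies only on the `text` and `tokens` keys shown in the usage example; `summarize_chunks` is a hypothetical helper, not part of jina-segmenter.

```python
def summarize_chunks(chunks, max_chunk_size=1500):
    """Summarize token counts for a list of {'text': ..., 'tokens': ...} chunks.

    'oversized' lists the 1-based indices of chunks whose token count exceeds
    max_chunk_size (matching the numbering used in the printing loop above).
    """
    counts = [chunk["tokens"] for chunk in chunks]
    oversized = [i for i, n in enumerate(counts, 1) if n > max_chunk_size]
    return {
        "num_chunks": len(chunks),
        "total_tokens": sum(counts),
        "max_tokens": max(counts, default=0),
        "oversized": oversized,
    }
```

Call it as `summarize_chunks(chunks, max_chunk_size=1000)` with the same limit you passed to `segment_text`; an empty `oversized` list means every chunk respected the limit.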
Getting an API Key
- Visit Jina AI
- Sign up and log in to your account
- Create a new API key in the dashboard
Dependencies
- Python >= 3.6
- requests >= 2.31.0
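Because `requests` is the only runtime dependency, the package presumably calls Jina's HTTP API directly. The sketch below builds (but does not send) such a request with the standard library's `urllib`. The endpoint `https://segment.jina.ai/` and the fields `content`, `return_tokens`, `return_chunks`, and `max_chunk_length` are assumptions based on Jina's public Segmenter API documentation; this README does not confirm them, and they may not match what jina-segmenter actually sends.

```python
import json
import urllib.request

# Assumption: endpoint taken from Jina's public Segmenter API docs; verify before use.
SEGMENT_URL = "https://segment.jina.ai/"

def build_segment_request(text, api_key, max_chunk_length=1000):
    """Build (without sending) a POST request to the assumed Jina Segmenter endpoint."""
    # Assumed request fields; check the current Jina documentation for exact names.
    payload = {
        "content": text,
        "return_tokens": True,
        "return_chunks": True,
        "max_chunk_length": max_chunk_length,
    }
    return urllib.request.Request(
        SEGMENT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To actually send it, pass the request to `urllib.request.urlopen()` and decode the JSON body; the response fields are likewise defined by Jina's API, not by this package.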
License
MIT License
Author
WSY (wangshuyue@gmail.com)
Download files
Source Distribution
jina_segmenter-0.1.0.tar.gz (3.9 kB)
Built Distribution
jina_segmenter-0.1.0-py3-none-any.whl (4.4 kB)
File details
Details for the file jina_segmenter-0.1.0.tar.gz.
File metadata
- Download URL: jina_segmenter-0.1.0.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | e6ed6fd18ad9c8b4d939cc9fcac899df8261ebc92c0b4f2e6bed2f73fad2e97c
MD5 | 7d3501ddcd77365814b87bd7ed1bed3a
BLAKE2b-256 | d9837ea286948794bb049a73cbc4da798bed685236537c9ba782ae0052139333
File details
Details for the file jina_segmenter-0.1.0-py3-none-any.whl.
File metadata
- Download URL: jina_segmenter-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 84603e4b9f3fd4c8b24f6a626605474f1d9d1ff022caad482a4fb502a0b733f8
MD5 | 4138e965422845e0472f1c09234ec73b
BLAKE2b-256 | 6c68b821c602296cec211b9f296089c80e86060e36d42e2f42cb0700b982354b