A text segmentation tool built on the Jina AI API

Jina Segmenter

A text segmentation tool built on the Jina AI API. It splits long text into suitably sized chunks while keeping each chunk semantically coherent.

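Features

  • Semantic-aware text segmentation that keeps meaning intact
  • Automatic calculation and optimization of chunk size
  • Configurable maximum chunk size
  • Token count returned for every chunk
  • Simple, easy-to-use API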
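
The README does not describe how chunk sizes are optimized. As a rough illustration only, one common approach is to merge consecutive segments greedily until the token budget would be exceeded. The helper name `pack_segments` and the input shape below are assumptions made for this sketch; they are not part of the `jina-segmenter` API.

```python
from typing import Dict, List


def pack_segments(segments: List[Dict], max_chunk_size: int = 1500) -> List[Dict]:
    """Greedily merge consecutive segments without exceeding the token budget.

    Hypothetical helper: each segment is assumed to be a dict with
    'text' (str) and 'tokens' (int) keys.
    """
    chunks: List[Dict] = []
    current_text, current_tokens = "", 0
    for seg in segments:
        # Flush the current chunk if adding this segment would exceed the budget.
        if current_text and current_tokens + seg["tokens"] > max_chunk_size:
            chunks.append({"text": current_text, "tokens": current_tokens})
            current_text, current_tokens = "", 0
        current_text += seg["text"]
        current_tokens += seg["tokens"]
    # Emit whatever is left after the last segment.
    if current_text:
        chunks.append({"text": current_text, "tokens": current_tokens})
    return chunks
```

Two simplifications are worth noting: the token count of a merged chunk is approximated as the sum of its parts, and a single segment larger than the budget is emitted as its own oversized chunk rather than being split.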

Installation

pip install jina-segmenter

Usage

First, set your Jina AI API key:

import os
os.environ['JINA_API_KEY'] = 'your_jina_api_key'
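
In practice the key is usually exported in the shell rather than hard-coded. A minimal sketch of a guard that fails early when the variable is missing follows; `require_jina_api_key` is a hypothetical helper for illustration, not something the package is known to provide.

```python
import os


def require_jina_api_key() -> str:
    """Return the value of JINA_API_KEY, or raise a clear error if it is unset."""
    key = os.environ.get("JINA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "JINA_API_KEY is not set; export it before calling segment_text."
        )
    return key
```

Calling this once at startup surfaces a missing key immediately instead of at the first segmentation request.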

Then you can segment text:

from jina_segmenter import segment_text

text = "Your long text..."
chunks = segment_text(text)  # default maximum chunk size is 1500 tokens

# Inspect the resulting chunks
for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i} (tokens: {chunk['tokens']}):")
    print(chunk['text'])
    print("-" * 30)
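
Because each chunk carries its own token count, aggregate statistics are easy to compute. The sketch below assumes only what the loop above shows, namely that every chunk is a dict with `text` and `tokens` keys; `summarize_chunks` and the sample data are hypothetical.

```python
from typing import Dict, List


def summarize_chunks(chunks: List[Dict]) -> Dict[str, int]:
    """Summarize a list of chunks shaped like {'text': str, 'tokens': int}."""
    token_counts = [chunk["tokens"] for chunk in chunks]
    return {
        "count": len(token_counts),
        "total_tokens": sum(token_counts),
        # default=0 keeps max() from raising on an empty list
        "largest_chunk": max(token_counts, default=0),
    }


# Hand-written sample data standing in for a real segment_text() result.
sample = [
    {"text": "First part.", "tokens": 3},
    {"text": "Second, longer part.", "tokens": 5},
]
print(summarize_chunks(sample))
```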

You can also set a custom maximum chunk size:

chunks = segment_text(text, max_chunk_size=1000)
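
The README does not say whether `max_chunk_size` is a hard limit, so it may be worth checking the returned token counts before passing chunks to a model with a fixed context window. A minimal sketch, with `oversized_chunks` as a hypothetical helper:

```python
from typing import Dict, List


def oversized_chunks(chunks: List[Dict], max_chunk_size: int) -> List[int]:
    """Return 1-based positions of chunks whose token count exceeds the limit."""
    return [
        position
        for position, chunk in enumerate(chunks, 1)
        if chunk["tokens"] > max_chunk_size
    ]
```

An empty result means every chunk fits within the requested size.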

Getting an API Key

  1. Visit Jina AI
  2. Sign up and log in to your account
  3. Create a new API key in the console

Dependencies

  • Python >= 3.6
  • requests >= 2.31.0

License

MIT License

Author

WSY (wangshuyue@gmail.com)
