A text splitting tool
Tixent: A Text Splitting Tool
Installation
```shell
pip install tixent
```
Example
Suppose we have a template function that builds a string from a list of texts, and suppose we also have a large list of texts. Passing the whole list to the template produces one long string.
Tixent splits the list of texts into groups and applies the template to each group, so that the value returned by counter for each resulting string is less than or equal to a given maximum (max_count in the example below).
Here, counter is a function that maps a string to an integer. Examples of such functions are len, which measures the length of a string in characters, and tiktoken_counter("text-davinci-003"), which measures the number of tokens in a string.
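Since counter only needs to map a string to an integer, presumably any callable with that shape can be passed, although the text above only names len and tiktoken_counter. The self-contained sketch below does not import tixent; word_counter and template are hypothetical names introduced for illustration, and template produces the same string as the summarization template in the example.

```python
from typing import Callable, List

# Any function from str to int has the shape of a counter.
Counter = Callable[[str], int]


def word_counter(s: str) -> int:
    # Hypothetical counter: number of whitespace-separated words.
    return len(s.split())


def template(texts: List[str]) -> str:
    # Same output as summarization_template in the example.
    text = " ".join(texts)
    return f'Summarize the following text.\nText: """{text}"""'


prompt = template(["Lorem ipsum dolor sit amet", "consectetur adipiscing elit"])
counters: List[Counter] = [len, word_counter]
for counter in counters:
    # Each counter measures the same prompt in its own unit.
    print(counter(prompt))
```

The same prompt gets very different counts depending on the unit, which is why max_count has to be chosen with the specific counter in mind.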
```python
from typing import List

from tixent import split, tiktoken_counter


def summarization_template(texts: List[str]) -> str:
    # Join the texts and wrap them in a summarization prompt.
    text = " ".join(texts)
    t = "Summarize the following text.\n"
    t += f'Text: """{text}"""'
    return t


texts = [
    "Lorem ipsum dolor sit amet",
    "consectetur adipiscing elit",
    "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua",
    "Ut enim ad minim veniam",
    "quis nostrud exercitation ullamco laboris nisi",
    "ut aliquip ex ea commodo consequat",
    "Duis aute irure dolor in reprehenderit in voluptate velit",
    "esse cillum dolore eu fugiat nulla pariatur",
    "Excepteur sint occaecat cupidatat non proident",
    "sunt in culpa qui officia deserunt mollit anim id est laborum",
]

# Count tokens as the text-davinci-003 tokenizer would.
counter = tiktoken_counter("text-davinci-003")
max_count = 60

# Each element of split_texts is a rendered prompt within max_count tokens.
split_texts = split(texts, summarization_template, counter, max_count)

for text in split_texts:
    count = counter(text)
    assert count <= max_count
    print(f"count: {count}")
    print(text)
    print()
```
Output:
```text
count: 60
Summarize the following text.
Text: """Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim veniam"""

count: 58
Summarize the following text.
Text: """quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat Duis aute irure dolor in reprehenderit in voluptate velit"""

count: 43
Summarize the following text.
Text: """esse cillum dolore eu fugiat nulla pariatur Excepteur sint occaecat cupidatat non proident"""

count: 31
Summarize the following text.
Text: """sunt in culpa qui officia deserunt mollit anim id est laborum"""
```
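This page does not describe how split chooses its groups. As a rough mental model only, here is a hypothetical greedy_split that walks the texts in order, keeps adding texts to the current group while the rendered template stays within max_count, and starts a new group otherwise. It is a sketch under that assumption, not tixent's implementation, and it uses len as the counter so it runs without tiktoken.

```python
from typing import Callable, List


def greedy_split(
    texts: List[str],
    template: Callable[[List[str]], str],
    counter: Callable[[str], int],
    max_count: int,
) -> List[str]:
    # Hypothetical greedy strategy; NOT tixent's documented algorithm.
    results: List[str] = []
    chunk: List[str] = []
    for text in texts:
        # Extend the current group if the rendered prompt still fits.
        if chunk and counter(template(chunk + [text])) <= max_count:
            chunk.append(text)
            continue
        # Otherwise close the current group (if any) and start a new one.
        if chunk:
            results.append(template(chunk))
        if counter(template([text])) > max_count:
            raise ValueError(f"text does not fit within max_count: {text!r}")
        chunk = [text]
    if chunk:
        results.append(template(chunk))
    return results


def template(texts: List[str]) -> str:
    # Hypothetical short template so the counts are easy to follow.
    return "Summarize: " + " ".join(texts)


parts = greedy_split(["aaa", "bbb", "ccc", "dddd"], template, len, 20)
for p in parts:
    print(len(p), p)
```

In this sketch a single text whose rendered prompt already exceeds max_count cannot be placed in any group, so it raises ValueError; how tixent handles that case is not stated on this page.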
Download files
Source Distribution
File details
Details for the file tixent-0.0.3.tar.gz.
File metadata
- Download URL: tixent-0.0.3.tar.gz
- Upload date:
- Size: 6.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.25.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 72084f3d107d953435b72c004c93af334a121330d115d92a8a1d2fa2e41c9488
MD5 | a578e2ebb2febec9a2b094a40b6c8b86
BLAKE2b-256 | 1d34a38d076c40a35dae1edd784c4eb2543b186212260aec42e8aa9ee85f32f3

Built Distribution
File details
Details for the file tixent-0.0.3-py3-none-any.whl.
File metadata
- Download URL: tixent-0.0.3-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.25.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3edee261041b62f17a8da48d7852baf71b6e1e0fa34194c10abc7a6432973b48
MD5 | bd5e10b40f9cad21e96b55eeab7353b0
BLAKE2b-256 | a5e05ac3b710c107bfc52faee283a413d6563f8d0ac376de778b7c8cdf93da4d
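The SHA256 digests listed for the two files can be used to check a download. Below is a minimal sketch using only the standard library (hashlib, pathlib); sha256_of, verify, and EXPECTED_SHA256 are hypothetical helper names, and the digests are copied from the file hash tables for tixent 0.0.3. pip can also compute a file's digest with `pip hash` and enforce digests through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables for tixent 0.0.3.
EXPECTED_SHA256 = {
    "tixent-0.0.3.tar.gz": "72084f3d107d953435b72c004c93af334a121330d115d92a8a1d2fa2e41c9488",
    "tixent-0.0.3-py3-none-any.whl": "3edee261041b62f17a8da48d7852baf71b6e1e0fa34194c10abc7a6432973b48",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    # Read the file in chunks so a large download is not loaded into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    # Unknown file names fail verification rather than passing silently.
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify(Path("tixent-0.0.3.tar.gz"))` should return True for an unmodified download saved under its original file name.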