A library to split data into tokens

python-tally-token

What is this?

tally-token is a Python library for splitting data into tokens of the same length.

A tally is a historical device for proving a claim: a piece of wood is split into tokens, and the claim is verified later by matching the tokens against each other.

Medieval English split tally stick (front and reverse view). The stick is notched and inscribed to record a debt owed to the rural dean of Preston Candover, Hampshire, of a tithe of 20d each on 32 sheep, amounting to a total sum of £2 13s. 4d.

Usage

Install

$ pip install tally-token

Example

split

You can use split_text to split text into tokens. split_text returns a list of tokens, each a bytes object of random data.

>>> from tally_token import split_text
>>> split_text("Hello, World!")
[b'qQ\xa5\x97\x84\x88\xd7U%\xfb(k\xa1', b'94\xc9\xfb\xeb\xa4\xf7\x02J\x89D\x0f\x80']

merge

You can use merge_text to merge tokens back into text. merge_text returns the original cleartext.

>>> from tally_token import merge_text
>>> merge_text([b'qQ\xa5\x97\x84\x88\xd7U%\xfb(k\xa1', b'94\xc9\xfb\xeb\xa4\xf7\x02J\x89D\x0f\x80'])
'Hello, World!'
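
The two sample tokens above happen to XOR byte by byte to the original text. The sketch below checks that with the standard library only. It assumes an XOR-based scheme, which the sample data is consistent with but which this README does not state; xor_merge is a hypothetical helper, not part of tally-token.

```python
from functools import reduce
from operator import xor


def xor_merge(tokens: list[bytes]) -> bytes:
    """Hypothetical helper: XOR equal-length tokens together, byte by byte."""
    if not tokens:
        raise ValueError("at least one token is required")
    if len({len(token) for token in tokens}) != 1:
        raise ValueError("all tokens must have the same length")
    # zip(*tokens) yields one tuple of ints per byte position.
    return bytes(reduce(xor, column) for column in zip(*tokens))


# The sample tokens from the split example above.
tokens = [
    b'qQ\xa5\x97\x84\x88\xd7U%\xfb(k\xa1',
    b'94\xc9\xfb\xeb\xa4\xf7\x02J\x89D\x0f\x80',
]
merged = xor_merge(tokens)
print(merged.decode("utf-8"))  # Hello, World!
```

Each token on its own is 13 bytes of random-looking data, the same length as the encoded text.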

split with custom length (number of tokens)

>>> from tally_token import split_text, merge_text
>>> split_text("Hello, World!", 5)
[b'N&\xce\\\xbc6dxp\x87\xa8#z', b'\xa3D\\A\xf8\xd1KDX\x1cKx\x87', b'\xffZ\x03\xf5\x92Q\xf52\xc4\x1e\xf2\xf8\x06', b'\xaa\xdd:\x85F\xa1\xcdbp\xf3\xe6P\xe5', b'\xf0\x80\xc7\x01\xff;7;\xf3\x04\x9b\x97?']
>>> merge_text([b'N&\xce\\\xbc6dxp\x87\xa8#z', b'\xa3D\\A\xf8\xd1KDX\x1cKx\x87', b'\xffZ\x03\xf5\x92Q\xf52\xc4\x1e\xf2\xf8\x06', b'\xaa\xdd:\x85F\xa1\xcdbp\xf3\xe6P\xe5', b'\xf0\x80\xc7\x01\xff;7;\xf3\x04\x9b\x97?'])
'Hello, World!'
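
One way to picture a split into an arbitrary number of tokens is an n-way XOR split: draw all but one token at random and derive the last one so that everything XORs back to the data. This is a minimal sketch under that assumption, not necessarily how tally-token is implemented; sketch_split and sketch_merge are hypothetical names.

```python
import secrets
from functools import reduce
from operator import xor


def xor_all(chunks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(xor, column) for column in zip(*chunks))


def sketch_split(data: bytes, count: int = 2) -> list[bytes]:
    """Hypothetical n-way XOR split: count tokens, each as long as data."""
    if count < 2:
        raise ValueError("count must be at least 2")
    # count - 1 tokens are pure random bytes.
    randoms = [secrets.token_bytes(len(data)) for _ in range(count - 1)]
    # The last token is chosen so that XOR-ing every token yields data.
    last = xor_all([data, *randoms])
    return [*randoms, last]


def sketch_merge(tokens: list[bytes]) -> bytes:
    """Hypothetical inverse of sketch_split."""
    return xor_all(tokens)


tokens = sketch_split(b"Hello, World!", 5)
restored = sketch_merge(tokens)
print(len(tokens), restored)
```

Because the tokens are random, their contents change on every run, while the merged result stays the same.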

split with custom encoding

>>> from tally_token import split_text, merge_text
>>> split_text("こんにちは", encoding="CP932")
[b'g\xc3\x12\xeal?\xe5[\x03\xad', b'\xe5r\x90\x1b\xee\xf6g\xe4\x81`']
>>> merge_text([b'g\xc3\x12\xeal?\xe5[\x03\xad', b'\xe5r\x90\x1b\xee\xf6g\xe4\x81`'], encoding="CP932")
'こんにちは'
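
The encoding determines how many bytes the text occupies, and the sample tokens above are 10 bytes long, which matches the CP932 encoding of the text. The check below uses only the standard library; that token length always follows the encoded length is an inference from the samples, not a documented guarantee.

```python
text = "こんにちは"

# Each hiragana character takes 3 bytes in UTF-8 and 2 bytes in CP932.
utf8_length = len(text.encode("utf-8"))
cp932_length = len(text.encode("cp932"))
print(utf8_length, cp932_length)  # 15 10

# First sample token from the CP932 example above.
sample_token = b'g\xc3\x12\xeal?\xe5[\x03\xad'
print(len(sample_token) == cp932_length)  # True
```

Pass the same encoding to merge_text that was used for split_text, as the example above does.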

bytes interface

You can use split_bytes_into and merge_bytes_into to split and merge bytes directly. This is useful for splitting binary data.

>>> from tally_token import split_bytes_into, merge_bytes_into
>>> split_bytes_into(b"Hello, World!", 5)
[b'\xc5b\xf4E)\xe1vO8\xff@\xf9\xdd', b'\x84\xb9X#\x85\xf5\xed\xbcM\xc4\xef\xf4\xd3', b'\xb47\xf6\xfa?\x14\xa8`\xc9\xe0\xe5\x87\x14', b'\x1cd\xb4o\xe8I:\xe5\xf6\x13\xe5\x93G', b'\xa1\xed\x82\x9f\x14e)!%\xba\xc3}|']
>>> merge_bytes_into([b'\xc5b\xf4E)\xe1vO8\xff@\xf9\xdd', b'\x84\xb9X#\x85\xf5\xed\xbcM\xc4\xef\xf4\xd3', b'\xb47\xf6\xfa?\x14\xa8`\xc9\xe0\xe5\x87\x14', b'\x1cd\xb4o\xe8I:\xe5\xf6\x13\xe5\x93G', b'\xa1\xed\x82\x9f\x14e)!%\xba\xc3}|'])
b'Hello, World!'

Reference

LICENSE

BSD 3-Clause License
