
A library to split data into tokens


python-tally-token


What is this?

tally-token is a Python library for splitting data into tokens of the same length.

A tally is a historical object used to prove something, such as a debt, by splitting a piece of wood into tokens and later matching the tokens.

Medieval English split tally stick (front and reverse view). The stick is notched and inscribed to record a debt owed to the rural dean of Preston Candover, Hampshire, of a tithe of 20d each on 32 sheep, amounting to a total sum of £2 13s. 4d.

Usage

Install

$ pip install tally-token

CLI Usage

$ tally-token --help
usage: tally-token [-h] {split,merge} ...

positional arguments:
  {split,merge}  Commands: split: split a file into multiple files merge: merge multiple files into a fileExample: tally-token split example.bin
                 example.bin.1 example.bin.2 example.bin.3 tally-token merge example-merged.bin example.bin.1 example.bin.2 example.bin.3

options:
  -h, --help     show this help message and exit

split

Use split to split a file into multiple token files. The first argument is the input file; the remaining arguments are the output files, one per token.

$ tally-token split something.bin split-1.bin split-2.bin split-3.bin
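The internals of the split command aren't documented on this page. The Python examples further down produce tokens that XOR byte by byte back to the original data, so a file split along those lines can be sketched as follows. split_file and xor_bytes are hypothetical helpers, not part of tally-token, and the sketch assumes each output file holds raw token bytes. It reads the input in 1 MiB chunks so memory use stays flat for large files.

```python
from __future__ import annotations

import secrets
from contextlib import ExitStack

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return (int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).to_bytes(len(a), "big")


def split_file(src: str, outputs: list[str]) -> None:
    """Hypothetical sketch: write len(outputs) XOR tokens of src."""
    if len(outputs) < 2:
        raise ValueError("need at least two output files")
    with ExitStack() as stack:
        infile = stack.enter_context(open(src, "rb"))
        writers = [stack.enter_context(open(path, "wb")) for path in outputs]
        while chunk := infile.read(CHUNK_SIZE):
            last = chunk
            # Every token except the last is fresh random bytes.
            for writer in writers[:-1]:
                pad = secrets.token_bytes(len(chunk))
                writer.write(pad)
                last = xor_bytes(last, pad)
            # The last token is the chunk XORed with all the random pads.
            writers[-1].write(last)


# Usage, mirroring the command above:
# split_file("something.bin", ["split-1.bin", "split-2.bin", "split-3.bin"])
```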

merge

Use merge to merge multiple token files back into a single file. The first argument is the output file; the remaining arguments are the token files.

$ tally-token merge merged.bin split-1.bin split-2.bin split-3.bin
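Assuming each token file holds raw token bytes and that tokens are combined with XOR (the Python examples further down behave this way), merging can be sketched as reading every token file in step and XORing the chunks together. merge_files and xor_bytes are hypothetical helpers, not part of tally-token.

```python
from __future__ import annotations

from contextlib import ExitStack

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return (int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).to_bytes(len(a), "big")


def merge_files(dest: str, inputs: list[str]) -> None:
    """Hypothetical sketch: XOR the token files together and write the result to dest."""
    if not inputs:
        raise ValueError("need at least one token file")
    with ExitStack() as stack:
        readers = [stack.enter_context(open(path, "rb")) for path in inputs]
        outfile = stack.enter_context(open(dest, "wb"))
        while True:
            chunks = [reader.read(CHUNK_SIZE) for reader in readers]
            if len({len(chunk) for chunk in chunks}) != 1:
                raise ValueError("token files differ in length")
            if not chunks[0]:
                break  # every token file is exhausted
            merged = chunks[0]
            for chunk in chunks[1:]:
                merged = xor_bytes(merged, chunk)
            outfile.write(merged)


# Usage, mirroring the command above:
# merge_files("merged.bin", ["split-1.bin", "split-2.bin", "split-3.bin"])
```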

Large files

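Large files need no special handling; you can split and merge them the same way. The example below splits a 1 GiB random file, merges the tokens back, and shows that the original and recovered files have the same SHA-256 hash.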

$ dd if=/dev/urandom of=original.1g.bin bs=1G count=1
$ tally-token split original.1g.bin split-1.bin split-2.bin split-3.bin
$ shasum -a 256 original.1g.bin
> 736a344d99d27e2dcdab8bc37ca94c83eda26f812a3dee87ac98989f89b3f965 original.1g.bin
$ tally-token merge recovery.1g.bin split-1.bin split-2.bin split-3.bin
$ shasum -a 256 recovery.1g.bin
> 736a344d99d27e2dcdab8bc37ca94c83eda26f812a3dee87ac98989f89b3f965 recovery.1g.bin
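The same integrity check can be done from Python with the standard hashlib module. sha256_of is a hypothetical helper; it hashes the file chunk by chunk, so it also works on 1 GiB files without loading them into memory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage, mirroring the shasum commands above:
# assert sha256_of("original.1g.bin") == sha256_of("recovery.1g.bin")
```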

Example

split

Use split_text to split text into tokens. split_text returns a list of random bytes objects, each as long as the encoded text (two tokens by default, as in this example).

>>> from tally_token import split_text
>>> split_text("Hello, World!")
[b'qQ\xa5\x97\x84\x88\xd7U%\xfb(k\xa1', b'94\xc9\xfb\xeb\xa4\xf7\x02J\x89D\x0f\x80']
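This page doesn't describe how split_text builds its tokens, but the two tokens above XOR byte by byte to the bytes of "Hello, World!" (for example, 0x71 ^ 0x39 = 0x48, which is "H"). A minimal sketch of that scheme, assuming the library works this way: generate n - 1 random tokens and make the last one the data XORed with all of them. split_text_sketch and xor_bytes are hypothetical names, not tally-token's API, and the UTF-8 default encoding is an assumption.

```python
from __future__ import annotations

import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_text_sketch(text: str, n: int = 2, encoding: str = "utf-8") -> list[bytes]:
    """Hypothetical sketch: split text into n XOR tokens as long as its encoding."""
    if n < 2:
        raise ValueError("n must be at least 2")
    data = text.encode(encoding)
    # n - 1 tokens are independent random bytes.
    pads = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    # The last token is the data XORed with every random token.
    return pads + [reduce(xor_bytes, pads, data)]


tokens = split_text_sketch("Hello, World!")
print([len(t) for t in tokens])  # [13, 13]
print(reduce(xor_bytes, tokens))  # b'Hello, World!'
```

The sketch draws its random tokens from secrets.token_bytes because each one acts like a one-time pad, so it should come from a cryptographically secure source.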

merge

Use merge_text to merge tokens back into text. merge_text returns the original cleartext as a string.

>>> from tally_token import merge_text
>>> merge_text([b'qQ\xa5\x97\x84\x88\xd7U%\xfb(k\xa1', b'94\xc9\xfb\xeb\xa4\xf7\x02J\x89D\x0f\x80'])
'Hello, World!'
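The tokens in this example XOR byte by byte to b"Hello, World!", so merging can be sketched as XORing all tokens together and decoding the result. merge_text_sketch is a hypothetical name, not tally-token's implementation, and the UTF-8 default is an assumption.

```python
from __future__ import annotations

from functools import reduce


def merge_text_sketch(tokens: list[bytes], encoding: str = "utf-8") -> str:
    """Hypothetical sketch: XOR all tokens together, then decode the result."""
    if len({len(t) for t in tokens}) != 1:
        raise ValueError("need at least one token, all of the same length")
    merged = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), tokens)
    return merged.decode(encoding)


# The two tokens from the example above.
tokens = [b'qQ\xa5\x97\x84\x88\xd7U%\xfb(k\xa1', b'94\xc9\xfb\xeb\xa4\xf7\x02J\x89D\x0f\x80']
print(merge_text_sketch(tokens))  # Hello, World!
```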

split with custom length (number of tokens)

>>> from tally_token import split_text, merge_text
>>> split_text("Hello, World!", 5)
[b'N&\xce\\\xbc6dxp\x87\xa8#z', b'\xa3D\\A\xf8\xd1KDX\x1cKx\x87', b'\xffZ\x03\xf5\x92Q\xf52\xc4\x1e\xf2\xf8\x06', b'\xaa\xdd:\x85F\xa1\xcdbp\xf3\xe6P\xe5', b'\xf0\x80\xc7\x01\xff;7;\xf3\x04\x9b\x97?']
>>> merge_text([b'N&\xce\\\xbc6dxp\x87\xa8#z', b'\xa3D\\A\xf8\xd1KDX\x1cKx\x87', b'\xffZ\x03\xf5\x92Q\xf52\xc4\x1e\xf2\xf8\x06', b'\xaa\xdd:\x85F\xa1\xcdbp\xf3\xe6P\xe5', b'\xf0\x80\xc7\x01\xff;7;\xf3\x04\x9b\x97?'])
'Hello, World!'
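The second argument sets how many tokens are returned (five here), and each token is still 13 bytes long. If the tokens are combined with XOR, which the earlier two-token example is consistent with, then every token is needed: the XOR of any four tokens equals the data XORed with the missing token, so on its own it looks like random bytes. The sketch below checks that property with hypothetical helpers split_into and xor_all; it is not tally-token's implementation.

```python
from __future__ import annotations

import secrets
from functools import reduce


def xor_all(chunks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)


def split_into(data: bytes, n: int) -> list[bytes]:
    """Hypothetical sketch: n - 1 random tokens plus one that restores the data."""
    if n < 2:
        raise ValueError("n must be at least 2")
    pads = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    return pads + [xor_all([data, *pads])]


data = "Hello, World!".encode("utf-8")
tokens = split_into(data, 5)
assert xor_all(tokens) == data  # all five tokens recover the data
partial = xor_all(tokens[:4])  # one token missing
assert partial == xor_all([data, tokens[4]])  # equals the data XOR the missing token
```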

split with custom encoding

>>> from tally_token import split_text, merge_text
>>> split_text("こんにちは", encoding="CP932")
[b'g\xc3\x12\xeal?\xe5[\x03\xad', b'\xe5r\x90\x1b\xee\xf6g\xe4\x81`']
>>> merge_text([b'g\xc3\x12\xeal?\xe5[\x03\xad', b'\xe5r\x90\x1b\xee\xf6g\xe4\x81`'], encoding="CP932")
'こんにちは'
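The encoding determines how many bytes each token has: "こんにちは" is 10 bytes in CP932 (two per character), which matches the 10-byte tokens above, but 15 bytes in UTF-8. XORing the two example tokens yields the CP932 bytes, which must be decoded with the same encoding; decoding them as UTF-8 fails. The check below uses only the standard library.

```python
text = "こんにちは"
print(len(text.encode("cp932")), len(text.encode("utf-8")))  # 10 15

# The two tokens from the example above, XORed byte by byte.
tokens = [b'g\xc3\x12\xeal?\xe5[\x03\xad', b'\xe5r\x90\x1b\xee\xf6g\xe4\x81`']
merged = bytes(a ^ b for a, b in zip(*tokens))
print(merged.decode("cp932"))  # こんにちは
```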

bytes interface

Use split_bytes_into and merge_bytes_into to split and merge bytes directly. This is useful for splitting binary data.

>>> from tally_token import split_bytes_into, merge_bytes_into
>>> split_bytes_into(b"Hello, World!", 5)
[b'\xc5b\xf4E)\xe1vO8\xff@\xf9\xdd', b'\x84\xb9X#\x85\xf5\xed\xbcM\xc4\xef\xf4\xd3', b'\xb47\xf6\xfa?\x14\xa8`\xc9\xe0\xe5\x87\x14', b'\x1cd\xb4o\xe8I:\xe5\xf6\x13\xe5\x93G', b'\xa1\xed\x82\x9f\x14e)!%\xba\xc3}|']
>>> merge_bytes_into([b'\xc5b\xf4E)\xe1vO8\xff@\xf9\xdd', b'\x84\xb9X#\x85\xf5\xed\xbcM\xc4\xef\xf4\xd3', b'\xb47\xf6\xfa?\x14\xa8`\xc9\xe0\xe5\x87\x14', b'\x1cd\xb4o\xe8I:\xe5\xf6\x13\xe5\x93G', b'\xa1\xed\x82\x9f\x14e)!%\xba\xc3}|'])
b'Hello, World!'
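Because split_bytes_into takes bytes rather than text, it also handles data that is not valid in any text encoding, such as the byte 0xFF, which never appears in valid UTF-8. A minimal sketch of the same operation on raw bytes, assuming XOR splitting (the example tokens above are consistent with it): split_bytes_sketch and merge_bytes_sketch are hypothetical names, and os.urandom stands in for whatever randomness source the library actually uses.

```python
from __future__ import annotations

import os
from functools import reduce


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_bytes_sketch(data: bytes, n: int = 2) -> list[bytes]:
    """Hypothetical: n - 1 random tokens plus one that restores the data."""
    if n < 2:
        raise ValueError("n must be at least 2")
    pads = [os.urandom(len(data)) for _ in range(n - 1)]
    return pads + [reduce(_xor, pads, data)]


def merge_bytes_sketch(tokens: list[bytes]) -> bytes:
    """Hypothetical: XOR every token together to recover the data."""
    return reduce(_xor, tokens)


blob = b"\x00\xff\xfe\x80"  # binary data that is not valid UTF-8
tokens = split_bytes_sketch(blob, 5)
print(merge_bytes_sketch(tokens) == blob)  # True
```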

Reference

LICENSE

BSD 3-Clause License



Download files


Source Distribution

tally_token-0.6.1.tar.gz (33.8 kB)

Uploaded Source

Built Distribution


tally_token-0.6.1-py3-none-any.whl (7.2 kB)

Uploaded Python 3

File details

Details for the file tally_token-0.6.1.tar.gz.

File metadata

  • Download URL: tally_token-0.6.1.tar.gz
  • Size: 33.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for tally_token-0.6.1.tar.gz
Algorithm Hash digest
SHA256 cd21cc21cfd39bc5603588097d33a973d9f6aab78fba8270ede703ddac895128
MD5 1c0fc8f0a4095a61481fdb1ebf210f3d
BLAKE2b-256 e0308dcca0f188b37b83bc1de12eaae731105fa099a29a1fefad22aadb8b8331


File details

Details for the file tally_token-0.6.1-py3-none-any.whl.

File metadata

  • Download URL: tally_token-0.6.1-py3-none-any.whl
  • Size: 7.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for tally_token-0.6.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6d0bbff435b14d4578442064336fcbd9ae78263752727450340a7223e14a4cb5
MD5 d8c0810f607771a84193b9861fb87568
BLAKE2b-256 100db64b93c30d2765deaa3be30343dbf05b19842b60f09ce110244c38038f59
