A library to split data into tokens

python-tally-token

What is this?

tally-token is a Python library for splitting data into tokens of the same length.

A tally stick is a historical device for proving a claim: a piece of wood is split into tokens, and the claim is verified later by matching the tokens against each other.

Medieval English split tally stick (front and reverse view). The stick is notched and inscribed to record a debt owed to the rural dean of Preston Candover, Hampshire, of a tithe of 20d each on 32 sheep, amounting to a total sum of £2 13s. 4d.

Usage

Install

$ pip install tally-token

CLI Usage

$ tally-token --help
usage: tally-token [-h] {split,merge} ...

positional arguments:
  {split,merge}  Commands: split: split a file into multiple files merge: merge multiple files into a fileExample: tally-token split example.bin
                 example.bin.1 example.bin.2 example.bin.3 tally-token merge example-merged.bin example.bin.1 example.bin.2 example.bin.3

options:
  -h, --help     show this help message and exit

split

You can use split to split a file into multiple files.

$ tally-token split something.bin split-1.bin split-2.bin split-3.bin

merge

You can use merge to merge multiple files back into a single file.

$ tally-token merge merged.bin split-1.bin split-2.bin split-3.bin

Large files

Large files need no special handling. You can split and merge them the same way.

$ dd if=/dev/urandom of=original.1g.bin bs=1G count=1
$ tally-token split original.1g.bin split-1.bin split-2.bin split-3.bin
$ shasum -a 256 original.1g.bin
> 736a344d99d27e2dcdab8bc37ca94c83eda26f812a3dee87ac98989f89b3f965 original.1g.bin
$ tally-token merge recovery.1g.bin split-1.bin split-2.bin split-3.bin
$ shasum -a 256 recovery.1g.bin
> 736a344d99d27e2dcdab8bc37ca94c83eda26f812a3dee87ac98989f89b3f965 recovery.1g.bin
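
The same integrity check can be done from Python instead of shasum. The following is a minimal sketch using only the standard library; sha256_of and same_content are hypothetical helper names and are not part of tally-token.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks.

    Reading in chunks keeps memory use flat even for multi-gigabyte files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """Return True when both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Calling same_content("original.1g.bin", "recovery.1g.bin") after the merge step above should return True if the round trip preserved the file.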

Example

split

You can use split_text to split text into tokens. split_text returns a list of random-looking bytes objects, one per token.

>>> from tally_token import split_text
>>> split_text("Hello, World!")
[b'qQ\xa5\x97\x84\x88\xd7U%\xfb(k\xa1', b'94\xc9\xfb\xeb\xa4\xf7\x02J\x89D\x0f\x80']

merge

You can use merge_text to merge tokens back into text. merge_text returns the original cleartext.

>>> from tally_token import merge_text
>>> merge_text([b'qQ\xa5\x97\x84\x88\xd7U%\xfb(k\xa1', b'94\xc9\xfb\xeb\xa4\xf7\x02J\x89D\x0f\x80'])
'Hello, World!'
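
The two example tokens above combine into the original text under a bytewise XOR. The sketch below checks this with the standard library only. xor_bytes is a hypothetical helper, not part of tally-token, and the library's actual internals may differ from this reading of the example output.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length, byte by byte."""
    if len(a) != len(b):
        raise ValueError("tokens must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# The two tokens shown in the split_text example above.
tokens = [
    b'qQ\xa5\x97\x84\x88\xd7U%\xfb(k\xa1',
    b'94\xc9\xfb\xeb\xa4\xf7\x02J\x89D\x0f\x80',
]

# Fold XOR over all tokens to recover the encoded text.
recovered = reduce(xor_bytes, tokens)
print(recovered.decode("utf-8"))  # Hello, World!
```

Each token alone looks like noise. The text only reappears once every token is combined.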

split with custom length

>>> from tally_token import split_text, merge_text
>>> split_text("Hello, World!", 5)
[b'N&\xce\\\xbc6dxp\x87\xa8#z', b'\xa3D\\A\xf8\xd1KDX\x1cKx\x87', b'\xffZ\x03\xf5\x92Q\xf52\xc4\x1e\xf2\xf8\x06', b'\xaa\xdd:\x85F\xa1\xcdbp\xf3\xe6P\xe5', b'\xf0\x80\xc7\x01\xff;7;\xf3\x04\x9b\x97?']
>>> merge_text([b'N&\xce\\\xbc6dxp\x87\xa8#z', b'\xa3D\\A\xf8\xd1KDX\x1cKx\x87', b'\xffZ\x03\xf5\x92Q\xf52\xc4\x1e\xf2\xf8\x06', b'\xaa\xdd:\x85F\xa1\xcdbp\xf3\xe6P\xe5', b'\xf0\x80\xc7\x01\xff;7;\xf3\x04\x9b\x97?'])
'Hello, World!'
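
In the example above, the second argument produces five tokens, each as long as the encoded text. One common way to build such a split is to draw all but one token at random and derive the last one by XOR. The sketch below illustrates that idea under the assumption that tally-token works along these lines; xor_split and xor_merge are hypothetical names, not the library's API.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length, byte by byte."""
    if len(a) != len(b):
        raise ValueError("tokens must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def xor_split(data: bytes, count: int = 2) -> list[bytes]:
    """Split data into `count` tokens that XOR back to `data`."""
    if count < 2:
        raise ValueError("count must be at least 2")
    # count - 1 tokens are pure randomness of the same length as data.
    random_tokens = [secrets.token_bytes(len(data)) for _ in range(count - 1)]
    # The last token is data XORed with every random token.
    last = reduce(xor_bytes, random_tokens, data)
    return random_tokens + [last]


def xor_merge(tokens: list[bytes]) -> bytes:
    """Recombine all tokens; every token is required."""
    return reduce(xor_bytes, tokens)


tokens = xor_split(b"Hello, World!", 5)
print(len(tokens))        # 5
print(xor_merge(tokens))  # b'Hello, World!'
```

With this construction the order of the tokens does not matter when merging, because XOR is commutative and associative.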

split with custom encoding

>>> from tally_token import split_text, merge_text
>>> split_text("こんにちは", encoding="CP932")
[b'g\xc3\x12\xeal?\xe5[\x03\xad', b'\xe5r\x90\x1b\xee\xf6g\xe4\x81`']
>>> merge_text([b'g\xc3\x12\xeal?\xe5[\x03\xad', b'\xe5r\x90\x1b\xee\xf6g\xe4\x81`'], encoding="CP932")
'こんにちは'
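
The tokens in the example above are 10 bytes each, which matches the length of the text once it is encoded as CP932. The short sketch below uses only Python's built-in codecs to show how the encoding changes the byte length. Passing the same encoding to both split_text and merge_text, as the example does, is presumably required for the text to decode correctly.

```python
text = "こんにちは"

# CP932 uses 2 bytes per hiragana character, UTF-8 uses 3.
cp932_bytes = text.encode("cp932")
utf8_bytes = text.encode("utf-8")
print(len(cp932_bytes))  # 10
print(len(utf8_bytes))   # 15

# Decoding with the wrong codec fails instead of returning the text.
try:
    cp932_bytes.decode("utf-8")
    decoded_with_wrong_codec = True
except UnicodeDecodeError:
    decoded_with_wrong_codec = False
print(decoded_with_wrong_codec)  # False
```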

bytes interface

You can use split_bytes_into and merge_bytes_into to split and merge bytes. This is useful for splitting binary data.

>>> from tally_token import split_bytes_into, merge_bytes_into
>>> split_bytes_into(b"Hello, World!", 5)
[b'\xc5b\xf4E)\xe1vO8\xff@\xf9\xdd', b'\x84\xb9X#\x85\xf5\xed\xbcM\xc4\xef\xf4\xd3', b'\xb47\xf6\xfa?\x14\xa8`\xc9\xe0\xe5\x87\x14', b'\x1cd\xb4o\xe8I:\xe5\xf6\x13\xe5\x93G', b'\xa1\xed\x82\x9f\x14e)!%\xba\xc3}|']
>>> merge_bytes_into([b'\xc5b\xf4E)\xe1vO8\xff@\xf9\xdd', b'\x84\xb9X#\x85\xf5\xed\xbcM\xc4\xef\xf4\xd3', b'\xb47\xf6\xfa?\x14\xa8`\xc9\xe0\xe5\x87\x14', b'\x1cd\xb4o\xe8I:\xe5\xf6\x13\xe5\x93G', b'\xa1\xed\x82\x9f\x14e)!%\xba\xc3}|'])
b'Hello, World!'

Reference

LICENSE

BSD 3-Clause License
