String-separated values with user-defined multi-character delimiters

TokenSplit — Token-Separated Values

A lightweight Python package for reading and writing .toks files: a plain-text tabular format where you choose your own multi-character delimiter string.


Why?

CSV uses a single character (,) as a separator, which means commas in your data need escaping or quoting.
Toks lets you pick any string — /---/, :::, <<SEP>> — that you know won't appear in your data, keeping files simple and unambiguous without any escape sequences.


File format

/---/
Alice/---/30/---/Engineer/---/
Bob/---/25/---/Designer/---/
  • Line 1 — the delimiter string (written automatically by the writer)
  • Every other line — values separated by the delimiter, with the line ending on <delimiter><newline>

Newlines inside a value are preserved because rows end only on the <delimiter><newline> sequence, not on bare newlines.
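Serializing a row under these rules is just appending the delimiter after every value and terminating with a newline. A minimal sketch (format_row is a hypothetical helper for illustration, not part of the tokensplit API):

```python
def format_row(values, delimiter):
    """Serialize one row: every value is followed by the delimiter,
    and the row ends with <delimiter><newline>."""
    return "".join(v + delimiter for v in values) + "\n"

# Matches the example file above:
format_row(["Alice", "30", "Engineer"], "/---/")
# → "Alice/---/30/---/Engineer/---/\n"
```

Because the row terminator is the full `<delimiter><newline>` sequence, a value containing a bare newline round-trips unchanged.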


Installation

pip install tokensplit           # once published to PyPI
# or, from source:
pip install .

Quick start

Writing

import tokensplit

# Convenience function
tokensplit.write("people.toks", [
    ["name", "age", "role"],
    ["Alice", "30", "Engineer"],
    ["Bob",   "25", "Designer"],
], delimiter="/---/")
# Streaming writer — useful for large files
with open("people.toks", "w") as f:
    writer = tokensplit.ToksWriter(f, delimiter="/---/")
    writer.writerow(["name", "age", "role"])   # header
    writer.writerow(["Alice", "30", "Engineer"])
    writer.writerow(["Bob",   "25", "Designer"])

Reading

import tokensplit

# Convenience function — returns list of rows
rows = tokensplit.read("people.toks")
# [["name", "age", "role"], ["Alice", "30", "Engineer"], ...]

# Streaming reader — one row at a time (memory-efficient)
with open("people.toks") as f:
    reader = tokensplit.ToksReader(f)
    print("delimiter:", reader.delimiter)   # "/---/"
    for row in reader:
        print(row)

Choosing a delimiter

Any non-empty string without a newline character works. Good choices:

Delimiter   Good when data contains…
/---/       General text
|||         Paths, URLs
<<<>>>      Code snippets
,,,,        Numeric CSVs being converted
:::         Short labels / IDs

Two rules enforced by the writer (a ValueError is raised if either is violated):

  1. A value must not contain the delimiter string.
  2. A value must not end with a prefix of the delimiter such that appending the delimiter creates an extra, unintended delimiter occurrence. For example, the value "aa" with delimiter "aaa" would be written as "aaaaa", which contains the delimiter starting inside the value.
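One way to cover both rules with a single check — a sketch of the idea, not necessarily how tokensplit's writer implements it — is to require that the first occurrence of the delimiter in value + delimiter sits exactly at the end of the value:

```python
def check_value(value, delimiter):
    """Raise ValueError if writing `value` followed by `delimiter`
    would put a delimiter occurrence anywhere but at the very end."""
    written = value + delimiter
    if written.find(delimiter) != len(value):
        raise ValueError(
            f"value {value!r} is ambiguous with delimiter {delimiter!r}"
        )
    return value

check_value("Alice", "/---/")       # fine
# check_value("a/---/b", "/---/")  → ValueError (rule 1)
# check_value("aa", "aaa")         → ValueError (rule 2: writes "aaaaa")
```

If the delimiter appears inside the value, `find` returns an index before `len(value)` (rule 1); if the value's tail plus the appended delimiter forms an earlier overlapping match, `find` also returns an earlier index (rule 2).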

API reference

tokensplit.write(filepath, rows, delimiter)

Write rows (list of lists of strings) to filepath.

tokensplit.read(filepath) → List[List[str]]

Read all rows from filepath. Returns a list of lists of strings.

tokensplit.ToksWriter(file_obj, delimiter)

Streaming writer. Call .writerow(row) or .writerows(rows).
The delimiter is written to line 1 of the file on construction.
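Based on that description, the writer's behavior can be sketched in a few lines (MiniToksWriter is an illustrative stand-in, not the real ToksWriter, and omits the validation rules above):

```python
import io

class MiniToksWriter:
    """Sketch of a streaming .toks writer: the delimiter goes on line 1
    at construction; each row is values joined by the delimiter and
    terminated with <delimiter><newline>."""
    def __init__(self, f, delimiter):
        self.f = f
        self.delimiter = delimiter
        f.write(delimiter + "\n")  # line 1: the delimiter itself

    def writerow(self, row):
        self.f.write("".join(v + self.delimiter for v in row) + "\n")

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

buf = io.StringIO()
w = MiniToksWriter(buf, "/---/")
w.writerow(["Alice", "30"])
# buf.getvalue() == "/---/\nAlice/---/30/---/\n"
```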

tokensplit.ToksReader(file_obj)

Streaming reader. Iterate with for row in reader.
.delimiter attribute exposes the delimiter read from line 1.


Reading algorithm

The reader uses a forward-only sliding window of exactly len(delimiter) characters:

content:   h e l l o / - - - / w o r l d / - - - / \n
window:    [     5     ]
                  → slides one character at a time
                        match! → emit token, jump window past delimiter
  • Time: O(n) — every character is visited once; one slice emitted per match
  • Extra space: O(d) — only the current window lives in memory beyond the content string
  • No regex, no str.split, no backtracking
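The scan above can be written directly in Python (split_tokens is a hypothetical illustration of the algorithm, not tokensplit's actual reader):

```python
def split_tokens(content, delimiter):
    """Forward-only scan: compare a len(delimiter)-wide window at each
    position; on a match, emit the token and jump past the delimiter."""
    tokens = []
    start = i = 0
    d = len(delimiter)
    while i + d <= len(content):
        if content[i:i + d] == delimiter:
            tokens.append(content[start:i])  # emit one slice per match
            i += d                           # jump window past delimiter
            start = i
        else:
            i += 1                           # slide one character
    return tokens

split_tokens("hello/---/world/---/", "/---/")
# → ["hello", "world"]
```

Since only full delimiter matches end a token, newlines inside values pass through untouched, e.g. `split_tokens("a\nb/---/c/---/", "/---/")` yields `["a\nb", "c"]`.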

Running tests

python -m pytest tests/
