
String-separated values with user-defined multi-character delimiters

Project description

TokenSplit — Token-Separated Values

A lightweight Python package for reading and writing .toks files: a plain-text tabular format where you choose your own multi-character delimiter string.


Why?

CSV uses a single character (,) as a separator, which means commas in your data need escaping or quoting.
Toks lets you pick any string — /---/, :::, <<SEP>> — that you know won't appear in your data, keeping files simple and unambiguous without any escape sequences.


File format

/---/
Alice/---/30/---/Engineer/---/
Bob/---/25/---/Designer/---/
  • Line 1 — the delimiter string (written automatically by the writer)
  • Every subsequent line — one row: each value (including the last) is followed by the delimiter, so the line ends on <delimiter><newline>

Newlines inside a value are preserved because rows end only on the <delimiter><newline> sequence, not on bare newlines.
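To make the layout concrete, here is a toy parser for this format (an illustrative sketch, not the package's actual reader) showing that a bare newline inside a value survives:

```python
def parse_toks(text: str):
    """Toy parser for the .toks layout: line 1 is the delimiter,
    and rows end only on <delimiter><newline>."""
    delimiter, _, body = text.partition("\n")    # line 1: the delimiter
    rows = []
    for chunk in body.split(delimiter + "\n"):   # split on the row terminator
        if chunk:
            rows.append(chunk.split(delimiter))  # remaining fields
    return rows

parse_toks("/---/\nnote/---/line one\nline two/---/\n")
# [['note', 'line one\nline two']]
```

The second value keeps its embedded newline because only the full `<delimiter><newline>` sequence terminates a row.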


Installation

pip install tokensplit           # once published to PyPI
# or, from source:
pip install .

Quick start

Writing

import tokensplit

# Convenience function
tokensplit.write("people.toks", [
    ["name", "age", "role"],
    ["Alice", "30", "Engineer"],
    ["Bob",   "25", "Designer"],
], delimiter="/---/")
# Streaming writer — useful for large files
with open("people.toks", "w") as f:
    writer = tokensplit.ToksWriter(f, delimiter="/---/")
    writer.writerow(["name", "age", "role"])   # header
    writer.writerow(["Alice", "30", "Engineer"])
    writer.writerow(["Bob",   "25", "Designer"])

Reading

import tokensplit

# Convenience function — returns list of rows
rows = tokensplit.read("people.toks")
# [["name", "age", "role"], ["Alice", "30", "Engineer"], ...]

# Streaming reader — one row at a time (memory-efficient)
with open("people.toks") as f:
    reader = tokensplit.ToksReader(f)
    print("delimiter:", reader.delimiter)   # "/---/"
    for row in reader:
        print(row)

Choosing a delimiter

Any non-empty string without a newline character works. Good choices:

Delimiter   Good when data contains…
/---/       General text
|||         Paths, URLs
<<<>>>      Code snippets
,,,,        Numeric CSVs being converted
:::         Short labels / IDs

Two rules enforced by the writer:

  1. A value must not contain the delimiter string.
  2. A value must not end with a prefix of the delimiter that creates an ambiguous sequence once the delimiter is appended. For example, the value "aa" with the delimiter "aaa" would be written as "aaaaa", which embeds an extra delimiter.

The writer raises ValueError in both cases.
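Both rules reduce to one check: appending the delimiter to a valid value must produce its first delimiter match exactly at the end. A hypothetical validator (not necessarily the package's internals) could look like:

```python
def validate_value(value: str, delimiter: str) -> None:
    """Raise ValueError if writing value + delimiter would be ambiguous.
    Hypothetical helper mirroring the two writer rules above."""
    if delimiter in value:                                   # rule 1
        raise ValueError(f"value {value!r} contains delimiter {delimiter!r}")
    if (value + delimiter).index(delimiter) != len(value):   # rule 2
        raise ValueError(f"value {value!r} ends in an ambiguous prefix of {delimiter!r}")

validate_value("Alice", "/---/")   # passes silently
```

Calling `validate_value("aa", "aaa")` raises, because `"aa" + "aaa"` is `"aaaaa"` and the delimiter first matches at index 0, not at index 2.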

API reference

tokensplit.write(filepath, rows, delimiter)

Write rows (list of lists of strings) to filepath.

tokensplit.read(filepath) → List[List[str]]

Read all rows from filepath. Returns a list of lists of strings.

tokensplit.ToksWriter(file_obj, delimiter)

Streaming writer. Call .writerow(row) or .writerows(rows).
The delimiter is written to line 1 of the file on construction.

tokensplit.ToksReader(file_obj)

Streaming reader. Iterate with for row in reader.
.delimiter attribute exposes the delimiter read from line 1.
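Putting the documented behavior together, the bytes on disk can be sketched like this (a toy serializer with a hypothetical name, not ToksWriter itself):

```python
import io

def dump_toks(rows, delimiter):
    """Toy serializer mirroring the documented writer behavior."""
    out = io.StringIO()
    out.write(delimiter + "\n")            # line 1: written on construction
    for row in rows:
        for value in row:
            out.write(value + delimiter)   # every value ends with the delimiter
        out.write("\n")                    # row terminator: <delimiter><newline>
    return out.getvalue()

dump_toks([["Alice", "30"]], "/---/")
# '/---/\nAlice/---/30/---/\n'
```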


Reading algorithm

The reader uses a forward-only sliding window of exactly len(delimiter) characters:

content:   h e l l o / - - - / w o r l d / - - - / \n
window:    [h e l l o]          width = len(delimiter) = 5
                → slides one character at a time
                      match on "/---/" → emit token, jump window past the delimiter
  • Time: O(n) — every character is visited once; one slice emitted per match
  • Extra space: O(d) — only the current window lives in memory beyond the content string
  • No regex, no str.split, no backtracking
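The scan above could look like this in Python (an illustrative sketch of the described window, not the package's source):

```python
def scan_tokens(content: str, delimiter: str):
    """Forward-only sliding-window scan: O(n) time, O(d) extra space."""
    d = len(delimiter)
    start = 0                               # beginning of the current token
    i = 0                                   # left edge of the window
    tokens = []
    while i + d <= len(content):
        if content[i:i + d] == delimiter:   # window matches the delimiter
            tokens.append(content[start:i]) # emit the token before it
            i += d                          # jump window past the delimiter
            start = i
        else:
            i += 1                          # slide one character
    return tokens

scan_tokens("hello/---/world/---/", "/---/")
# ['hello', 'world']
```

Because `i` only moves forward and each slice is emitted once, the whole input is visited a single time with no backtracking.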

Running tests

python -m pytest tests/

Project details


Download files

Download the file for your platform.

Source Distribution

tokensplit-0.1.1.tar.gz (8.9 kB)

Uploaded Source

Built Distribution

tokensplit-0.1.1-py3-none-any.whl (7.4 kB)

Uploaded Python 3

File details

Details for the file tokensplit-0.1.1.tar.gz.

File metadata

  • Download URL: tokensplit-0.1.1.tar.gz
  • Upload date:
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tokensplit-0.1.1.tar.gz
Algorithm Hash digest
SHA256 d3487b9fcc193070098a9dbca6da6e2c420b7cb774136b89906b4817ceaa3b9a
MD5 f58124138fa65d89d37a2188d297c6f3
BLAKE2b-256 6facb9c5c14bd715fb772d2c929871dd186f199c3f3c819e81616e3d3fd7a998


Provenance

The following attestation bundles were made for tokensplit-0.1.1.tar.gz:

Publisher: publish.yml on nGubbins/tokensplit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tokensplit-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: tokensplit-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tokensplit-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0387ddea97b158b059af01acf222e1eace95d169f714cec57d554457fe57c75f
MD5 0d612d63c365cdde007a747028e7efcb
BLAKE2b-256 ecc7fe1d2f2ac8565b19ca2c37c94412de803a2298ad1de259330fdb86ec6389


Provenance

The following attestation bundles were made for tokensplit-0.1.1-py3-none-any.whl:

Publisher: publish.yml on nGubbins/tokensplit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
