String-separated values with user-defined multi-character delimiters

Project description

TokenSplit — Token-Separated Values

A lightweight Python package for reading and writing .toks files: a plain-text tabular format where you choose your own multi-character delimiter string.


Why?

CSV uses a single character (,) as a separator, which means commas in your data need escaping or quoting.
Toks lets you pick any string — /---/, :::, <<SEP>> — that you know won't appear in your data, keeping files simple and unambiguous without any escape sequences.


File format

/---/
Alice/---/30/---/Engineer/---/
Bob/---/25/---/Designer/---/
  • Line 1 — the delimiter string (written automatically by the writer)
  • Each subsequent line — values separated by the delimiter, with the row terminated by <delimiter><newline>

Newlines inside a value are preserved because rows end only on the <delimiter><newline> sequence, not on bare newlines.
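To make the format concrete, here is a minimal reference parser. It is only an illustration of the format, not the package's implementation (the `parse_toks` name is hypothetical, and it uses str.split, which the actual reader avoids):

```python
def parse_toks(text: str) -> list[list[str]]:
    """Parse a .toks document: line 1 is the delimiter, rows end on <delimiter><newline>."""
    first_nl = text.index("\n")
    delim = text[:first_nl]              # line 1: the delimiter string
    body = text[first_nl + 1:]
    rows = []
    # Rows end only on delimiter + "\n", so bare newlines inside values survive.
    for chunk in body.split(delim + "\n"):
        if chunk:                        # skip the empty tail after the final row
            rows.append(chunk.split(delim))
    return rows

doc = "/---/\nAlice/---/30/---/Engineer/---/\nBob/---/25/---/Designer/---/\n"
print(parse_toks(doc))
# [['Alice', '30', 'Engineer'], ['Bob', '25', 'Designer']]
```

Note how a value such as "a\nb" passes through intact: a bare newline never matches the <delimiter><newline> row terminator.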


Installation

pip install tokensplit           # once published to PyPI
# or, from source:
pip install .

Quick start

Writing

import tokensplit

# Convenience function
tokensplit.write("people.toks", [
    ["name", "age", "role"],
    ["Alice", "30", "Engineer"],
    ["Bob",   "25", "Designer"],
], delimiter="/---/")

# Streaming writer — useful for large files
with open("people.toks", "w") as f:
    writer = tokensplit.ToksWriter(f, delimiter="/---/")
    writer.writerow(["name", "age", "role"])   # header
    writer.writerow(["Alice", "30", "Engineer"])
    writer.writerow(["Bob",   "25", "Designer"])

Reading

import tokensplit

# Convenience function — returns list of rows
rows = tokensplit.read("people.toks")
# [["name", "age", "role"], ["Alice", "30", "Engineer"], ...]

# Streaming reader — one row at a time (memory-efficient)
with open("people.toks") as f:
    reader = tokensplit.ToksReader(f)
    print("delimiter:", reader.delimiter)   # "/---/"
    for row in reader:
        print(row)

Choosing a delimiter

Any non-empty string without a newline character works. Good choices:

Delimiter   Good when data contains…
/---/       General text
|||         Paths, URLs
<<<>>>      Code snippets
,,,,        Numeric CSVs being converted
:::         Short labels / IDs

Two rules enforced by the writer:

  1. A value must not contain the delimiter string.
  2. A value must not end in a way that, once the delimiter is appended, embeds an extra overlapping occurrence of the delimiter. For example, value "aa" with delimiter "aaa" would be written as "aaaaa", which contains the delimiter starting two characters earlier than intended.

In either case the writer raises ValueError.
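The two checks can be sketched as follows. This is a standalone illustration under the rules stated above; `validate_value` is a hypothetical helper, not the package's real code:

```python
def validate_value(value: str, delim: str) -> None:
    # Rule 1: the delimiter must not occur inside the value.
    if delim in value:
        raise ValueError(f"value contains delimiter: {value!r}")
    # Rule 2: appending the delimiter must not create an earlier,
    # overlapping occurrence (e.g. "aa" + "aaa" -> "aaaaa").
    # The delimiter should first appear exactly at index len(value).
    if (value + delim).find(delim) != len(value):
        raise ValueError(f"value {value!r} overlaps delimiter {delim!r}")

validate_value("Alice", "/---/")        # fine: no occurrence, no overlap
try:
    validate_value("aa", "aaa")         # overlapping-prefix case
except ValueError as e:
    print(e)
```

A single find-based check at the boundary catches both rules, but they are kept separate here so each error message names the rule that was violated.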

API reference

tokensplit.write(filepath, rows, delimiter)

Write rows (list of lists of strings) to filepath.

tokensplit.read(filepath) → List[List[str]]

Read all rows from filepath. Returns a list of lists of strings.

tokensplit.ToksWriter(file_obj, delimiter)

Streaming writer. Call .writerow(row) or .writerows(rows).
The delimiter is written to line 1 of the file on construction.

tokensplit.ToksReader(file_obj)

Streaming reader. Iterate with for row in reader.
.delimiter attribute exposes the delimiter read from line 1.


Reading algorithm

The reader uses a forward-only sliding window of exactly len(delimiter) characters:

content:   h e l l o / - - - / w o r l d / - - - / \n
window:    [    5    ]
           → slides one character at a time
           → on a match: emit the token, jump the window past the delimiter
  • Time: O(n) window positions — the window only moves forward, and each position compares at most len(delimiter) characters
  • Extra space: O(d) — only the current window lives in memory beyond the content string
  • No regex, no str.split, no backtracking
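The scan above can be sketched in a few lines. This is illustrative only: `scan_row` is a hypothetical name, it works on an in-memory string for one row's worth of content, and the real reader additionally handles row boundaries and buffering:

```python
def scan_row(content: str, delim: str) -> list[str]:
    """Emit tokens by sliding a len(delim)-character window forward over content."""
    d = len(delim)
    tokens, start, i = [], 0, 0
    while i + d <= len(content):
        if content[i:i + d] == delim:        # window matches the delimiter
            tokens.append(content[start:i])  # emit the token before it
            i += d                           # jump the window past the delimiter
            start = i
        else:
            i += 1                           # slide one character forward
    return tokens

print(scan_row("hello/---/world/---/\n", "/---/"))
# ['hello', 'world']
```

The trailing newline after the final delimiter never fits a full window, so nothing after the last delimiter is emitted.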

Running tests

python -m pytest tests/
