String-separated values with user-defined multi-character delimiters
TokenSplit — Token-Separated Values
A lightweight Python package for reading and writing .toks files: a plain-text tabular format where you choose your own multi-character delimiter string.
Why?
CSV uses a single character (,) as a separator, which means commas in your data need escaping or quoting.
Toks lets you pick any string — /---/, :::, <<SEP>> — that you know won't appear in your data, keeping files simple and unambiguous without any escape sequences.
File format
/---/
Alice/---/30/---/Engineer/---/
Bob/---/25/---/Designer/---/
- Line 1: the delimiter string (written automatically by the writer)
- Everything after line 1: rows of values, with every value followed by the delimiter and every row terminated by <delimiter><newline>
Newlines inside a value are preserved because rows end only on the <delimiter><newline> sequence, not on bare newlines.
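To make the layout concrete, here is a minimal sketch of how rows could be serialized into this format by hand. The helper name `encode_toks` is hypothetical and not part of tokensplit, and the sketch skips the validation that the real writer performs.

```python
def encode_toks(rows, delimiter):
    """Hypothetical helper: serialize rows into the .toks layout described above."""
    # Line 1 carries the delimiter string itself.
    parts = [delimiter + "\n"]
    for row in rows:
        # Every value is followed by the delimiter; the row then ends with a newline,
        # so each row terminates in <delimiter><newline>.
        parts.append("".join(value + delimiter for value in row) + "\n")
    return "".join(parts)


text = encode_toks(
    [["Alice", "30", "Engineer"], ["Bob", "25", "Designer"]],
    "/---/",
)
print(text, end="")
```

A value containing a bare newline, such as `"line one\nline two"`, simply spans two physical lines in the output, because the row only ends where a delimiter is immediately followed by a newline.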
Installation
pip install tokensplit
# or, from source:
pip install .
Quick start
Writing
import tokensplit
# Convenience function
tokensplit.write("people.toks", [
    ["name", "age", "role"],
    ["Alice", "30", "Engineer"],
    ["Bob", "25", "Designer"],
], delimiter="/---/")

# Streaming writer: useful for large files
with open("people.toks", "w") as f:
    writer = tokensplit.ToksWriter(f, delimiter="/---/")
    writer.writerow(["name", "age", "role"])  # header
    writer.writerow(["Alice", "30", "Engineer"])
    writer.writerow(["Bob", "25", "Designer"])
Reading
import tokensplit
# Convenience function: returns a list of rows
rows = tokensplit.read("people.toks")
# [["name", "age", "role"], ["Alice", "30", "Engineer"], ...]

# Streaming reader: one row at a time (memory-efficient)
with open("people.toks") as f:
    reader = tokensplit.ToksReader(f)
    print("delimiter:", reader.delimiter)  # "/---/"
    for row in reader:
        print(row)
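The examples above treat the first row as a header by convention only; the format itself does not mark it. If you want records keyed by column name, a small sketch like the following works on the list-of-lists shape that the reading example shows. The function name `rows_to_dicts` is hypothetical and not part of tokensplit.

```python
def rows_to_dicts(rows):
    """Hypothetical helper: treat the first row as a header and key the rest by it."""
    if not rows:
        return []
    header, *records = rows
    # zip() pairs each column name with the value in the same position.
    return [dict(zip(header, record)) for record in records]


rows = [
    ["name", "age", "role"],
    ["Alice", "30", "Engineer"],
    ["Bob", "25", "Designer"],
]
people = rows_to_dicts(rows)
print(people[0]["name"], people[1]["role"])
```

All values stay strings; converting `"30"` to an integer is left to the caller.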
Choosing a delimiter
Any non-empty string without a newline character works. Good choices:
| Delimiter | Good when data contains… |
|---|---|
| /---/ | General text |
| \|\|\| | Paths, URLs |
| <<<>>> | Code snippets |
| ,,,, | Numeric CSVs being converted |
| ::: | Short labels / IDs |
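If you would rather check candidates against your data than guess, a sketch along these lines can pick the first one that is safe. The names `is_safe` and `pick_delimiter` are hypothetical and not part of tokensplit; the safety test is one reading of the requirement that the delimiter must be unambiguous once written after each value.

```python
CANDIDATES = ["/---/", "|||", "<<<>>>", ",,,,", ":::"]


def is_safe(delimiter, values):
    """Hypothetical check: True if the delimiter is unambiguous after every value."""
    if not delimiter or "\n" in delimiter:
        return False
    # Written as value + delimiter, the first match of the delimiter must start
    # exactly where the value ends; an earlier match means the value clashes.
    return all((v + delimiter).find(delimiter) == len(v) for v in values)


def pick_delimiter(rows, candidates=CANDIDATES):
    """Return the first candidate that is safe for every value in rows."""
    values = [value for row in rows for value in row]
    for candidate in candidates:
        if is_safe(candidate, values):
            return candidate
    raise ValueError("no candidate delimiter is safe for this data")


rows = [["a/---/b", "x|||y"], ["plain", "1,2"]]
print(pick_delimiter(rows))
```

This scans every value once per candidate, which is fine for a one-off check before writing but may be worth sampling for very large inputs.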
Two rules enforced by the writer:
- A value must not contain the delimiter string.
- A value must not end with a prefix of the delimiter in a way that makes the written sequence ambiguous. For example, the value "aa" with the delimiter "aaa" would be written as "aaaaa", in which the delimiter already matches before the real one begins.

A ValueError is raised in both cases.
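The two rules can be sketched as a standalone check. This is an illustration of the rules as stated here, not the package's actual implementation, and `check_value` is a hypothetical name.

```python
def check_value(value, delimiter):
    """Hypothetical validator mirroring the two writer rules described above."""
    # Rule 1: the delimiter must not occur anywhere inside the value.
    if delimiter in value:
        raise ValueError(f"value {value!r} contains the delimiter {delimiter!r}")
    # Rule 2: once the delimiter is appended, its first match must begin exactly
    # at the end of the value. An earlier match means the tail of the value
    # combined with the start of the delimiter to form a premature delimiter.
    if (value + delimiter).find(delimiter) != len(value):
        raise ValueError(
            f"value {value!r} followed by {delimiter!r} is ambiguous"
        )


check_value("Alice", "/---/")  # passes silently

for bad_value, delim in [("a/---/b", "/---/"), ("aa", "aaa")]:
    try:
        check_value(bad_value, delim)
    except ValueError as exc:
        print("rejected:", exc)
```

A value that merely ends with a character shared with the delimiter, such as `"x/"` with `/---/`, passes, because the first full match still begins where the value ends.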
API reference
tokensplit.write(filepath, rows, delimiter)
Write rows (list of lists of strings) to filepath.
tokensplit.read(filepath) → List[List[str]]
Read all rows from filepath. Returns a list of lists of strings.
tokensplit.ToksWriter(file_obj, delimiter)
Streaming writer. Call .writerow(row) or .writerows(rows).
The delimiter is written to line 1 of the file on construction.
tokensplit.ToksReader(file_obj)
Streaming reader. Iterate with for row in reader.
.delimiter attribute exposes the delimiter read from line 1.
Reading algorithm
The reader uses a forward-only sliding window of exactly len(delimiter) characters:
content: h e l l o / - - - / w o r l d / - - - / \n
window: [ 5 ]
→ slides one character at a time
match! → emit token, jump window past delimiter
- Time: O(n) — every character is visited once; one slice emitted per match
- Extra space: O(d) — only the current window lives in memory beyond the content string
- No regex, no str.split, no backtracking
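The scan described above can be sketched over an in-memory string as follows. This is a simplified illustration under stated assumptions, not the package's code: `parse_toks` is a hypothetical name, the header line is peeled off with `str.partition`, and each window comparison is a slice of `len(delimiter)` characters, so the sketch does work proportional to the delimiter length at every position.

```python
def parse_toks(text):
    """Hypothetical sketch: parse .toks content with a forward-only window scan."""
    delimiter, newline, body = text.partition("\n")
    if not delimiter or not newline:
        raise ValueError("first line must hold a non-empty delimiter")

    d, n = len(delimiter), len(body)
    rows, row = [], []
    start = 0  # index where the current token begins
    i = 0      # left edge of the sliding window
    while i + d <= n:
        if body[i:i + d] == delimiter:
            # Window matches: emit the token and jump past the delimiter.
            row.append(body[start:i])
            i += d
            # A newline directly after a delimiter terminates the row.
            if i < n and body[i] == "\n":
                rows.append(row)
                row = []
                i += 1
            start = i
        else:
            i += 1  # slide the window one character forward
    if row:
        rows.append(row)  # final row without a trailing newline
    if body[start:]:
        raise ValueError("trailing data after the last delimiter")
    return delimiter, rows


sample = "/---/\nAlice/---/30/---/Engineer/---/\nBob/---/25/---/Designer/---/\n"
delimiter, rows = parse_toks(sample)
print(delimiter, rows)
```

Because a row only ends where a delimiter is immediately followed by a newline, a value such as `"a\nb"` comes back intact as a single token.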
Running tests
python -m pytest tests/
File details
Details for the file tokensplit-0.1.2.tar.gz.
File metadata
- Download URL: tokensplit-0.1.2.tar.gz
- Upload date:
- Size: 8.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9e31cf9bcc25b5e50b4d7494c6e0f95fba6bf0b1296295ed4bdd4e62067e37b8 |
| MD5 | bdf74debb0a9588c69455b8c09d8c923 |
| BLAKE2b-256 | 1144b0ec9ed302a20b89c37c06321213b42f90ff07b3c06c0e4363054dd42cd6 |
Provenance
The following attestation bundles were made for tokensplit-0.1.2.tar.gz:
Publisher: publish.yml on nGubbins/tokensplit
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokensplit-0.1.2.tar.gz
- Subject digest: 9e31cf9bcc25b5e50b4d7494c6e0f95fba6bf0b1296295ed4bdd4e62067e37b8
- Sigstore transparency entry: 1236060758
- Sigstore integration time:
- Permalink: nGubbins/tokensplit@509863653189af00d9b9f2d782530704f57e20c7
- Branch / Tag: refs/tags/Releasev0.1.2
- Owner: https://github.com/nGubbins
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@509863653189af00d9b9f2d782530704f57e20c7
- Trigger Event: release
File details
Details for the file tokensplit-0.1.2-py3-none-any.whl.
File metadata
- Download URL: tokensplit-0.1.2-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01996df11030a3bb7bd13d80ab3b0a25ee2a11a85ab6aa3219c89b3705f510e6 |
| MD5 | f2cba71f145fab518e2df169d2f9b578 |
| BLAKE2b-256 | 40054e05636ad1a312d89d4df68ed7239c8f679605a27f1720743ac35b471639 |
Provenance
The following attestation bundles were made for tokensplit-0.1.2-py3-none-any.whl:
Publisher: publish.yml on nGubbins/tokensplit
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokensplit-0.1.2-py3-none-any.whl
- Subject digest: 01996df11030a3bb7bd13d80ab3b0a25ee2a11a85ab6aa3219c89b3705f510e6
- Sigstore transparency entry: 1236060793
- Sigstore integration time:
- Permalink: nGubbins/tokensplit@509863653189af00d9b9f2d782530704f57e20c7
- Branch / Tag: refs/tags/Releasev0.1.2
- Owner: https://github.com/nGubbins
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@509863653189af00d9b9f2d782530704f57e20c7
- Trigger Event: release