A pure Python library supporting Tdb “Text DataBase” format, a plain text human readable typed database storage format superior to CSV.

Tdb Overview

Tdb “Text DataBase” format is a plain text human readable typed database storage format.

Tdb is an ideal alternative to CSV. A Tdb file can store any number of tables. Every table is named, and every field has a name and a type. Types are not-null by default, but can be nullable if required. The seven supported types include strings which respect all whitespace (including newlines), and which may contain any UTF-8 characters (using XML-escaping conventions), binary (e.g., for images), Booleans, numbers (integer and real), and dates and datetimes.

Tdb libraries are available in Go and Python with a Rust library in development. The Tdb format is designed to be very easy to parse, so creating a Tdb library in virtually any language should be straightforward.

Datatypes

Tdb supports the following seven built-in datatypes.

Type      Example(s)                               Notes
bool      F                                        A Tdb writer should write F or T; a Tdb reader should also accept f, N, n, t, Y, y, 0, 1.
bytes     (20AC 65 66 48)                          There must be an even number of case-insensitive hex digits; whitespace (spaces, newlines, etc.) is optional.
date      2022-04-01                               Basic ISO8601 YYYY-MM-DD format.
datetime  2022-04-01T16:11:51                      ISO8601 YYYY-MM-DDTHH[:MM[:SS]] format; 1-second resolution, no timezone support.
int       -192 234 7891409                         Standard integers.
real      0.15 0.7e-9 2245.389                     Standard and scientific notation.
str       <Some text which may include newlines>   For &, <, >, use &amp;, &lt;, &gt; respectively.
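
As a rough illustration of the bool, bytes, date, and datetime rows above, the following sketch converts single tokens to Python values using only the standard library. The function names are hypothetical and are not taken from any Tdb library; the sketch assumes each token has already been split out of the file.

```python
from datetime import date, datetime

TRUE_TOKENS = frozenset("TtYy1")
FALSE_TOKENS = frozenset("FfNn0")


def parse_bool(token: str) -> bool:
    """Convert a one-character bool token to a Python bool."""
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    raise ValueError(f"invalid bool: {token!r}")


def parse_bytes(token: str) -> bytes:
    """Convert a parenthesized hex token such as '(20AC 65 66 48)'."""
    if len(token) < 2 or token[0] != "(" or token[-1] != ")":
        raise ValueError(f"invalid bytes: {token!r}")
    # Drop all whitespace first; an odd number of digits raises ValueError.
    return bytes.fromhex("".join(token[1:-1].split()))


def parse_date(token: str) -> date:
    """Convert a YYYY-MM-DD token."""
    return date.fromisoformat(token)


def parse_datetime(token: str) -> datetime:
    """Convert a YYYY-MM-DDTHH[:MM[:SS]] token, trying the longest form first."""
    for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M", "%Y-%m-%dT%H"):
        try:
            return datetime.strptime(token, fmt)
        except ValueError:
            continue
    raise ValueError(f"invalid datetime: {token!r}")
```

Note that strptime is somewhat more lenient than the format in the table (for example, it accepts one-digit fields), so a strict reader would need an extra check.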

All fields are not null by default and must contain a valid value of the field's type. To make a field nullable, append ? to its typename, e.g., int?.

Strings may not include &, < or >, so if they are needed, they must be replaced by the XML/HTML escapes &amp;, &lt;, and &gt; respectively. Strings respect any whitespace they contain, including newlines.
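
The escaping rule can be sketched in a few lines of Python. The helper names below are hypothetical (they are not a Tdb library API); the only point being illustrated is the order of the replacements, which matters for a clean round trip.

```python
def escape(text: str) -> str:
    """Escape &, < and > for storage inside a Tdb str value."""
    # & goes first, otherwise the & introduced by &lt; and &gt; would be escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def unescape(text: str) -> str:
    """Reverse escape()."""
    # &amp; goes last, so that "&amp;lt;" decodes to the literal text "&lt;".
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")


def to_tdb_str(text: str) -> str:
    """Wrap escaped text in the < > delimiters."""
    return f"<{escape(text)}>"


def from_tdb_str(token: str) -> str:
    """Strip the < > delimiters and unescape the content."""
    if len(token) < 2 or token[0] != "<" or token[-1] != ">":
        raise ValueError(f"invalid str: {token!r}")
    return unescape(token[1:-1])


print(to_tdb_str("Chisels (pair), 1in & 1¼in"))
# <Chisels (pair), 1in &amp; 1¼in>
```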

Each field value is separated from its neighbor by whitespace, and conventionally records are separated by newlines. However, in practice, since every field in every record must be present (even if only a null value or an empty bytes or string), records may be laid out however you like.

Where whitespace is allowed (or required) it may consist of one or more spaces, tabs, or newlines in any combination.

Examples

CSV

Although widely used, the CSV format is not standardized and has a number of problems. Tdb is a standardized alternative that can distinguish fieldnames from data records, can handle multiline text (including text with commas and quotes) without formality, and can store one—or more—tables in a single Tdb file.

Here's a simple CSV file:

Date,Price,Quantity,ID,Description
"2022-09-21",3.99,2,"CH1-A2","Chisels (pair), 1in & 1¼in"
"2022-10-02",4.49,1,"HV2-K9","Hammer, 2lb"
"2022-10-02",5.89,1,"SX4-D1","Eversure Sealant, 13-floz"

Here's a Tdb equivalent:

[PriceList Date date Price real Quantity int ID str Description str
%
2022-09-21 3.99 2 <CH1-A2> <Chisels (pair), 1in &amp; 1¼in> 
2022-10-02 4.49 1 <HV2-K9> <Hammer, 2lb> 
2022-10-02 5.89 1 <SX4-D1> <Eversure Sealant, 13-floz> 
]

Every table starts with a tablename followed by one or more fields. Each field consists of a fieldname and a type.
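
To make the shape of a table definition concrete, here is a minimal Python sketch that splits a definition line into a tablename and a list of fields. The Field type and the parse_tabledef function are hypothetical names chosen for this illustration, and the sketch assumes the definition has already been separated from the records that follow the %.

```python
from typing import NamedTuple

TYPENAMES = {"bool", "bytes", "date", "datetime", "int", "real", "str"}


class Field(NamedTuple):
    name: str
    typename: str
    nullable: bool


def parse_tabledef(header: str) -> tuple[str, list[Field]]:
    """Split '[Name field type field type ...' into a tablename and fields."""
    tokens = header.split()
    if not tokens or not tokens[0].startswith("["):
        raise ValueError("a table definition must start with '['")
    tokens[0] = tokens[0][1:]
    if not tokens[0]:  # whitespace between '[' and the tablename
        tokens.pop(0)
    # One tablename plus at least one (fieldname, type) pair.
    if len(tokens) < 3 or len(tokens) % 2 == 0:
        raise ValueError("expected a tablename and fieldname/type pairs")
    tablename, rest = tokens[0], tokens[1:]
    fields = []
    for name, typename in zip(rest[0::2], rest[1::2]):
        nullable = typename.endswith("?")
        base = typename[:-1] if nullable else typename
        if base not in TYPENAMES:
            raise ValueError(f"unknown type: {typename!r}")
        fields.append(Field(name, base, nullable))
    return tablename, fields


name, fields = parse_tabledef(
    "[PriceList Date date Price real Quantity int ID str Description str")
print(name, [f.name for f in fields])
# PriceList ['Date', 'Price', 'Quantity', 'ID', 'Description']
```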

Superficially this may not seem much of an improvement on CSV (apart from Tdb's superior string handling and strong typing), but as the next example shows, a Tdb file can contain one or more tables, not just one like CSV.

Database

Database files aren't normally human readable and usually require specialized tools to read and modify their contents. Yet many databases are relatively small (both in size and number of tables), and would be more convenient to work with if human readable. For these, Tdb format provides a viable alternative. For example:

[Customers CID int Company str Address str? Contact str Email str
%
50 <Best People> <123 Somewhere> <John Doe> <j@doe.com> 
19 <Supersuppliers> ? <Jane Doe> <jane@super.com> 
]
[Invoices INUM int CID int Raised_Date date Due_Date date Paid bool Description str?
%
152 50 2022-01-17 2022-02-17 F <COD>
153 19 2022-01-19 2022-02-19 T ?
]
[Items IID int INUM int Delivery_Date date Unit_Price real Quantity int Description str
%
1839 152 2022-01-16 29.99 2 <Bales of hay> 
1840 152 2022-01-16 5.98 3 <Straps> 
1620 153 2022-01-19 11.5 1 <Washers (1-in)> 
]

In the Customers table, the second customer's Address is null, and in the Invoices table, the second invoice's Description is null. No other fields may be null, since only these two fields are nullable.
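
To show how little machinery a reader needs, here is a minimal sketch of a multi-table reader in pure Python. It is not the tdb-py API: parse_tdb and its helpers are hypothetical names, error reporting is minimal (truncated input raises IndexError, and stray characters outside tokens are ignored), and only the Customers and Items tables are used as sample data.

```python
import re
from datetime import date, datetime

# A token is a <string>, a (bytes) group, a structural character, or a bare word.
TOKEN = re.compile(r"<[^<>]*>|\([^()]*\)|[\[\]%?]|[^\s<>()\[\]%?]+")


def _bool(token):
    if token in ("T", "t", "Y", "y", "1"):
        return True
    if token in ("F", "f", "N", "n", "0"):
        return False
    raise ValueError(f"invalid bool: {token!r}")


def _inner(token, opener, closer):
    if len(token) < 2 or token[0] != opener or token[-1] != closer:
        raise ValueError(f"expected {opener}...{closer}, got {token!r}")
    return token[1:-1]


def _str(token):
    text = _inner(token, "<", ">")
    # &amp; is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")


def _bytes(token):
    return bytes.fromhex("".join(_inner(token, "(", ")").split()))


CONVERTERS = {"bool": _bool, "bytes": _bytes, "date": date.fromisoformat,
              "datetime": datetime.fromisoformat, "int": int, "real": float,
              "str": _str}


def parse_tdb(text):
    """Return {tablename: [record, ...]}; each record maps fieldname to value."""
    tokens = TOKEN.findall(text)
    tables = {}
    i = 0
    while i < len(tokens):
        if tokens[i] != "[":
            raise ValueError(f"expected '[', got {tokens[i]!r}")
        tablename = tokens[i + 1]
        i += 2
        fields = []  # (fieldname, typename, nullable)
        while tokens[i] != "%":
            fieldname, typename = tokens[i], tokens[i + 1]
            if typename not in CONVERTERS:
                raise ValueError(f"unknown type: {typename!r}")
            i += 2
            nullable = tokens[i] == "?"  # the ? of a type such as str?
            if nullable:
                i += 1
            fields.append((fieldname, typename, nullable))
        if not fields:
            raise ValueError(f"table {tablename} has no fields")
        i += 1  # skip '%'
        values = []
        while tokens[i] != "]":
            values.append(tokens[i])
            i += 1
        i += 1  # skip ']'
        if len(values) % len(fields):
            raise ValueError(f"incomplete record in table {tablename}")
        records = []
        for start in range(0, len(values), len(fields)):
            record = {}
            for (fieldname, typename, nullable), token in zip(
                    fields, values[start:start + len(fields)]):
                if token == "?":
                    if not nullable:
                        raise ValueError(f"{tablename}.{fieldname} is not nullable")
                    record[fieldname] = None
                else:
                    record[fieldname] = CONVERTERS[typename](token)
            records.append(record)
        tables[tablename] = records
    return tables


SAMPLE = """\
[Customers CID int Company str Address str? Contact str Email str
%
50 <Best People> <123 Somewhere> <John Doe> <j@doe.com>
19 <Supersuppliers> ? <Jane Doe> <jane@super.com>
]
[Items IID int INUM int Delivery_Date date Unit_Price real Quantity int Description str
%
1839 152 2022-01-16 29.99 2 <Bales of hay>
1840 152 2022-01-16 5.98 3 <Straps>
1620 153 2022-01-19 11.5 1 <Washers (1-in)>
]
"""

tables = parse_tdb(SAMPLE)
print(tables["Customers"][1]["Address"])  # None
```

Because records are rebuilt purely from the number of fields, the same function reads the data whether each record is on its own line or all values are run together.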

Config

Configuration files often consist of key–value pairs or grouped key–value pairs. For example, a .ini file like this:

symbols=latin
[Window]
x=32
y=28
[Colors]
foreground=lightyellow
background=#FFE7FF

could be represented by a .tdb like this:

[config_int key str value int
%
<x> 32
<y> 28
]
[config_str key str value str
%
<foreground> <lightyellow>
<background> <#FFE7FF>
<symbols> <latin>
]

And if grouping were required, like this:

[config_int group str? key str value int
%
<Window> <x> 32
<Window> <y> 28
]
[config_str group str? key str value str
%
<Colors> <foreground> <lightyellow>
<Colors> <background> <#FFE7FF>
? <symbols> <latin>
]

Here, we've allowed group to be null (equivalent to the .ini "General" group), but we could easily have made it not-null and required a group name for all groups.
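
Going the other way, the grouped layout can be generated from ordinary Python data. The sketch below is only an illustration: config_to_tdb is a hypothetical helper, and it assumes settings arrive as (group, key, value) tuples with None standing for the ungrouped case.

```python
def tdb_str(text):
    """Escape &, < and > and wrap the text in < > delimiters."""
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return f"<{text}>"


def config_to_tdb(settings):
    """Render (group, key, value) tuples as config_int and config_str tables."""
    rows = {"int": [], "str": []}
    for group, key, value in settings:
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            rows["int"].append((group, key, str(value)))
        else:
            rows["str"].append((group, key, tdb_str(str(value))))
    lines = []
    for typename, table_rows in rows.items():
        lines.append(f"[config_{typename} group str? key str value {typename}")
        lines.append("%")
        for group, key, value in table_rows:
            group_text = "?" if group is None else tdb_str(group)
            lines.append(f"{group_text} {tdb_str(key)} {value}")
        lines.append("]")
    return "\n".join(lines) + "\n"


SETTINGS = [
    ("Window", "x", 32),
    ("Window", "y", 28),
    ("Colors", "foreground", "lightyellow"),
    ("Colors", "background", "#FFE7FF"),
    (None, "symbols", "latin"),
]
print(config_to_tdb(SETTINGS), end="")
```

With these settings the output matches the grouped example above line for line.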

Minimal Tdb Files

[Tbl val int
%
]

This file has a single table called Tbl which has a single field called val of type int, and no records. (Single letters such as T and f are avoided here because they are bool values and so cannot be used as identifiers.)

[Tbl val int
%
0
]

This is like the previous table but now with one record containing the value 0.

[Tbl val int?
%
0
?
]

Again like the previous table, but now with two records, the first containing the value 0, and the second containing null, which is permitted since the field's type is nullable.

Timezones and Metadata

Tdb does not have direct timezone support. There are three simple solutions for this.

If all the dates in the database are in the same timezone, then one approach is to store all the dates as UTC. Alternatively, add a tiny configuration table with the timezone data, for example:

[Config key str value str?
%
<timezone> <+02:30>
]

If, however, the dates being stored have varying timezones, then add another column specifically for the timezone. Something along these lines:

[Readings meter str reading real when date timezone str
%
<EX194B4> 1932.49 2024-11-17 <-03:00>
<V1938DX> 8492.1 2024-10-30 <+02:30>
]
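
Once read, a stored offset can be combined with a date or datetime value in application code. The following Python sketch shows one way to do that with the standard library; parse_offset and attach_offset are hypothetical helpers, and treating a date as midnight is an assumption made for this example.

```python
from datetime import datetime, timedelta, timezone


def parse_offset(offset: str) -> timezone:
    """Convert an offset such as '+02:30' or '-03:00' to a tzinfo object."""
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = offset.lstrip("+-").split(":")
    return timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))


def attach_offset(when: datetime, offset: str) -> datetime:
    """Return the naive datetime 'when' marked as being in the given offset."""
    return when.replace(tzinfo=parse_offset(offset))


# The second reading above: 2024-10-30 (taken as midnight) at +02:30.
reading = attach_offset(datetime(2024, 10, 30), "+02:30")
print(reading.isoformat())                           # 2024-10-30T00:00:00+02:30
print(reading.astimezone(timezone.utc).isoformat())  # 2024-10-29T21:30:00+00:00
```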

If comments or metadata are required, simply create an additional table to store this data and add it to the Tdb. For example, use a Config table as shown above.

Libraries

Library  Language  Homepage
tdb-go   Go        https://pkg.go.dev/github.com/mark-summerfield/tdb-go
tdb-py   Python    https://pypi.org/project/tdb-py
tdb-rs   Rust      https://crates.io/crates/tdb-rs (in development)

We will happily add links to implementations in other languages.

BNF

Tdb files use the UTF-8 encoding. Tdb syntactical elements are all ASCII, so it is possible to read Tdb files as bytes (as the Go library does) or as Unicode characters (as the Python library does). Each Tdb file consists of one or more tables.

TDB         ::= TABLE+
TABLE       ::= OWS '[' OWS TABLEDEF OWS '%' OWS RECORD* OWS ']' OWS
TABLEDEF    ::= IDENTIFIER (RWS FIELDDEF)+ # IDENTIFIER is the tablename
FIELDDEF    ::= IDENTIFIER RWS FIELDTYPE # IDENTIFIER is the fieldname
FIELDTYPE   ::= ('bool' | 'bytes' | 'date' | 'datetime' | 'int' | 'real' | 'str') NULL?
RECORD      ::= OWS VALUE (RWS VALUE)*
VALUE       ::= BOOL | BYTES | DATE | DATETIME | INT | REAL | STR | NULL # NULL is only allowed for nullable field types
BOOL        ::= /[FfTtYyNn01]/
BYTES       ::= '(' (OWS [A-Fa-f0-9]{2})* OWS ')'
DATE        ::= /\d\d\d\d-\d\d-\d\d/  # basic ISO8601 YYYY-MM-DD format
DATETIME    ::= /\d\d\d\d-\d\d-\d\dT\d\d(:\d\d(:\d\d)?)?/ # YYYY-MM-DDTHH[:MM[:SS]]
INT         ::= /[-+]?\d+/
REAL        ::= ... # standard or scientific notation
STR         ::= /[<][^<>]*?[>]/ # newlines allowed, and &amp; &lt; &gt; supported i.e., XML
NULL        ::= '?'
IDENTIFIER  ::= /[_\p{L}]\w{0,31}/ # Must start with a letter or underscore; may not be a built-in constant
OWS         ::= /[\s\n]*/
RWS         ::= /[\s\n]+/ # in some cases RWS is actually optional
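
The terminal rules translate almost directly into Python regular expressions. The sketch below is an approximation under stated assumptions: the DATETIME pattern follows the YYYY-MM-DDTHH[:MM[:SS]] format given in the datatype table, REAL is omitted because the grammar does not spell it out, and since Python's re module has no \p{L}, the identifier pattern uses [^\W\d] (any word character that is not a digit) as a close stand-in. The names PATTERNS, IDENTIFIER, and matches are hypothetical.

```python
import re

PATTERNS = {
    "bool": re.compile(r"[FfTtYyNn01]"),
    "bytes": re.compile(r"\((\s*[A-Fa-f0-9]{2})*\s*\)"),
    "date": re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    "datetime": re.compile(
        r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}(:[0-9]{2}(:[0-9]{2})?)?"),
    "int": re.compile(r"[-+]?[0-9]+"),
    "str": re.compile(r"<[^<>]*>"),
    "null": re.compile(r"\?"),
}

# A letter or underscore followed by up to 31 word characters (32 in total).
IDENTIFIER = re.compile(r"[^\W\d]\w{0,31}")


def matches(kind: str, token: str) -> bool:
    """Report whether the whole token has the shape of the given kind."""
    return PATTERNS[kind].fullmatch(token) is not None


print(matches("datetime", "2022-04-01T16:11"))  # True
print(matches("bytes", "(20A)"))                # False
```

These patterns check shape only; a date such as 2022-13-40 matches the pattern and still has to be rejected when it is converted to a real value.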

Notes

  • Every field is not null by default and must contain a valid value of the field's type. To make a field nullable, append ? to its typename, e.g., str?; for nullable fields the value must either be one of the field's type (e.g., str) or null ?.
  • A Tdb file must contain at least one table even if it is empty, i.e., has no records.
  • A Tdb writer should always write bools as F or T; but a Tdb reader should accept any of F, f, N, n, 0, for false, and any of T, t, Y, y, 1, for true.
  • Within any .tdb file each tablename must be unique, and within each table each fieldname must be unique.
  • No tablename or fieldname (i.e., no identifier) may be the same as a built-in constant or bool value:
    bool, bytes, date, datetime, f, F, int, n, N, real, str, t, T, y, Y
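
The uniqueness and reserved-name rules above are easy to check mechanically. Here is a minimal Python sketch; check_names is a hypothetical helper that takes (tablename, fieldnames) pairs and returns a list of problems rather than raising, so that all violations are reported at once.

```python
RESERVED = frozenset(
    "bool bytes date datetime f F int n N real str t T y Y".split())


def check_names(tables):
    """Return descriptions of naming problems; an empty list means all is well."""
    problems = []
    seen_tables = set()
    for tablename, fieldnames in tables:
        if tablename in RESERVED:
            problems.append(f"reserved tablename: {tablename}")
        if tablename in seen_tables:
            problems.append(f"duplicate tablename: {tablename}")
        seen_tables.add(tablename)
        seen_fields = set()  # fieldnames need only be unique within one table
        for fieldname in fieldnames:
            if fieldname in RESERVED:
                problems.append(f"reserved fieldname: {tablename}.{fieldname}")
            if fieldname in seen_fields:
                problems.append(f"duplicate fieldname: {tablename}.{fieldname}")
            seen_fields.add(fieldname)
    return problems


print(check_names([("Customers", ["CID", "Company"]),
                   ("Invoices", ["INUM", "CID"])]))
# []
```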

Supplementary

Vim Support

If you use the vim editor, simple color syntax highlighting is available. Copy tdb.vim into your $VIM/syntax/ folder and add this line (or similar) to your .vimrc or .gvimrc file:

au BufRead,BufNewFile,BufEnter *.tdb set ft=tdb|set expandtab|set textwidth=80
