
Typed TSV: A simple format for typing TSVs with an implementation in Python 3.

Available on pypi:

Install with: pip install typedtsv

See code and leave feedback here:


JSON, YAML, TOML and other simple formats aren't built for list- or table-like sets of data.

YAML is particularly slow due to its expansive feature set, and JSON, being designed for single objects rather than collections, is not chunkable. I once stored all PyPI package info in a YAML file, and reading it back out was going to take half a day. Using a dead-simple newline-delimited JSON format made parsing take seconds.

Newline-delimited JSON is convenient and performs well, with little chance of making mistakes in parsing. The downsides are that the supported types are a bit too limited (no int vs. float) and that it is not easily human-readable or editable.

TOML is targeted specifically at configuration files and similarly parses into a single dictionary object rather than a collection.

CSV/TSV formats have too much ambiguity, resulting in repetitive custom parsing logic that lives outside the file itself. CSV quote escaping can also lead to poor parsing performance.


Goals

  • Be simple
  • Be fast
  • Be easily parallelized
  • Be a better alternative to CSV/TSV/JSON and simple uses of YAML
  • Support open data and data sharing/archival. Push information about a dataset into the data file itself for future reproducibility

Use Cases in Mind

  • Database-agnostic, program-agnostic simple file format for open data
  • A quick go-to serialization format for sharing reproducible data science datasets
  • Easily-created, easily-editable, easily-understood database fixtures for tests


Non-Goals

  • Unlimited extensibility a la YAML
  • Config files. Focus is on lists of objects/tabular data


The format is a normal TSV, except that the header row uses a colon to annotate each column's type:


For example:

# I'm a comment and will be ignored
url:str    n_times:int   score:float
           5             1.6
           99            9.9
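
To make the header convention concrete, here is a minimal sketch of splitting a `name:type` header row into column names and type names. It is an illustration only, not the typedtsv implementation: `parse_header` is a hypothetical helper, and splitting on the last colon and rejecting cells without one are assumptions.

```python
def parse_header(line):
    """Split a tab-separated 'name:type' header row into (name, type) pairs."""
    columns = []
    for cell in line.rstrip("\n").split("\t"):
        # Assumption: split on the LAST colon so a column name may contain ':'.
        name, sep, type_name = cell.rpartition(":")
        if not sep:
            # Assumption: a header cell without a type annotation is an error.
            raise ValueError(f"header cell has no type annotation: {cell!r}")
        columns.append((name, type_name))
    return columns


header = "url:str\tn_times:int\tscore:float\n"
print(parse_header(header))
```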

The initial pass is centered on Python's basic types plus JSON. The currently valid types are:

Type      Notes
bool      Valid values: true, false, t, f, yes, no, y, n, 1, 0
str       Newlines, tabs, \, and # must be escaped
datetime  Example: '2011-01-01 00:00:00'. Without a timezone, UTC is assumed
null      All types are nullable with the value 'null'. To get the literal string 'null', use '\null'
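
The rules in the table can be sketched as small converter functions. This is a hedged illustration rather than the library's code: `parse_bool`, `parse_datetime`, and `parse_cell` are hypothetical names, matching the bool values case-sensitively is an assumption, and `datetime.fromisoformat` is used here only as a convenient stand-in parser.

```python
from datetime import datetime, timezone

TRUE_VALUES = {"true", "t", "yes", "y", "1"}
FALSE_VALUES = {"false", "f", "no", "n", "0"}


def parse_bool(text):
    # Assumption: values are matched exactly as listed (lowercase only).
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a valid bool: {text!r}")


def parse_datetime(text):
    value = datetime.fromisoformat(text)
    if value.tzinfo is None:
        # Without a timezone, the value is taken to be UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value


def parse_cell(text, convert):
    """Apply the null rules, then the column's converter."""
    if text == "null":
        return None          # every type is nullable
    if text == "\\null":
        text = "null"        # escaped form yields the literal string
    return convert(text)


print(parse_cell("yes", parse_bool))
print(parse_cell("null", int))
print(parse_cell("\\null", str))
print(parse_cell("2011-01-01 00:00:00", parse_datetime))
```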

Comments are supported, just prefix with #. Escape actual # in a string with a single backslash '\#'.
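
As a rough sketch of how escaping might work, the helpers below round-trip a string through an escaped form. Only `\#` is confirmed by the text above; the sequences `\n`, `\t`, and `\\` are assumptions, as is treating a comment as a whole line that starts with `#`. The names `escape`, `unescape`, and `is_comment` are hypothetical.

```python
# Assumed escape sequences; only '\#' is stated explicitly in the format notes.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", "#": "#"}


def escape(text):
    """Escape backslashes first so later replacements are not double-escaped."""
    return (
        text.replace("\\", "\\\\")
        .replace("\n", "\\n")
        .replace("\t", "\\t")
        .replace("#", "\\#")
    )


def unescape(text):
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)
        if nxt not in ESCAPES:
            raise ValueError(f"invalid escape sequence: \\{nxt}")
        out.append(ESCAPES[nxt])
    return "".join(out)


def is_comment(line):
    # Assumption: a comment occupies a whole line beginning with an unescaped '#'.
    return line.startswith("#")


original = "tab\there #1\nback\\slash"
print(escape(original))
print(unescape(escape(original)) == original)
```

Under these assumed sequences, `\null` would collide with `\n`, so a parser would have to check for `\null` before unescaping a cell.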

Row separators use '\n' only. Windows line breaks ('\r\n') are not valid.

We'll never allow quoted '\n' because this would make the file difficult to chunk and thus make it difficult to parallelize reading.
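
Because a raw '\n' can only ever be a row separator, a file can be cut into byte ranges that each start and end on a row boundary without parsing anything. The sketch below illustrates the idea; `chunk_ranges` and `read_rows` are hypothetical helpers, not part of typedtsv, and UTF-8 encoding is assumed.

```python
import os
import tempfile


def chunk_ranges(path, n_chunks):
    """Return (start, end) byte ranges that begin and end on row boundaries."""
    size = os.path.getsize(path)
    boundaries = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)
            f.readline()  # advance to the end of the row we landed in
            pos = f.tell()
            if boundaries[-1] < pos < size:
                boundaries.append(pos)
    boundaries.append(size)
    return list(zip(boundaries[:-1], boundaries[1:]))


def read_rows(path, start, end):
    """Read the rows in one byte range; each range can be handled by its own worker."""
    with open(path, "rb") as f:
        f.seek(start)
        text = f.read(end - start).decode("utf-8")
    rows = text.split("\n")
    if rows and rows[-1] == "":
        rows.pop()  # drop the empty piece after the final '\n'
    return rows


# Demo on a small temporary file of ten rows.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.ttsv")
    with open(path, "w", newline="\n") as f:
        f.writelines(f"row{i}\n" for i in range(10))
    ranges = chunk_ranges(path, 3)
    rows = [row for start, end in ranges for row in read_rows(path, start, end)]

print(ranges)
print(rows == [f"row{i}" for i in range(10)])
```

Each range could be passed to a separate process, for example with `multiprocessing.Pool`, since no range depends on another.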


  • In Python, be careful when opening files that may contain Windows newlines:
infile = open('data.ttsv', 'r', newline='\n')   # set newline='\n'; by default Python also treats '\r' and '\r\n' as line endings and translates them to '\n'
  • typedtsv.dumps can infer column types from the first row of your data, but not if there are any nulls. In that case, use the regular OrderedDict method to define column names and types.
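
The newline gotcha can be seen with a short experiment using only the standard library. The file contents here are made up for illustration; the point is that the default text mode hides a '\r\n', while `newline='\n'` leaves it visible so it can be detected.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.ttsv")
    # Write raw bytes so the first row really ends with a Windows line break.
    with open(path, "wb") as f:
        f.write(b"a:int\r\n1\n")

    # Default text mode: universal newlines translate '\r\n' to '\n'.
    with open(path, "r") as f:
        default_text = f.read()

    # newline='\n': only '\n' ends a line and nothing is translated.
    with open(path, "r", newline="\n") as f:
        strict_text = f.read()

print(repr(default_text))
print(repr(strict_text))
print("contains carriage return:", "\r" in strict_text)
```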


  • Add a boolean type
  • Add nulls
  • Add a datetime/date/time type: need to avoid ambiguity yet support common uses
  • Ergonomics: optionally read and dump single lists of data rather than dealing with a list of lists
  • Support units annotations such as degrees F or meters/second, using the same syntax as F#
  • Maybe: extend format to support column comments / other common metadata
  • Maybe: support array and map types for compatibility with Postgres
  • Maybe: Support date, time, and/or timeinterval types


Make sure you have Poetry installed:

git clone
cd typedtsv
poetry install
poetry shell
