High-performance parser and generator for PostgreSQL-compatible tab-separated values (TSV)

Parse and generate tab-separated values (TSV) data

Tab-separated values (TSV) is a simple and popular format for storing and transferring data, and for exporting data from and importing data into relational databases. For example, PostgreSQL COPY moves data between PostgreSQL tables and standard file-system files or in-memory stores, and its text format (a text file with one line per table row) is a generic version of TSV. Meanwhile, packages like asyncpg help efficiently insert, update or query data in bulk with binary data transfer between Python and PostgreSQL.

This package offers a high-performance alternative for converting data between a TSV text file and Python objects. The parser reads a TSV record into a Python tuple of built-in Python types, one element for each field. The generator produces a TSV record from a tuple.

Quick start

from datetime import date, datetime
from uuid import UUID

from tsv.helper import Parser

# path to the input file (placeholder; replace with your own)
tsv_path = "data.tsv"

# specify the column structure, one type per field
parser = Parser(fields=(bytes, date, datetime, float, int, str, UUID, bool))

# read and parse an entire file
with open(tsv_path, "rb") as f:
    py_records = parser.parse_file(f)

# read and parse a file line by line
with open(tsv_path, "rb") as f:
    for line in f:
        py_record = parser.parse_line(line)

TSV format

Text format is a simple tabular format in which each record (table row) occupies a single line.

  • Output always begins with a header row, which lists data field names.
  • Fields (table columns) are delimited by tab characters.
  • Non-printable characters and special values are escaped with backslash (\), as shown below:
    Escape   Interpretation
    \N       NULL value
    \0       NUL character (ASCII 0)
    \b       Backspace (ASCII 8)
    \f       Form feed (ASCII 12)
    \n       Newline (ASCII 10)
    \r       Carriage return (ASCII 13)
    \t       Tab (ASCII 9)
    \v       Vertical tab (ASCII 11)
    \\       Backslash (single character)
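
The escape table above can be pictured with a small pure-Python round trip. This is only an illustrative sketch, not the package's implementation: the names escape_field and unescape_field are hypothetical, and the sketch handles str values only.

```python
# Hypothetical helper names; the mapping mirrors the escape table above.
_ESCAPES = {
    "\0": "\\0",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\v": "\\v",
    "\\": "\\\\",
}
# Reverse mapping, keyed by the character that follows the backslash.
_UNESCAPES = {seq[1]: char for char, seq in _ESCAPES.items()}


def escape_field(value):
    """Encode a str (or None) as a single TSV field."""
    if value is None:
        return "\\N"
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(field):
    """Decode a single TSV field back into a str (or None for the NULL marker)."""
    if field == "\\N":
        return None
    out = []
    chars = iter(field)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        # A backslash must be followed by one of the known escape codes.
        code = next(chars, None)
        if code not in _UNESCAPES:
            raise ValueError("invalid escape sequence: \\" + str(code))
        out.append(_UNESCAPES[code])
    return "".join(out)
```

Because a literal backslash is always doubled, a string value that happens to read \N is encoded as \\N and cannot be confused with the NULL marker.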

This format allows data to be easily imported into a database engine, e.g. with PostgreSQL COPY.

Output in this format is transmitted as media type text/plain or text/tab-separated-values in UTF-8 encoding.

Parser

The parser understands the following Python types:

  • None. This special value is returned for the TSV escape sequence \N.
  • bool. A literal true or false is converted into a boolean value.
  • bytes. TSV escape sequences are reversed before the data is passed to Python as a bytes object. NUL bytes are permitted.
  • datetime. The input has to comply with RFC 3339 and ISO 8601. The timezone must be UTC (indicated by the suffix Z).
  • date. The input has to conform to the format YYYY-MM-DD.
  • time. The input has to conform to the format hh:mm:ssZ with no fractional seconds, or hh:mm:ss.ffffffZ with fractional seconds. Fractional seconds allow up to 6 digits of precision.
  • float. Interpreted as double precision floating point numbers.
  • int. Arbitrary-length integers are allowed.
  • str. TSV escape sequences are reversed before the data is passed to Python as a str. NUL bytes are not allowed.
  • uuid.UUID. The input has to comply with RFC 4122, or be a string of 32 hexadecimal digits.
  • decimal.Decimal. Interpreted as arbitrary precision decimal numbers.
  • ipaddress.IPv4Address.
  • ipaddress.IPv6Address.
  • list and dict, which are understood as JSON, and invoke the equivalent of json.loads to parse a serialized JSON string.
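
As a rough mental model of the type list above, the following standard-library sketch maps each declared field type to a converter. It is an assumption-laden illustration rather than the package's code: parse_line_reference and CONVERTERS are hypothetical names, only a subset of the types is covered, and escape handling for str and bytes is deliberately left out.

```python
import json
from datetime import date
from decimal import Decimal
from ipaddress import IPv4Address
from uuid import UUID


def _parse_bool(text):
    # Only the literals true and false are accepted.
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"expected true or false, got {text!r}")


# Hypothetical lookup table: declared field type -> converter for the raw text.
CONVERTERS = {
    bool: _parse_bool,
    int: int,
    float: float,
    str: str,  # escape sequences are NOT reversed in this sketch
    date: date.fromisoformat,
    Decimal: Decimal,
    UUID: UUID,
    IPv4Address: IPv4Address,
    list: json.loads,
    dict: json.loads,
}


def parse_line_reference(line, fields):
    """Convert one TSV line (bytes) into a tuple, one element per field."""
    text = line.rstrip(b"\r\n").decode("utf-8")
    columns = text.split("\t")
    if len(columns) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(columns)}")
    return tuple(
        None if column == "\\N" else CONVERTERS[field_type](column)
        for field_type, column in zip(fields, columns)
    )
```

A line such as b"true\t42\t\\N\n" with fields (bool, int, str) would come back as a three-element tuple whose last element is None.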

The backslash character \ introduces escape sequences in both TSV and JSON. When JSON data is written to TSV, backslashes therefore have to be escaped twice. For example, \\n inside a quoted JSON string in a TSV field yields a single newline character: the TSV parser first reads \\ as an escape sequence for a single \ character, which is followed by n, and the JSON parser then reads the resulting \n as a newline embedded in the JSON string. Likewise, four consecutive backslash characters in TSV are needed to represent a single backslash in a quoted JSON string.
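
The two layers of escaping can be observed with the standard json module alone. The sketch below uses hypothetical helper names (json_to_tsv_field, tsv_field_to_json) and relies on one assumption that holds for json.dumps output: the serialized JSON contains no raw tab or newline characters, so doubling backslashes is the only TSV escaping it needs.

```python
import json


def json_to_tsv_field(obj):
    """Serialize obj as JSON, then apply TSV escaping to the JSON text."""
    json_text = json.dumps(obj)  # control characters become \n, \t, ...
    return json_text.replace("\\", "\\\\")  # TSV layer: double each backslash


def tsv_field_to_json(field):
    """Undo the TSV layer first, then let the JSON parser handle its own escapes."""
    json_text = field.replace("\\\\", "\\")
    return json.loads(json_text)


# The Python value below holds ONE backslash in "path" and ONE newline in "note".
value = {"path": "C:\\temp", "note": "line1\nline2"}
field = json_to_tsv_field(value)
```

In field, the single backslash of the path appears as four consecutive backslashes, and the newline appears as two backslashes followed by n.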

Internally, the implementation uses AVX2 instructions to

  • parse RFC 3339 date-time strings into Python datetime objects,
  • parse RFC 4122 UUID strings or 32-digit hexadecimal strings into Python UUID objects,
  • and find \t delimiters between fields in a line.

For parsing integers up to the range of the long type, the parser calls the C standard library function strtol.

For parsing IPv4 and IPv6 addresses, the parser calls the C function inet_pton in libc or Windows Sockets (WinSock2).

If installed, the parser employs orjson to improve parsing speed of nested JSON structures. If not available, the library falls back to the built-in JSON decoder.
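
An optional dependency of this kind is commonly wired up with a guarded import. The following is a minimal sketch of that pattern, not the library's actual code; loads_json is a hypothetical name.

```python
import json

try:
    import orjson  # optional third-party dependency
except ImportError:
    orjson = None


def loads_json(data):
    """Parse JSON from bytes or str, preferring orjson when it is installed."""
    if orjson is not None:
        return orjson.loads(data)
    # Fallback: the built-in decoder also accepts both bytes and str.
    return json.loads(data)
```

Both branches return plain dict, list, str, int, float, bool and None values, so callers do not need to know which decoder was used.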

Date-time format

YYYY-MM-DDThh:mm:ssZ
YYYY-MM-DDThh:mm:ss.fZ
YYYY-MM-DDThh:mm:ss.ffZ
YYYY-MM-DDThh:mm:ss.fffZ
YYYY-MM-DDThh:mm:ss.ffffZ
YYYY-MM-DDThh:mm:ss.fffffZ
YYYY-MM-DDThh:mm:ss.ffffffZ

Date format

YYYY-MM-DD

Time format

hh:mm:ssZ
hh:mm:ss.fZ
hh:mm:ss.ffZ
hh:mm:ss.fffZ
hh:mm:ss.ffffZ
hh:mm:ss.fffffZ
hh:mm:ss.ffffffZ
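
For comparison, the date-time, date and time formats listed above can be parsed with the standard library as sketched below. The function names are hypothetical, and two choices are assumptions of this sketch rather than documented behavior of the package: the results carry an explicit UTC tzinfo, and strptime is somewhat more lenient than the strict patterns (for example, it accepts single-digit months).

```python
from datetime import date, datetime, time, timezone


def _split_fraction(text):
    """Split a value ending in Z into its seconds part and microseconds."""
    if not text.endswith("Z"):
        raise ValueError("only UTC values with suffix Z are accepted")
    body, _, fraction = text[:-1].partition(".")
    if "." in text:
        valid = fraction.isascii() and fraction.isdigit() and len(fraction) <= 6
        if not valid:
            raise ValueError("fractional seconds need 1 to 6 digits")
    # Right-pad so that ".5" means 500000 microseconds.
    microsecond = int(fraction.ljust(6, "0")) if fraction else 0
    return body, microsecond


def parse_datetime_utc(text):
    """Parse YYYY-MM-DDThh:mm:ss[.f{1,6}]Z."""
    body, microsecond = _split_fraction(text)
    parsed = datetime.strptime(body, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=microsecond, tzinfo=timezone.utc)


def parse_time_utc(text):
    """Parse hh:mm:ss[.f{1,6}]Z."""
    body, microsecond = _split_fraction(text)
    parsed = datetime.strptime(body, "%H:%M:%S")
    return time(parsed.hour, parsed.minute, parsed.second, microsecond,
                tzinfo=timezone.utc)


def parse_date(text):
    """Parse YYYY-MM-DD."""
    return datetime.strptime(text, "%Y-%m-%d").date()
```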

Performance

Depending on the field types, tsv2py parses TSV records up to 7 times faster than a functionally equivalent implementation based on the Python standard library. Savings in execution time are largest for dates, UUIDs and longer strings with special characters (up to 90%), and more moderate for simple types like small integers (approx. 60%).
