
High-performance parser and generator for PostgreSQL-compatible tab-separated values (TSV)


Parse and generate tab-separated values (TSV) data

Tab-separated values (TSV) is a simple and popular format for storing and transferring data, and for exporting data from and importing data into relational databases. For example, PostgreSQL COPY moves data between PostgreSQL tables and standard file-system files or in-memory stores, and its text format (a text file with one line per table row) is a generic version of TSV. Meanwhile, packages like asyncpg help efficiently insert, update or query data in bulk with binary data transfer between Python and PostgreSQL.

This package offers a high-performance way to convert data between a TSV text file and Python objects. The parser reads a TSV record into a Python tuple of built-in Python types, one element for each field. The generator produces a TSV record from a tuple.

Installation

Although tsv2py contains native code, pre-built binary wheels are available for several target architectures. In most cases, you can install directly from a binary wheel, which pip selects automatically:

python3 -m pip install tsv2py

If a binary wheel is not available for the target platform, pip will attempt to install tsv2py from the source distribution. This builds the package on the fly as part of the installation process, which requires a C compiler such as gcc or clang. The following commands install a C compiler and the Python development headers on Amazon Linux:

sudo yum groupinstall -y "Development Tools"
sudo yum install -y python3-devel python3-pip

If you lack a C compiler or the Python development headers, you will get error messages similar to the following:

error: command 'gcc' failed: No such file or directory
lib/tsv_parser.c:2:10: fatal error: Python.h: No such file or directory

Quick start

from datetime import date, datetime
from uuid import UUID

from tsv.helper import Parser

# specify the column structure
parser = Parser(fields=(bytes, date, datetime, float, int, str, UUID, bool))

# path to the input file (placeholder; point this at your own TSV file)
tsv_path = "data.tsv"

# read and parse an entire file
with open(tsv_path, "rb") as f:
    py_records = parser.parse_file(f)

# read and parse a file line by line
with open(tsv_path, "rb") as f:
    for line in f:
        py_record = parser.parse_line(line)

TSV format

Text format is a simple tabular format in which each record (table row) occupies a single line.

  • Output always begins with a header row, which lists data field names.
  • Fields (table columns) are delimited by tab characters.
  • Non-printable characters and special values are escaped with backslash (\), as shown below:
Escape  Interpretation
\N      NULL value
\0      NUL character (ASCII 0)
\b      Backspace (ASCII 8)
\f      Form feed (ASCII 12)
\n      Newline (ASCII 10)
\r      Carriage return (ASCII 13)
\t      Tab (ASCII 9)
\v      Vertical tab (ASCII 11)
\\      Backslash (single character)

This format allows data to be easily imported into a database engine, e.g. with PostgreSQL COPY.

Output in this format is transmitted as media type text/plain or text/tab-separated-values in UTF-8 encoding.
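The escape rules in the table above can be sketched in pure Python. This is only an illustration of the format, not the library's native implementation; the helper names escape_field and unescape_field are hypothetical, and treating an unknown escape as the literal character that follows the backslash is an assumption of this sketch.

```python
import re

# Mapping from raw characters to their TSV escape sequences (see table above).
_ESCAPES = {
    "\0": "\\0",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\v": "\\v",
    "\\": "\\\\",
}
# Reverse mapping, keyed by the character that follows the backslash.
_UNESCAPES = {seq[1]: raw for raw, seq in _ESCAPES.items()}
_ESCAPE_RE = re.compile(r"\\(.)", re.DOTALL)


def escape_field(value):
    """Turn a str (or None) into the text of a single TSV field."""
    if value is None:
        return "\\N"
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(text):
    """Reverse escape_field; a bare \\N denotes a NULL value."""
    if text == "\\N":
        return None
    # Assumption: an unrecognized escape yields the character itself.
    return _ESCAPE_RE.sub(
        lambda m: _UNESCAPES.get(m.group(1), m.group(1)), text
    )
```

Because a literal backslash is always doubled on output, a string that happens to consist of a backslash followed by N is written as three characters and cannot be confused with the NULL marker.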

Parser

The parser understands the following Python types:

  • None. This special value is returned for the TSV escape sequence \N.
  • bool. A literal true or false is converted into a boolean value.
  • bytes. TSV escape sequences are reversed before the data is passed to Python as a bytes object. NUL bytes are permitted.
  • datetime. The input has to comply with RFC 3339 and ISO 8601. The time zone must be UTC, indicated by the suffix Z.
  • date. The input has to conform to the format YYYY-MM-DD.
  • time. The input has to conform to the format hh:mm:ssZ with no fractional seconds, or hh:mm:ss.ffffffZ with fractional seconds. Fractional seconds allow up to 6 digits of precision.
  • float. Interpreted as double precision floating point numbers.
  • int. Arbitrary-length integers are allowed.
  • str. TSV escape sequences are reversed before the data is passed to Python as a str. NUL bytes are not allowed.
  • uuid.UUID. The input has to comply with RFC 4122, or be a string of 32 hexadecimal digits.
  • decimal.Decimal. Interpreted as arbitrary precision decimal numbers.
  • ipaddress.IPv4Address.
  • ipaddress.IPv6Address.
  • list and dict, which are understood as JSON, and invoke the equivalent of json.loads to parse a serialized JSON string.

The backslash character \ is both a TSV and a JSON escape sequence initiator. When JSON data is written to TSV, several backslash characters may be needed, e.g. \\n in a quoted JSON string translates to a single newline character. First, \\ in \\n is understood as an escape sequence by the TSV parser to produce a single \ character followed by an n character, and in turn \n is understood as a single newline embedded in a JSON string by the JSON parser. Specifically, you need four consecutive backslash characters in TSV to represent a single backslash in a JSON quoted string.
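The two layers of escaping can be traced with the standard library alone. The sketch below is not the library's code: tsv_escape and tsv_unescape are hypothetical helpers that cover only backslash, tab, newline and carriage return, which is enough to show how the backslashes multiply.

```python
import json
import re

_TSV_UNESCAPE = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def tsv_escape(text: str) -> str:
    """Apply the TSV layer: double backslashes first, then escape controls."""
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def tsv_unescape(text: str) -> str:
    """Undo the TSV layer, leaving a serialized JSON string."""
    return re.sub(
        r"\\(.)", lambda m: _TSV_UNESCAPE.get(m.group(1), m.group(1)), text
    )


value = {"path": "C:\\temp", "note": "line1\nline2"}

json_text = json.dumps(value)      # JSON layer: \ becomes \\ and newline becomes \n
tsv_field = tsv_escape(json_text)  # TSV layer: every backslash is doubled again

# Reading reverses the layers in the opposite order.
restored = json.loads(tsv_unescape(tsv_field))
```

In tsv_field, the single backslash of the path appears as four backslashes, and the newline appears as two backslashes followed by n, matching the description above.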

Internally, the implementation uses AVX2 instructions to

  • parse RFC 3339 date-time strings into Python datetime objects,
  • parse RFC 4122 UUID strings or 32-digit hexadecimal strings into Python UUID objects,
  • and find \t delimiters between fields in a line.
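As a rough pure-Python counterpart of the delimiter search (the library performs this step in native code), a line can be split on raw tab bytes before any unescaping takes place. This is safe because a tab inside a field is always written as the two-character escape sequence \t and never as a raw tab. The function name split_fields is hypothetical.

```python
def split_fields(line: bytes) -> list[bytes]:
    """Split one TSV line into its still-escaped fields."""
    # Drop the record terminator, then cut at every raw tab byte.
    return line.removesuffix(b"\n").split(b"\t")
```

Each resulting field still carries its escape sequences and has to be unescaped and converted according to its column type.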

For parsing integers up to the range of the long type, the parser calls the C standard library function strtol.

For parsing IPv4 and IPv6 addresses, the parser calls the C function inet_pton in libc or Windows Sockets (WinSock2).
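Python exposes the same C function as socket.inet_pton, so the conversion can be approximated without native code. The sketch below only illustrates the idea and is not how the library is wired internally; parse_ipv4 and parse_ipv6 are hypothetical names.

```python
import ipaddress
import socket


def parse_ipv4(text: str) -> ipaddress.IPv4Address:
    """Convert dotted-quad text to an IPv4Address via inet_pton."""
    packed = socket.inet_pton(socket.AF_INET, text)  # 4 bytes, network order
    return ipaddress.IPv4Address(packed)


def parse_ipv6(text: str) -> ipaddress.IPv6Address:
    """Convert textual IPv6 notation to an IPv6Address via inet_pton."""
    packed = socket.inet_pton(socket.AF_INET6, text)  # 16 bytes, network order
    return ipaddress.IPv6Address(packed)
```

Malformed input makes socket.inet_pton raise OSError, so callers can treat that exception as a parse failure.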

If installed, the parser employs orjson to improve parsing speed of nested JSON structures. If not available, the library falls back to the built-in JSON decoder.
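An optional dependency of this kind is commonly handled with a guarded import. The following is a minimal sketch of that pattern, not a copy of the library's code; loads_json is a hypothetical name.

```python
import json

try:
    import orjson  # optional third-party accelerator
except ImportError:
    orjson = None


def loads_json(data: bytes):
    """Decode a serialized JSON document, preferring orjson when present."""
    if orjson is not None:
        return orjson.loads(data)
    # Fallback: the standard-library decoder also accepts bytes input.
    return json.loads(data)
```

Both decoders return the same Python structures for ordinary documents, so the caller does not need to know which one was used.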

Date-time format

YYYY-MM-DDThh:mm:ssZ
YYYY-MM-DDThh:mm:ss.fZ
YYYY-MM-DDThh:mm:ss.ffZ
YYYY-MM-DDThh:mm:ss.fffZ
YYYY-MM-DDThh:mm:ss.ffffZ
YYYY-MM-DDThh:mm:ss.fffffZ
YYYY-MM-DDThh:mm:ss.ffffffZ

Date format

YYYY-MM-DD

Time format

hh:mm:ssZ
hh:mm:ss.fZ
hh:mm:ss.ffZ
hh:mm:ss.fffZ
hh:mm:ss.ffffZ
hh:mm:ss.fffffZ
hh:mm:ss.ffffffZ
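The formats listed above can be approximated with the standard library, for example when writing a reference implementation to compare against. The helper names below are hypothetical. Attaching timezone.utc to parsed date-time values is a choice made in this sketch; the source does not state whether the library returns aware or naive datetime objects. Note also that strptime is more lenient than the listed formats (for instance, it accepts unpadded numbers).

```python
from datetime import date, datetime, time, timezone


def _strip_utc_suffix(text: str) -> str:
    """Require the trailing Z that marks UTC and return the rest."""
    if not text.endswith("Z"):
        raise ValueError("expected a UTC value with suffix Z")
    return text[:-1]


def parse_datetime(text: str) -> datetime:
    """Parse YYYY-MM-DDThh:mm:ssZ with 0 to 6 fractional digits."""
    body = _strip_utc_suffix(text)
    # %f accepts one to six digits and pads with zeros on the right.
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in body else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(body, fmt).replace(tzinfo=timezone.utc)


def parse_date(text: str) -> date:
    """Parse YYYY-MM-DD."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_time(text: str) -> time:
    """Parse hh:mm:ssZ with 0 to 6 fractional digits."""
    body = _strip_utc_suffix(text)
    fmt = "%H:%M:%S.%f" if "." in body else "%H:%M:%S"
    return datetime.strptime(body, fmt).time()
```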

Performance

Depending on the field types, tsv2py parses TSV records up to 7 times faster than a functionally equivalent implementation based on the Python standard library. Savings in execution time are largest for dates, UUIDs and longer strings with special characters (up to 90%), and more moderate for simple types like small integers (approx. 60%).
