High-performance parser and generator for PostgreSQL-compatible tab-separated values (TSV)

Parse and generate tab-separated values (TSV) data

Tab-separated values (TSV) is a simple and popular format for data storage, data transfer, and exporting data from and importing data into relational databases. For example, PostgreSQL COPY moves data between PostgreSQL tables and standard file-system files or in-memory stores, and its text format (a text file with one line per table row) is a generic version of TSV. Meanwhile, packages like asyncpg help efficiently insert, update or query data in bulk with binary data transfer between Python and PostgreSQL.

This package offers a high-performance alternative to convert data between a TSV text file and Python objects. The parser can read a TSV record into a Python tuple consisting of built-in Python types, one for each field. The generator can produce a TSV record from a tuple.

Installation

Even though tsv2py contains native code, the package is already pre-built for several target architectures. In most cases, you can install directly from a binary wheel, selected automatically by pip:

python3 -m pip install tsv2py

If a binary wheel is not available for the target platform, pip will attempt to install tsv2py from the source distribution. This builds the package on the fly as part of the installation process, which requires a C compiler such as gcc or clang. The following commands install a C compiler and the Python development headers on Amazon Linux:

sudo yum groupinstall -y "Development Tools"
sudo yum install -y python3-devel python3-pip

If you lack a C compiler or the Python development headers, you will get error messages similar to the following:

error: command 'gcc' failed: No such file or directory
lib/tsv_parser.c:2:10: fatal error: Python.h: No such file or directory

Quick start

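from datetime import date, datetime
from uuid import UUID

from tsv.helper import Parser

# placeholder path to a TSV file; replace with your own
tsv_path = "data.tsv"

# specify the column structure
parser = Parser(fields=(bytes, date, datetime, float, int, str, UUID, bool))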

# read and parse an entire file
with open(tsv_path, "rb") as f:
    py_records = parser.parse_file(f)

# read and parse a file line by line
with open(tsv_path, "rb") as f:
    for line in f:
        py_record = parser.parse_line(line)

TSV format

The text format is a simple tabular format in which each record (table row) occupies a single line.

  • Output always begins with a header row, which lists data field names.
  • Fields (table columns) are delimited by tab characters.
  • Non-printable characters and special values are escaped with backslash (\), as shown below:
Escape   Interpretation
\N       NULL value
\0       NUL character (ASCII 0)
\b       Backspace (ASCII 8)
\f       Form feed (ASCII 12)
\n       Newline (ASCII 10)
\r       Carriage return (ASCII 13)
\t       Tab (ASCII 9)
\v       Vertical tab (ASCII 11)
\\       Backslash (single character)
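
The escape rules above can be sketched in pure Python. This is an illustration of the format only, not the package's native implementation; the helper names unescape_field and split_record are hypothetical, and rejecting unknown escape sequences is an assumption rather than documented tsv2py behavior.

```python
import re

# Escape letter -> character, following the table above.
_ESCAPES = {
    "0": "\0",
    "b": "\b",
    "f": "\f",
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "v": "\v",
    "\\": "\\",
}
_ESCAPE_PATTERN = re.compile(r"\\(.)", re.DOTALL)


def _replace_escape(match):
    letter = match.group(1)
    try:
        return _ESCAPES[letter]
    except KeyError:
        # Assumption: unknown escapes are treated as errors in this sketch.
        raise ValueError(f"unknown escape sequence: \\{letter}") from None


def unescape_field(field):
    """Turn one raw TSV field into a str, or None for the NULL marker."""
    if field == r"\N":
        return None
    return _ESCAPE_PATTERN.sub(_replace_escape, field)


def split_record(line):
    """Split one TSV line on tab characters and unescape each field."""
    return [unescape_field(field) for field in line.rstrip("\n").split("\t")]
```

Splitting on the tab character before unescaping is safe because a tab inside a field is always written as the two-character sequence \t.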

This format allows data to be easily imported into a database engine, e.g. with PostgreSQL COPY.

Output in this format is transmitted as media type text/plain or text/tab-separated-values in UTF-8 encoding.

Parser

The parser understands the following Python types:

  • None. This special value is returned for the TSV escape sequence \N.
  • bool. A literal true or false is converted into a boolean value.
  • bytes. TSV escape sequences are reversed before the data is passed to Python as a bytes object. NUL bytes are permitted.
  • datetime. The input has to comply with RFC 3339 and ISO 8601. The time zone must be UTC, indicated by the suffix Z.
  • date. The input has to conform to the format YYYY-MM-DD.
  • time. The input has to conform to the format hh:mm:ssZ with no fractional seconds, or hh:mm:ss.ffffffZ with fractional seconds. Fractional seconds allow up to 6 digits of precision.
  • float. Interpreted as double precision floating point numbers.
  • int. Arbitrary-length integers are allowed.
  • str. TSV escape sequences are reversed before the data is passed to Python as a str. NUL bytes are not allowed.
  • uuid.UUID. The input has to comply with RFC 4122, or be a string of 32 hexadecimal digits.
  • decimal.Decimal. Interpreted as arbitrary precision decimal numbers.
  • ipaddress.IPv4Address.
  • ipaddress.IPv6Address.
  • list and dict, which are understood as JSON, and invoke the equivalent of json.loads to parse a serialized JSON string.
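
For comparison, a rough standard-library equivalent of the per-type conversion might look like the sketch below. It assumes the fields have already been split and unescaped, with None standing in for \N. The names CONVERTERS and convert_fields are hypothetical, and the built-in constructors used here may accept slightly different input than tsv2py does.

```python
import json
from datetime import date
from decimal import Decimal
from ipaddress import IPv4Address, IPv6Address
from uuid import UUID


def _to_bool(text):
    # Only the literals "true" and "false" are accepted.
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean literal: {text!r}")


# Field type -> function that converts unescaped text to that type.
CONVERTERS = {
    bool: _to_bool,
    int: int,
    float: float,
    str: str,
    date: date.fromisoformat,
    UUID: UUID,  # accepts hyphenated form or 32 hexadecimal digits
    Decimal: Decimal,
    IPv4Address: IPv4Address,
    IPv6Address: IPv6Address,
    list: json.loads,
    dict: json.loads,
}


def convert_fields(types, fields):
    """Convert a sequence of unescaped text fields into a tuple of Python objects."""
    if len(types) != len(fields):
        raise ValueError("field count does not match the column structure")
    return tuple(
        None if field is None else CONVERTERS[field_type](field)
        for field_type, field in zip(types, fields)
    )
```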

The backslash character \ is both a TSV and a JSON escape sequence initiator. When JSON data is written to TSV, several backslash characters may be needed, e.g. \\n in a quoted JSON string translates to a single newline character. First, \\ in \\n is understood as an escape sequence by the TSV parser to produce a single \ character followed by an n character, and in turn \n is understood as a single newline embedded in a JSON string by the JSON parser. Specifically, you need four consecutive backslash characters in TSV to represent a single backslash in a JSON quoted string.
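
The two decoding layers can be traced with a small standard-library sketch. The helper tsv_unescape is a hypothetical stand-in for the TSV layer, not the package's API, and it does not handle the \N marker.

```python
import json
import re

_ESCAPES = {"0": "\0", "b": "\b", "f": "\f", "n": "\n",
            "r": "\r", "t": "\t", "v": "\v", "\\": "\\"}


def tsv_unescape(field):
    """Reverse TSV escape sequences (first layer)."""
    return re.sub(r"\\(.)", lambda m: _ESCAPES[m.group(1)], field, flags=re.DOTALL)


# Raw characters as they appear in the TSV file:
# four backslashes before "temp", two backslashes before "n".
tsv_field = r'{"path": "C:\\\\temp", "note": "line1\\nline2"}'

# Layer 1: the TSV layer halves the backslashes.
json_text = tsv_unescape(tsv_field)

# Layer 2: the JSON layer interprets the remaining escapes.
decoded = json.loads(json_text)
```

After the first layer, json_text contains C:\\temp and line1\nline2 as JSON escape sequences; after the second layer, the path holds a single backslash and the note holds a real newline character.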

Internally, the implementation uses AVX2 instructions to

  • parse RFC 3339 date-time strings into Python datetime objects,
  • parse RFC 4122 UUID strings or 32-digit hexadecimal strings into Python UUID objects,
  • and find \t delimiters between fields in a line.

For parsing integers up to the range of the long type, the parser calls the C standard library function strtol.

For parsing IPv4 and IPv6 addresses, the parser calls the C function inet_pton in libc or Windows Sockets (WinSock2).

If installed, the parser employs orjson to improve parsing speed of nested JSON structures. If not available, the library falls back to the built-in JSON decoder.
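
An optional dependency of this kind is commonly wired up as shown below. This is a generic sketch of the fallback pattern, not the package's actual code, and loads_json is a hypothetical name.

```python
import json

try:
    import orjson  # optional, faster JSON decoder
except ImportError:
    orjson = None


def loads_json(text):
    """Decode a JSON string with orjson when available, else with the built-in decoder."""
    if orjson is not None:
        return orjson.loads(text)
    return json.loads(text)
```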

Date-time format

YYYY-MM-DDThh:mm:ssZ
YYYY-MM-DDThh:mm:ss.fZ
YYYY-MM-DDThh:mm:ss.ffZ
YYYY-MM-DDThh:mm:ss.fffZ
YYYY-MM-DDThh:mm:ss.ffffZ
YYYY-MM-DDThh:mm:ss.fffffZ
YYYY-MM-DDThh:mm:ss.ffffffZ

Date format

YYYY-MM-DD

Time format

hh:mm:ssZ
hh:mm:ss.fZ
hh:mm:ss.ffZ
hh:mm:ss.fffZ
hh:mm:ss.ffffZ
hh:mm:ss.fffffZ
hh:mm:ss.ffffffZ
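
The date-time and time formats listed above can be parsed with the standard library roughly as follows. This is a sketch for comparison, not the package's AVX2-based parser. The function names are hypothetical, strptime is somewhat more lenient than the formats listed, and returning timezone-aware objects is an assumption made here for illustration.

```python
from datetime import datetime, time, timezone


def parse_utc_datetime(text):
    """Parse YYYY-MM-DDThh:mm:ss[.f]Z with 1 to 6 fractional digits."""
    if not text.endswith("Z"):
        raise ValueError("only UTC timestamps with suffix Z are supported")
    # %f accepts one to six digits and pads with zeros on the right.
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in text else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def parse_utc_time(text):
    """Parse hh:mm:ss[.f]Z with 1 to 6 fractional digits."""
    if not text.endswith("Z"):
        raise ValueError("only UTC times with suffix Z are supported")
    fmt = "%H:%M:%S.%fZ" if "." in text else "%H:%M:%SZ"
    return datetime.strptime(text, fmt).time().replace(tzinfo=timezone.utc)
```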

Performance

Depending on the field types, tsv2py parses TSV records up to 7 times faster than a functionally equivalent implementation based on the Python standard library. The savings in execution time are largest for dates, UUIDs and longer strings with special characters (up to 90%) and more moderate for simple types such as small integers (approximately 60%).
