High-performance parser and generator for PostgreSQL-compatible tab-separated values (TSV)

Parse and generate tab-separated values (TSV) data

Tab-separated values (TSV) is a simple and popular format for storing and transferring data, and for exporting data from and importing data into relational databases. For example, PostgreSQL COPY moves data between PostgreSQL tables and standard file-system files or in-memory stores, and its text format (a text file with one line per table row) is a generic version of TSV. Meanwhile, packages like asyncpg help efficiently insert, update or query data in bulk with binary data transfer between Python and PostgreSQL.

This package offers a high-performance alternative to convert data between a TSV text file and Python objects. The parser can read a TSV record into a Python tuple consisting of built-in Python types, one for each field. The generator can produce a TSV record from a tuple.

Quick start

from datetime import date, datetime
from uuid import UUID

from tsv.helper import Parser

# specify the column structure
parser = Parser(fields=(bytes, date, datetime, float, int, str, UUID, bool))

# read and parse an entire file (tsv_path is the path to your TSV file)
with open(tsv_path, "rb") as f:
    py_records = parser.parse_file(f)

# read and parse a file line by line
with open(tsv_path, "rb") as f:
    for line in f:
        py_record = parser.parse_line(line)

TSV format

Text format is a simple tabular format in which each record (table row) occupies a single line.

  • Output always begins with a header row, which lists data field names.
  • Fields (table columns) are delimited by tab characters.
  • Non-printable characters and special values are escaped with backslash (\), as shown below:
Escape  Interpretation
\N      NULL value
\0      NUL character (ASCII 0)
\b      Backspace (ASCII 8)
\f      Form feed (ASCII 12)
\n      Newline (ASCII 10)
\r      Carriage return (ASCII 13)
\t      Tab (ASCII 9)
\v      Vertical tab (ASCII 11)
\\      Backslash (single character)
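
The escape rules above can be sketched in pure Python. This is only an illustration of the format, not the package's implementation (which is compiled native code); the names unescape_field and split_record are hypothetical and not part of tsv2py.

```python
from typing import List, Optional

# Maps the byte following a backslash to the byte it stands for.
_ESCAPES = {
    ord("0"): 0,    # NUL character
    ord("b"): 8,    # backspace
    ord("f"): 12,   # form feed
    ord("n"): 10,   # newline
    ord("r"): 13,   # carriage return
    ord("t"): 9,    # tab
    ord("v"): 11,   # vertical tab
    ord("\\"): 92,  # backslash
}


def unescape_field(field: bytes) -> Optional[bytes]:
    """Reverse TSV escape sequences in one field (hypothetical helper)."""
    if field == b"\\N":
        return None  # the NULL marker
    out = bytearray()
    i = 0
    while i < len(field):
        if field[i] == 0x5C:  # backslash starts an escape sequence
            if i + 1 >= len(field) or field[i + 1] not in _ESCAPES:
                raise ValueError("invalid escape sequence")
            out.append(_ESCAPES[field[i + 1]])
            i += 2
        else:
            out.append(field[i])
            i += 1
    return bytes(out)


def split_record(line: bytes) -> List[Optional[bytes]]:
    """Split one line on tab characters and unescape each field."""
    return [unescape_field(f) for f in line.rstrip(b"\n").split(b"\t")]
```

Because tabs and newlines inside a field are always escaped, splitting on the raw tab byte before unescaping is safe.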

This format allows data to be easily imported into a database engine, e.g. with PostgreSQL COPY.

Output in this format is transmitted as media type text/plain or text/tab-separated-values in UTF-8 encoding.

Parser

The parser understands the following Python types:

  • None. This special value is returned for the TSV escape sequence \N.
  • bool. A literal true or false is converted into a boolean value.
  • bytes. TSV escape sequences are reversed before the data is passed to Python as a bytes object. NUL bytes are permitted.
  • datetime. The input has to comply with RFC 3339 and ISO 8601. The time zone must be UTC, denoted by the suffix Z.
  • date. The input has to conform to the format YYYY-MM-DD.
  • time. The input has to conform to the format hh:mm:ssZ with no fractional seconds, or hh:mm:ss.ffffffZ with fractional seconds. Fractional seconds allow up to 6 digits of precision.
  • float. Interpreted as double precision floating point numbers.
  • int. Arbitrary-length integers are allowed.
  • str. TSV escape sequences are reversed before the data is passed to Python as a str. NUL bytes are not allowed.
  • uuid.UUID. The input has to comply with RFC 4122, or be a string of 32 hexadecimal digits.
  • decimal.Decimal. Interpreted as arbitrary precision decimal numbers.
  • ipaddress.IPv4Address.
  • ipaddress.IPv6Address.
  • list and dict, which are understood as JSON, and invoke the equivalent of json.loads to parse a serialized JSON string.
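
As a rough illustration of what the type list above means, the following standard-library sketch maps a few of the supported types to converter functions. It is an assumption-laden stand-in, not the package's code: CONVERTERS and convert_fields are hypothetical names, escape reversal is omitted, and bytes, datetime and time are left out for brevity.

```python
import json
from datetime import date
from decimal import Decimal
from ipaddress import IPv4Address, IPv6Address
from uuid import UUID


def _parse_bool(text: str) -> bool:
    # Only the literals "true" and "false" are accepted.
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean literal: {text!r}")


# One converter per field type; each takes the field text.
CONVERTERS = {
    bool: _parse_bool,
    date: date.fromisoformat,
    float: float,
    int: int,            # Python integers have arbitrary length
    str: str,
    UUID: UUID,          # accepts hyphenated and 32-digit hexadecimal forms
    Decimal: Decimal,
    IPv4Address: IPv4Address,
    IPv6Address: IPv6Address,
    list: json.loads,
    dict: json.loads,
}


def convert_fields(fields, texts):
    """Convert field texts to a tuple of Python objects; \\N becomes None."""
    return tuple(
        None if text == "\\N" else CONVERTERS[field](text)
        for field, text in zip(fields, texts)
    )
```

For example, convert_fields((int, bool), ["42", "\\N"]) would give (42, None).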

The backslash character \ introduces escape sequences in both TSV and JSON, so JSON data written to TSV may need several consecutive backslashes. For example, \\n inside a quoted JSON string in a TSV field yields a single newline character: the TSV parser first reads \\ as an escape sequence for a single \ character, followed by the letter n; the JSON parser then reads the resulting \n as a newline embedded in the JSON string. Likewise, four consecutive backslash characters in TSV are needed to represent a single backslash in a quoted JSON string.
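
The two decoding layers can be traced with a small standard-library sketch. The function tsv_unescape is a hypothetical, deliberately minimal helper that handles only the \\, \n and \t escapes; it is not the package's parser.

```python
import json


def tsv_unescape(text: str) -> str:
    """Reverse a minimal subset of TSV escapes (hypothetical helper)."""
    mapping = {"\\": "\\", "n": "\n", "t": "\t"}
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(mapping[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# TSV field as stored in the file: "line1\\nline2" (two backslashes)
tsv_field = r'"line1\\nline2"'
json_text = tsv_unescape(tsv_field)   # now "line1\nline2" (backslash + n)
value = json.loads(json_text)         # Python str with a real newline

# Four backslashes in TSV collapse to one backslash in the final string.
single = json.loads(tsv_unescape(r'"a\\\\b"'))
```

After the TSV layer, json_text still contains a backslash followed by n; only the JSON layer turns it into a newline.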

Internally, the implementation uses AVX2 instructions to

  • parse RFC 3339 date-time strings into Python datetime objects,
  • parse RFC 4122 UUID strings or 32-digit hexadecimal strings into Python UUID objects,
  • and find \t delimiters between fields in a line.

For integers that fit in the range of the C long type, the parser calls the C standard library function strtol.

For parsing IPv4 and IPv6 addresses, the parser calls the C function inet_pton in libc or Windows Sockets (WinSock2).
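
Python exposes the same C function as socket.inet_pton, so the approach can be approximated from the standard library. This is a sketch only; parse_ip is a hypothetical name, and the package calls inet_pton from native code rather than through Python.

```python
import socket
from ipaddress import IPv4Address, IPv6Address


def parse_ip(text: str):
    """Parse an IPv4 or IPv6 address via inet_pton (hypothetical helper)."""
    for family, cls in (
        (socket.AF_INET, IPv4Address),
        (socket.AF_INET6, IPv6Address),
    ):
        try:
            # inet_pton returns the packed binary form: 4 or 16 bytes.
            packed = socket.inet_pton(family, text)
        except OSError:
            continue  # not valid for this address family
        return cls(packed)
    raise ValueError(f"not an IP address: {text!r}")
```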

If installed, the parser employs orjson to improve parsing speed of nested JSON structures. If not available, the library falls back to the built-in JSON decoder.

Date-time format

YYYY-MM-DDThh:mm:ssZ
YYYY-MM-DDThh:mm:ss.fZ
YYYY-MM-DDThh:mm:ss.ffZ
YYYY-MM-DDThh:mm:ss.fffZ
YYYY-MM-DDThh:mm:ss.ffffZ
YYYY-MM-DDThh:mm:ss.fffffZ
YYYY-MM-DDThh:mm:ss.ffffffZ

Date format

YYYY-MM-DD

Time format

hh:mm:ssZ
hh:mm:ss.fZ
hh:mm:ss.ffZ
hh:mm:ss.fffZ
hh:mm:ss.ffffZ
hh:mm:ss.fffffZ
hh:mm:ss.ffffffZ
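
The date-time, date and time formats listed above can be matched with datetime.strptime from the standard library, whose %f directive accepts one to six fractional digits. The sketch below is a slower stand-in for illustration; the function names are hypothetical, and attaching timezone.utc to the results is an assumption about how the Z suffix should be represented, not a statement about what the package returns.

```python
from datetime import date, datetime, time, timezone


def parse_datetime(text: str) -> datetime:
    """Parse YYYY-MM-DDThh:mm:ss[.f...]Z (hypothetical helper)."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in text else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def parse_date(text: str) -> date:
    """Parse YYYY-MM-DD (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_time(text: str) -> time:
    """Parse hh:mm:ss[.f...]Z (hypothetical helper)."""
    fmt = "%H:%M:%S.%fZ" if "." in text else "%H:%M:%SZ"
    return datetime.strptime(text, fmt).time().replace(tzinfo=timezone.utc)
```

Note that strptime is more lenient than the formats above (for instance, it tolerates single-digit months), so a strict validator would need extra checks.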

Performance

Depending on the field types, tsv2py parses TSV records up to 7 times faster than a functionally equivalent Python implementation based on the Python standard library. Savings in execution time are largest for dates, UUIDs and longer strings with special characters (up to 90%), and more moderate for simple types like small integers (approx. 60%).
