High-performance parser and generator for PostgreSQL-compatible tab-separated values (TSV)
Parse and generate tab-separated values (TSV) data
Tab-separated values (TSV) is a simple and popular format for data storage, data transfer, and exporting data from and importing data into relational databases. For example, PostgreSQL COPY moves data between PostgreSQL tables and standard file-system files or in-memory stores, and its text format (a text file with one line per table row) is a generic version of TSV. Meanwhile, packages like asyncpg help efficiently insert, update or query data in bulk with binary data transfer between Python and PostgreSQL.
This package offers a high-performance alternative to convert data between a TSV text file and Python objects. The parser can read a TSV record into a Python tuple consisting of built-in Python types, one for each field. The generator can produce a TSV record from a tuple.
Quick start
```python
from datetime import date, datetime
from uuid import UUID

from tsv.helper import Parser

tsv_path = "data.tsv"  # placeholder: path to an existing TSV file

# specify the column structure
parser = Parser(fields=(bytes, date, datetime, float, int, str, UUID, bool))

# read and parse an entire file
with open(tsv_path, "rb") as f:
    py_records = parser.parse_file(f)

# read and parse a file line by line
with open(tsv_path, "rb") as f:
    for line in f:
        py_record = parser.parse_line(line)
```
TSV format
Text format is a simple tabular format in which each record (table row) occupies a single line.
- Output always begins with a header row, which lists data field names.
- Fields (table columns) are delimited by tab characters.
- Non-printable characters and special values are escaped with a backslash (`\`), as shown below:
| Escape | Interpretation |
|---|---|
| `\N` | NULL value |
| `\0` | NUL character (ASCII 0) |
| `\b` | Backspace (ASCII 8) |
| `\f` | Form feed (ASCII 12) |
| `\n` | Newline (ASCII 10) |
| `\r` | Carriage return (ASCII 13) |
| `\t` | Tab (ASCII 9) |
| `\v` | Vertical tab (ASCII 11) |
| `\\` | Backslash (single character) |
This format allows data to be easily imported into a database engine, e.g. with PostgreSQL COPY.
Output in this format is transmitted as media type `text/plain` or `text/tab-separated-values` in UTF-8 encoding.
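The escape table above can be illustrated in plain Python. This is a minimal sketch, not part of tsv2py: the helper names `escape_field`, `unescape_field`, `generate_record` and `parse_record` are hypothetical, the sketch works on `str` for readability (the package itself reads bytes), and it treats `\N` as NULL only when it makes up the entire field.

```python
# Hypothetical pure-Python helpers illustrating the escape table; not tsv2py's API.
from typing import Optional, Sequence, Tuple

# Raw character -> two-character TSV escape sequence.
_ESCAPES = {
    "\\": "\\\\",
    "\0": "\\0",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\v": "\\v",
}
# Character following the backslash -> raw character.
_UNESCAPES = {seq[1]: raw for raw, seq in _ESCAPES.items()}


def escape_field(value: Optional[str]) -> str:
    """Encode one field; None becomes the NULL marker \\N."""
    if value is None:
        return "\\N"
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(text: str) -> Optional[str]:
    """Reverse escape_field; raise ValueError on an unknown or dangling escape."""
    if text == "\\N":
        return None
    out = []
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(text[i])
            i += 1
            continue
        if i + 1 >= len(text) or text[i + 1] not in _UNESCAPES:
            raise ValueError(f"invalid escape sequence at position {i}")
        out.append(_UNESCAPES[text[i + 1]])
        i += 2
    return "".join(out)


def generate_record(fields: Sequence[Optional[str]]) -> str:
    """Escape each field, join with raw tabs and terminate with a newline."""
    return "\t".join(escape_field(f) for f in fields) + "\n"


def parse_record(line: str) -> Tuple[Optional[str], ...]:
    """Raw tabs only ever appear as delimiters, so a plain split is safe."""
    if line.endswith("\n"):
        line = line[:-1]
    return tuple(unescape_field(f) for f in line.split("\t"))
```

Because every tab, newline and backslash inside a value is escaped, a raw tab can only be a field delimiter and a raw newline can only end a record; for example, `parse_record(generate_record(["a\tb", None]))` returns `("a\tb", None)`.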
Parser
The parser understands the following Python types:
- `None`. This special value is returned for the TSV escape sequence `\N`.
- `bool`. A literal `true` or `false` is converted into a boolean value.
- `bytes`. TSV escape sequences are reversed before the data is passed to Python as a `bytes` object. NUL bytes are permitted.
- `datetime`. The input has to comply with RFC 3339 and ISO 8601. The timezone must be UTC (a.k.a. suffix `Z`).
- `date`. The input has to conform to the format `YYYY-MM-DD`.
- `time`. The input has to conform to the format `hh:mm:ssZ` with no fractional seconds, or `hh:mm:ss.ffffffZ` with fractional seconds. Fractional seconds allow up to 6 digits of precision.
- `float`. Interpreted as double-precision floating-point numbers.
- `int`. Arbitrary-length integers are allowed.
- `str`. TSV escape sequences are reversed before the data is passed to Python as a `str`. NUL bytes are not allowed.
- `uuid.UUID`. The input has to comply with RFC 4122, or be a string of 32 hexadecimal digits.
- `decimal.Decimal`. Interpreted as arbitrary-precision decimal numbers.
- `ipaddress.IPv4Address`.
- `ipaddress.IPv6Address`.
- `list` and `dict`, which are understood as JSON; the parser invokes the equivalent of `json.loads` to parse a serialized JSON string.
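For comparison, the non-temporal conversions in the list above map roughly onto standard-library calls. The sketch below is an approximation under stated assumptions, not tsv2py's implementation: `CONVERTERS` and `convert_fields` are hypothetical names, the input fields are assumed to be already TSV-unescaped with `None` standing for `\N`, and the `bytes` conversion simply UTF-8-encodes the text.

```python
# Hypothetical stdlib approximation of the non-temporal conversions listed above;
# not tsv2py code, and edge-case behaviour may differ from the C implementation.
import decimal
import ipaddress
import json
import uuid
from typing import Any, Callable, Dict, Optional, Sequence, Tuple


def _parse_bool(text: str) -> bool:
    # Only the literals true and false are accepted.
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean literal: {text!r}")


def _parse_str(text: str) -> str:
    # str fields must not contain NUL characters (bytes fields may).
    if "\0" in text:
        raise ValueError("NUL character in str field")
    return text


CONVERTERS: Dict[type, Callable[[str], Any]] = {
    bool: _parse_bool,
    bytes: lambda text: text.encode("utf-8"),
    float: float,  # double precision
    int: int,  # Python ints are arbitrary length
    str: _parse_str,
    uuid.UUID: uuid.UUID,  # hyphenated or 32 hex digits (also braces/urn:, which is looser)
    decimal.Decimal: decimal.Decimal,
    ipaddress.IPv4Address: ipaddress.IPv4Address,
    ipaddress.IPv6Address: ipaddress.IPv6Address,
    list: json.loads,  # no check that the JSON value really is an array
    dict: json.loads,  # ... or an object
}


def convert_fields(
    types: Sequence[type], fields: Sequence[Optional[str]]
) -> Tuple[Any, ...]:
    """Convert already-unescaped field texts; None (from \\N) passes through."""
    if len(types) != len(fields):
        raise ValueError(f"expected {len(types)} fields, got {len(fields)}")
    return tuple(
        None if text is None else CONVERTERS[kind](text)
        for kind, text in zip(types, fields)
    )
```

A pure-Python table like this is easy to read but pays for a dictionary lookup and a Python-level call per field, which is the kind of overhead a C extension avoids.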
The backslash character `\` initiates escape sequences in both TSV and JSON. When JSON data is embedded in TSV, several backslash characters may be needed. For example, `\\n` in the TSV representation of a quoted JSON string translates to a single newline character. First, the TSV parser reads `\\` in `\\n` as an escape sequence and produces a single `\` character followed by an `n` character; in turn, the JSON parser reads `\n` as a single newline embedded in a JSON string. Likewise, you need four consecutive backslash characters in TSV to represent a single backslash in a JSON quoted string.
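A short round trip with the standard `json` module makes the two escaping layers concrete. It is a minimal illustration that does not use tsv2py, assuming the TSV field holds the output of `json.dumps`.

```python
import json

value = {"msg": "a\nb", "path": "C:\\tmp"}  # a newline and a single backslash

# JSON layer: json.dumps writes the newline as \n and the backslash as \\
json_text = json.dumps(value)

# TSV layer: json.dumps never emits raw control characters, so here the only
# character needing a TSV escape is the backslash, which is doubled
tsv_field = json_text.replace("\\", "\\\\")
print(tsv_field)  # prints {"msg": "a\\nb", "path": "C:\\\\tmp"}

# Reading reverses the layers: undo the TSV doubling, then decode the JSON.
# (This replace shortcut only works because every backslash run here has even length.)
decoded = json.loads(tsv_field.replace("\\\\", "\\"))
```

The printed field shows two backslashes before `n` (a newline in the JSON string) and four before `tmp` (a single literal backslash).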
Internally, the implementation uses AVX2 instructions to
- parse RFC 3339 date-time strings into Python `datetime` objects,
- parse RFC 4122 UUID strings or 32-digit hexadecimal strings into Python `UUID` objects, and
- find `\t` delimiters between fields in a line.
For parsing integers up to the range of the `long` type, the parser calls the C standard library function `strtol`.
For parsing IPv4 and IPv6 addresses, the parser calls the C function inet_pton in libc or Windows Sockets (WinSock2).
If installed, the parser employs orjson to improve parsing speed of nested JSON structures. If not available, the library falls back to the built-in JSON decoder.
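The optional `orjson` acceleration follows a common fallback pattern for optional dependencies. The sketch below is a generic illustration, not tsv2py's actual import logic; `loads_json` and `JSON_BACKEND` are hypothetical names.

```python
# Generic optional-dependency fallback; an assumed pattern, not tsv2py's code.
from typing import Any, Union

try:
    import orjson

    def loads_json(data: Union[bytes, str]) -> Any:
        # orjson.loads accepts bytes, bytearray, memoryview and str
        return orjson.loads(data)

    JSON_BACKEND = "orjson"
except ImportError:
    import json

    def loads_json(data: Union[bytes, str]) -> Any:
        # json.loads accepts str, and bytes in UTF-8, UTF-16 or UTF-32
        return json.loads(data)

    JSON_BACKEND = "json"
```

Callers see the same function either way; only the speed of decoding nested structures changes depending on whether `orjson` is installed.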
Date-time format
YYYY-MM-DDThh:mm:ssZ
YYYY-MM-DDThh:mm:ss.fZ
YYYY-MM-DDThh:mm:ss.ffZ
YYYY-MM-DDThh:mm:ss.fffZ
YYYY-MM-DDThh:mm:ss.ffffZ
YYYY-MM-DDThh:mm:ss.fffffZ
YYYY-MM-DDThh:mm:ss.ffffffZ
Date format
YYYY-MM-DD
Time format
hh:mm:ssZ
hh:mm:ss.fZ
hh:mm:ss.ffZ
hh:mm:ss.fffZ
hh:mm:ss.ffffZ
hh:mm:ss.fffffZ
hh:mm:ss.ffffffZ
Performance
Depending on the field types, tsv2py parses TSV records up to 7 times faster than a functionally equivalent Python implementation based on the Python standard library. Savings in execution time are largest for dates, UUIDs and longer strings with special characters (up to 90%), and more moderate for simple types like small integers (approx. 60%).