High-performance parser and generator for PostgreSQL-compatible tab-separated values (TSV)
Parse and generate tab-separated values (TSV) data
Tab-separated values (TSV) is a simple and popular format for data storage, data transfer, and exporting data from and importing data into relational databases. For example, PostgreSQL COPY moves data between PostgreSQL tables and standard file-system files or in-memory stores, and its text format (a text file with one line per table row) is a generic version of TSV. Meanwhile, packages like asyncpg help efficiently insert, update or query data in bulk with binary data transfer between Python and PostgreSQL.
This package offers a high-performance alternative to convert data between a TSV text file and Python objects. The parser can read a TSV record into a Python tuple consisting of built-in Python types, one for each field. The generator can produce a TSV record from a tuple.
Quick start
from datetime import datetime
from uuid import UUID

from tsv.helper import Parser

# specify the column structure
parser = Parser(fields=(bytes, datetime, float, int, str, UUID, bool))

# read and parse an entire file (tsv_path is the path to your TSV file)
with open(tsv_path, "rb") as f:
    py_records = parser.parse_file(f)

# read and parse a file line by line
with open(tsv_path, "rb") as f:
    for line in f:
        py_record = parser.parse_line(line)
TSV format
Text format is a simple tabular format in which each record (table row) occupies a single line.
- Output always begins with a header row, which lists data field names.
- Fields (table columns) are delimited by tab characters.
- Non-printable characters and special values are escaped with backslash (\), as shown below:
Escape | Interpretation |
---|---|
\N | NULL value |
\0 | NUL character (ASCII 0) |
\b | Backspace (ASCII 8) |
\f | Form feed (ASCII 12) |
\n | Newline (ASCII 10) |
\r | Carriage return (ASCII 13) |
\t | Tab (ASCII 9) |
\v | Vertical tab (ASCII 11) |
\\ | Backslash (single character) |
This format allows data to be easily imported into a database engine, e.g. with PostgreSQL COPY.
Output in this format is transmitted as media type text/plain or text/tab-separated-values in UTF-8 encoding.
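The escape rules tabulated above can be sketched in pure Python. This is a minimal illustration, not the package's implementation: the helper names escape_field and unescape_field are hypothetical, and the sketch works byte by byte rather than with vectorized instructions.

```python
# Illustrative helpers; these names are hypothetical and not part of the package's API.
_ESCAPES = {
    0x00: b"\\0",   # NUL character
    0x08: b"\\b",   # backspace
    0x09: b"\\t",   # tab
    0x0A: b"\\n",   # newline
    0x0B: b"\\v",   # vertical tab
    0x0C: b"\\f",   # form feed
    0x0D: b"\\r",   # carriage return
    0x5C: b"\\\\",  # backslash
}
# Map the character after the backslash back to the byte it stands for.
_UNESCAPES = {seq[1]: byte for byte, seq in _ESCAPES.items()}


def escape_field(data: bytes) -> bytes:
    """Replace each special byte with its two-character escape sequence."""
    return b"".join(_ESCAPES.get(b, bytes((b,))) for b in data)


def unescape_field(field: bytes):
    """Reverse escape_field; a field that is exactly \\N decodes to None."""
    if field == b"\\N":
        return None
    out = bytearray()
    i = 0
    while i < len(field):
        byte = field[i]
        if byte == 0x5C:  # a backslash introduces an escape sequence
            if i + 1 >= len(field):
                raise ValueError("dangling backslash at end of field")
            code = field[i + 1]
            if code not in _UNESCAPES:
                raise ValueError(f"unknown escape sequence \\{chr(code)}")
            out.append(_UNESCAPES[code])
            i += 2
        else:
            out.append(byte)
            i += 1
    return bytes(out)
```

Because a literal backslash is always doubled on output, a data value consisting of a backslash followed by N is written as three characters and cannot be confused with the NULL marker.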
Parser
The parser understands the following Python types:
- None. This special value is returned for the TSV escape sequence \N.
- bool. A literal true or false is converted into a boolean value.
- bytes. TSV escape sequences are reversed before the data is passed to Python as a bytes object. NUL bytes are permitted.
- datetime. The input has to comply with RFC 3339 and ISO 8601. The timezone must be UTC (a.k.a. suffix Z).
- float.
- int. Arbitrary-length integers are allowed.
- str. TSV escape sequences are reversed before the data is passed to Python as a str. NUL bytes are not allowed.
- uuid.UUID. The input has to comply with RFC 4122, or be a string of 32 hexadecimal digits.
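The type rules above could be approximated in pure Python roughly as follows. This is a sketch under stated assumptions rather than the package's code: convert_field and parse_line_sketch are hypothetical names, escape sequences other than \N are not reversed, and the sketch attaches UTC to parsed timestamps, which may differ from what the package returns.

```python
from datetime import datetime, timezone
from uuid import UUID


def convert_field(raw: bytes, field_type: type):
    """Convert one raw TSV field to field_type, following the rules listed above."""
    if raw == b"\\N":  # the NULL escape maps to None for every type
        return None
    if field_type is bytes:
        return raw  # a complete parser would reverse escape sequences here
    text = raw.decode("utf-8")
    if field_type is str:
        # a complete parser would reverse escape sequences before this check
        if "\0" in text:
            raise ValueError("NUL characters are not allowed in str fields")
        return text
    if field_type is bool:
        if text == "true":
            return True
        if text == "false":
            return False
        raise ValueError(f"expected true or false, got {text!r}")
    if field_type is datetime:
        if not text.endswith("Z"):
            raise ValueError("timestamp must be in UTC with suffix Z")
        # assumption: attach UTC explicitly; the package may return a different form
        return datetime.fromisoformat(text[:-1]).replace(tzinfo=timezone.utc)
    if field_type in (int, float, UUID):
        return field_type(text)  # int() handles arbitrary-length integers
    raise TypeError(f"unsupported field type: {field_type!r}")


def parse_line_sketch(line: bytes, field_types):
    """Split a line on tabs and convert each field to its declared type."""
    fields = line.rstrip(b"\r\n").split(b"\t")
    if len(fields) != len(field_types):
        raise ValueError("field count does not match the column structure")
    return tuple(convert_field(raw, t) for raw, t in zip(fields, field_types))
```

The sketch trades speed for readability; it is meant only to make the per-type rules concrete.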
Internally, the implementation uses AVX2 instructions to
- parse RFC 3339 date-time strings into Python datetime objects,
- parse RFC 4122 UUID strings or 32-digit hexadecimal strings into Python UUID objects,
- and find \t delimiters between fields in a line.
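As a scalar point of reference for the last item, the delimiter search can be written in plain Python. This is only an illustration of what the search computes; it does not show the package's AVX2 code, and find_tabs and split_fields are hypothetical names. A vectorized version would presumably examine a block of bytes per step (AVX2 registers hold 32 bytes) instead of scanning for one match at a time.

```python
from typing import List


def find_tabs(line: bytes) -> List[int]:
    """Return the offset of every tab byte in line, in ascending order."""
    positions = []
    start = 0
    while True:
        idx = line.find(b"\t", start)
        if idx < 0:
            return positions
        positions.append(idx)
        start = idx + 1


def split_fields(line: bytes) -> List[bytes]:
    """Cut line into fields at the offsets reported by find_tabs."""
    fields = []
    start = 0
    for pos in find_tabs(line):
        fields.append(line[start:pos])
        start = pos + 1
    fields.append(line[start:])  # the last field has no trailing tab
    return fields
```

Since a tab inside a value is always written as the two-character escape \t, every raw tab byte in a line is a field delimiter, so the search needs no awareness of escape sequences.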