SQL2Arrow

SQL2Arrow is a Python library that provides convenient, high-performance methods for parsing SQL INSERT statements into Arrow arrays. It is particularly useful for analyzing data dumps produced by mysqldump or similar tools.

How to use

Installation

Install the latest SQL2Arrow release with:

pip install sql2arrow

Parsing a SQL string

import sql2arrow

sql_str = '''
INSERT INTO `region` VALUES
	('', '', '2023-01-31 18:00:48', '2023-01-31 18:00:48', ''),
	('1541947646568607746', 'region name', '2022-06-29 08:52:21', '2022-06-29 08:52:21', 'D99'),
	('1541947680890597378', 'region name1', '2022-06-29 08:52:29', '2022-06-29 08:52:29', 'D98'),
	('620422117205', 'region name7', '2021-10-25 18:23:48', '2021-10-25 18:23:48', 'D620422117');
'''

columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8())
]

arrow_data = sql2arrow.parse_sql(sql_str, columns)
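
To make the row-to-column idea concrete, here is a minimal pure-Python sketch of turning the VALUES rows of an INSERT statement into one list per column. It is not SQL2Arrow's implementation and does not produce Arrow arrays: the function name `insert_to_columns` is hypothetical, and the sketch assumes every value is a single-quoted string without escapes or embedded parentheses.

```python
import re
from typing import Dict, List

# Matches one parenthesised row such as ('a', 'b', 'c').
# Assumption: values never contain parentheses.
ROW_RE = re.compile(r"\(([^()]*)\)")
# Matches one single-quoted value. Assumption: no escaped quotes inside values.
VALUE_RE = re.compile(r"'([^']*)'")


def insert_to_columns(sql: str, column_names: List[str]) -> Dict[str, List[str]]:
    """Convert the rows of a single INSERT ... VALUES statement into columns."""
    # Everything after the VALUES keyword holds the row tuples.
    values_part = sql.split("VALUES", 1)[1]
    columns: Dict[str, List[str]] = {name: [] for name in column_names}
    for row in ROW_RE.finditer(values_part):
        values = VALUE_RE.findall(row.group(1))
        if len(values) != len(column_names):
            raise ValueError(
                f"expected {len(column_names)} values, got {len(values)}"
            )
        # Append each value to the column it belongs to.
        for name, value in zip(column_names, values):
            columns[name].append(value)
    return columns


demo_sql = "INSERT INTO `region` VALUES ('', 'a', 'D1'), ('15', 'region name', 'D99');"
demo_columns = insert_to_columns(demo_sql, ["region_code", "region_name", "parent_region_code"])
print(demo_columns)
```

A columnar layout like this is what makes the parsed data cheap to hand to Arrow-based tools, since each column can be stored as one contiguous array.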

Parsing SQL files

import sql2arrow

sql_paths = [
    "region.sql_0.gz", "region.sql_1.gz", "region.sql_2.gz", "region.sql_3.gz",
    "region.sql_4.gz", "region.sql_5.gz", "region.sql_6.gz",
]

columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8())
]


partition_func_spec = sql2arrow.partition.IcebergPartitionFuncSpec()
partition_func_spec.add_partition("region_code", sql2arrow.partition.IcebergTransforms.bucket(30))


it = sql2arrow.SQLFile2ArrowIter(
    sql_paths,
    columns,
    4,
    1000,
    # The compression type must match how the input files are actually compressed.
    sql2arrow.CompressionType.SNAPPY,
    sql2arrow.Dialect.MYSQL,
    partition_func_spec
)

for arr in it:
    print(arr)
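
The example above partitions rows with `bucket(30)` on `region_code`. Assuming `IcebergTransforms.bucket` follows the bucket transform defined in the Apache Iceberg specification (an assumption, not something this README states), a string value is hashed with 32-bit Murmur3 (seed 0) over its UTF-8 bytes, and the bucket is the non-negative hash modulo the bucket count. The sketch below reproduces that rule in pure Python; the function names `murmur3_32` and `iceberg_bucket` are hypothetical and are not part of SQL2Arrow.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as an unsigned 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    full = n - (n % 4)

    def mix(k: int) -> int:
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        return (k * c2) & MASK32

    # Body: process the input in 4-byte little-endian blocks.
    for i in range(0, full, 4):
        h ^= mix(int.from_bytes(data[i:i + 4], "little"))
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[full:]
    if tail:
        h ^= mix(int.from_bytes(tail, "little"))

    # Finalization: mix in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def iceberg_bucket(value: str, num_buckets: int) -> int:
    """Bucket index for a string per the Iceberg rule: (hash & Int.MAX) % N."""
    return (murmur3_32(value.encode("utf-8")) & 0x7FFFFFFF) % num_buckets


# Bucket index, in the range 0..29, for one region code from the example data.
print(iceberg_bucket("1541947646568607746", 30))
```

Because the bucket depends only on the value, rows with the same `region_code` always land in the same partition, regardless of which input file they came from.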

arro3

SQL2Arrow uses arro3 as its default Python library for Apache Arrow. Thanks to the Arrow PyCapsule Interface, arro3 arrays can be passed directly to other libraries that implement the interface, including PyArrow, Polars (v1.2+), Pandas (v2.2+), and NanoArrow, without copying the underlying memory.

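# Continuing from the examples above: `arrs` is assumed to hold the
# per-batch lists of arrays collected from the iterator.
import pyarrow as pa

# Column names, in the same order as `columns`.
names = ["region_code", "region_name", "create_time", "update_time", "parent_region_code"]
tables = [pa.Table.from_arrays(a, names=names) for a in arrs]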

Limitations

Dialect

SQL2Arrow currently supports only MySQL and PostgreSQL INSERT statements.
