Project description

SQL2Arrow

This is a Python library that provides convenient, high-performance methods for parsing SQL INSERT statements into Arrow arrays. It is particularly useful for analyzing data dumps produced by mysqldump or similar tools.

How to use

Parsing a SQL string

import sql2arrow

sql_str = '''
INSERT INTO `region` VALUES
	('', '', '2023-01-31 18:00:48', '2023-01-31 18:00:48', ''),
	('1541947646568607746', 'region name', '2022-06-29 08:52:21', '2022-06-29 08:52:21', 'D99'),
	('1541947680890597378', 'region name1', '2022-06-29 08:52:29', '2022-06-29 08:52:29', 'D98'),
	('620422117205', 'region name7', '2021-10-25 18:23:48', '2021-10-25 18:23:48', 'D620422117');
'''

# Describe the columns of `region`: a name and an Arrow type for each value in a row
columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8())
]

arrow_data = sql2arrow.parse_sql(sql_str, columns)
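sql2arrow performs the row-to-column conversion natively. To make that step concrete, here is a toy pure-Python sketch (not the library's parser) that splits a simple MySQL VALUES list into rows and then transposes the rows into one list per column, which is the columnar layout Arrow arrays use. `split_values` and `to_columns` are hypothetical helper names; the tokenizer only handles single-quoted strings (backslash escapes and doubled quotes), unquoted tokens, and NULL, and it stops at Python lists rather than building Arrow arrays.

```python
from typing import Dict, List, Optional

# Hypothetical helpers for illustration only; this is not sql2arrow's parser.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0"}


def split_values(insert_sql: str) -> List[List[Optional[str]]]:
    """Split `INSERT INTO t VALUES (...), (...);` into rows of raw string values.

    Supports single-quoted strings, unquoted tokens (numbers), and NULL (-> None).
    Does not handle column lists or nested parentheses.
    """
    rows: List[List[Optional[str]]] = []
    row: List[Optional[str]] = []
    buf: List[str] = []
    in_str = quoted = False
    depth = 0
    i, n = 0, len(insert_sql)
    while i < n:
        ch = insert_sql[i]
        if in_str:
            if ch == "\\" and i + 1 < n:
                # Backslash escape: map \n, \t, \r, \0; keep any other char literally
                buf.append(_ESCAPES.get(insert_sql[i + 1], insert_sql[i + 1]))
                i += 1
            elif ch == "'" and i + 1 < n and insert_sql[i + 1] == "'":
                buf.append("'")  # doubled quote inside a string
                i += 1
            elif ch == "'":
                in_str = False  # closing quote
            else:
                buf.append(ch)
        elif ch == "'":
            in_str = quoted = True
            buf = []  # drop whitespace seen before the opening quote
        elif ch == "(":
            depth += 1
            row, buf, quoted = [], [], False
        elif depth == 1 and ch in ",)":
            # End of one value: keep quoted text as-is, trim unquoted tokens
            token = "".join(buf) if quoted else "".join(buf).strip()
            row.append(None if not quoted and token.upper() == "NULL" else token)
            buf, quoted = [], False
            if ch == ")":
                depth -= 1
                rows.append(row)
        elif depth == 1 and not quoted:
            buf.append(ch)
        i += 1
    return rows


def to_columns(
    rows: List[List[Optional[str]]], names: List[str]
) -> Dict[str, List[Optional[str]]]:
    """Transpose rows into one list per column (the columnar layout Arrow uses)."""
    columns: Dict[str, List[Optional[str]]] = {name: [] for name in names}
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(row)}")
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


if __name__ == "__main__":
    sql = "INSERT INTO `region` VALUES ('', 'a', NULL), ('1', 'it\\'s', 42);"
    print(to_columns(split_values(sql), ["code", "name", "parent"]))
```

Each Python list plays the role of one Arrow array: values of the same column sit next to each other, which is what lets Arrow-based tools scan a single column without touching the others.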

Parsing SQL files

import sql2arrow

sql_paths = [
    "region.sql_0.gz", "region.sql_1.gz", "region.sql_2.gz", "region.sql_3.gz",
    "region.sql_4.gz", "region.sql_5.gz", "region.sql_6.gz",
]

columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8())
]


# Partition rows by hashing region_code into 30 buckets (Iceberg bucket transform)
partition_func_spec = sql2arrow.partition.IcebergPartitionFuncSpec()
partition_func_spec.add_partition("region_code", sql2arrow.partition.IcebergTransforms.bucket(30))

# Load the gzip-compressed MySQL dumps and split the rows using the partition spec
partitioned_arrs = sql2arrow.load_sqls_with_partition_func(
    sql_paths, columns, partition_func_spec, sql2arrow.CompressionType.GZIP, sql2arrow.Dialect.MYSQL
)

# Alternatively, load the data from the files one by one, without partitioning
arrs = sql2arrow.load_sqls(sql_paths, columns, sql2arrow.CompressionType.GZIP, sql2arrow.Dialect.MYSQL)
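The partitioned load above hashes `region_code` with `IcebergTransforms.bucket(30)`. The Apache Iceberg table spec defines the bucket transform as `(murmur3_x86_32_hash(v) & Integer.MAX_VALUE) % N`, using the 32-bit x86 variant of MurmurHash3 with seed 0, and hashes strings as their UTF-8 bytes. Assuming sql2arrow follows that spec (the name suggests it, but this is not confirmed here), the sketch below computes which of the 30 buckets a region code would land in. `murmur3_x86_32` and `iceberg_bucket_utf8` are hypothetical helper names, not sql2arrow APIs.

```python
# Assumption: sql2arrow's IcebergTransforms.bucket(N) follows the Iceberg spec.
# These helpers are hypothetical and not part of sql2arrow.


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as a signed 32-bit int (like Java's int)."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    def scramble(k: int) -> int:
        k = (k * c1) & mask
        k = rotl(k, 15)
        return (k * c2) & mask

    h = seed & mask
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        # Body: mix in each 4-byte little-endian block
        h ^= scramble(int.from_bytes(data[4 * i : 4 * i + 4], "little"))
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: the remaining 0-3 bytes, also read little-endian
    tail = data[4 * n_blocks :]
    if tail:
        h ^= scramble(int.from_bytes(tail, "little"))

    # Finalization (fmix32) including the total length
    h ^= len(data) & mask
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def iceberg_bucket_utf8(value: str, num_buckets: int) -> int:
    """Iceberg bucket transform for a string: hash the UTF-8 bytes, clear the sign bit, mod N."""
    return (murmur3_x86_32(value.encode("utf-8")) & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for code in ["1541947646568607746", "1541947680890597378", "620422117205"]:
        print(code, "-> bucket", iceberg_bucket_utf8(code, 30))
```

Because the bucket depends only on the hash of the key, every row with the same `region_code` lands in the same bucket, and masking with `0x7FFFFFFF` keeps the result non-negative before the modulo.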

arro3

SQL2Arrow uses arro3 as its default Python library for Apache Arrow. Because arro3 implements the Arrow PyCapsule Interface, its arrays can be passed without copying memory (zero-copy) to any other library that supports the interface, including PyArrow, Polars (v1.2+), pandas (v2.2+), and nanoarrow.

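# Continues from the "Parsing SQL files" example above (uses `arrs`)
import pyarrow as pa

# Column names, in the same order as `columns`
names = ["region_code", "region_name", "create_time", "update_time", "parent_region_code"]

# Build one PyArrow table from the arrays loaded from each file
tables = [pa.Table.from_arrays(a, names=names) for a in arrs]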

Limitations

Dialect

Only MySQL INSERT statements are currently supported. PostgreSQL support is planned.


Download files

Download the file for your platform.

Source Distribution

  • sql2arrow-0.1.0.tar.gz (43.3 kB)

Built Distributions

  • sql2arrow-0.1.0-cp38-abi3-win_amd64.whl (4.5 MB): CPython 3.8+, Windows x86-64
  • sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_x86_64.whl (5.1 MB): CPython 3.8+, musllinux: musl 1.2+ x86-64
  • sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_i686.whl (5.2 MB): CPython 3.8+, musllinux: musl 1.2+ i686
  • sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_armv7l.whl (5.1 MB): CPython 3.8+, musllinux: musl 1.2+ ARMv7l
  • sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_aarch64.whl (4.8 MB): CPython 3.8+, musllinux: musl 1.2+ ARM64
  • sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_x86_64.whl (5.0 MB): CPython 3.8+, manylinux: glibc 2.28+ x86-64
  • sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_s390x.whl (5.8 MB): CPython 3.8+, manylinux: glibc 2.28+ s390x
  • sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_ppc64le.whl (6.4 MB): CPython 3.8+, manylinux: glibc 2.28+ ppc64le
  • sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_armv7l.whl (4.9 MB): CPython 3.8+, manylinux: glibc 2.28+ ARMv7l
  • sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_aarch64.whl (4.7 MB): CPython 3.8+, manylinux: glibc 2.28+ ARM64
  • sql2arrow-0.1.0-cp38-abi3-macosx_11_0_arm64.whl (4.2 MB): CPython 3.8+, macOS 11.0+ ARM64

File details

Details for the file sql2arrow-0.1.0.tar.gz.

File metadata

  • Download URL: sql2arrow-0.1.0.tar.gz
  • Upload date:
  • Size: 43.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.7.8

File hashes

Hashes for sql2arrow-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6f4607c462bd0f35e45e2b960302f96844e6ed8259281457e383dbc49cc5394a
MD5 80af6844d95ef97b3238fe218ec727d2
BLAKE2b-256 a6902ff0fd737dd4caff34f0af0e0101cba7d37c1f16dbc4a2f570191376b308
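To check a downloaded file against the SHA256 digests listed on this page, a short sketch using Python's standard `hashlib` module follows. `sha256_of` and `matches_sha256` are illustrative helper names, not part of sql2arrow. pip can enforce the same check itself in its hash-checking mode, via `--hash=sha256:<digest>` entries in a requirements file.

```python
import hashlib

# Illustrative helpers (hypothetical names); not part of sql2arrow.


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `matches_sha256("sql2arrow-0.1.0.tar.gz", "6f4607c462bd0f35e45e2b960302f96844e6ed8259281457e383dbc49cc5394a")` should return `True` for an intact copy of the source distribution. Reading in chunks keeps memory use flat even for the multi-megabyte wheels.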

File details

Details for the file sql2arrow-0.1.0-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: sql2arrow-0.1.0-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 4.5 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.7.8

File hashes

Hashes for sql2arrow-0.1.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 8b0fed6820463d1899bd727b5bf85e7dfead608d65e65c88b2d7264b73b2d0dd
MD5 bdd49e2f9be441f57b9adaac22b7a8d5
BLAKE2b-256 c5a469c916007830c4872b1fb755eeaa0e853854bcc0ee8350bccfa21026efaa

File details

Details for the file sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 8689e20a29cc07c1216ffe3549d382898c073eb4f4a911f40f16960c14f250e6
MD5 0972d6f688f678ae172e3efa7cc75c37
BLAKE2b-256 ebbd9f234fbb37fec6ae4cd0a4d95f979ae8798ae0f71ef4a9811a4affe41110

File details

Details for the file sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_i686.whl.

File hashes

Hashes for sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 d6d483c4dedc8f94df24eb03559961a643819d96df2d64ff9c08fefbad598cb6
MD5 d3f1f8efcb3c6d19bc6f358354f88fab
BLAKE2b-256 e11a9d6edfe92693821d2d11c585f2a5861c3a799e28efcd12966a7efded4db7

File details

Details for the file sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_armv7l.whl.

File hashes

Hashes for sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 f054c7d77f844395745c528258cc7bbcb1d276782f9ddfb71c6727f56110a373
MD5 db6a2a48c942e03ef2d233afae97251a
BLAKE2b-256 2ae372e3c7978b70824813677b05ad21054164a752d5e0155a5177b023f070a5

File details

Details for the file sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for sql2arrow-0.1.0-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 ccc9a4657946ecf325392f250dba6deffd4f8e4482fdda4898238abea4fcbf8c
MD5 932b786b15efd75b7e90cff8d02412bc
BLAKE2b-256 f2bc14fc5b9de111f69d26a540c56542452e230b093514cfa2630986f7359af0

File details

Details for the file sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 cd174155c4b220b264ca679157cae90e19cacc51be28a4395b64d7260660ac31
MD5 847e8d9c36f73d828b6585e5ffa88e9c
BLAKE2b-256 f1ec97d1e81a0c7eea2a3ba2099b90d12919f9169f31881119a0f0a15dc1485d

File details

Details for the file sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_s390x.whl.

File hashes

Hashes for sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_s390x.whl
Algorithm Hash digest
SHA256 7511538de8d00cbe890982e3b916ccef33eb0c10575a7c9ec63fbae807f4c345
MD5 2f6d359d581114c83ff64ce7d0374186
BLAKE2b-256 b94d358c294d96bdff25f8e42a062e923d750774e468c14410d4f932b4a53e8e

File details

Details for the file sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_ppc64le.whl.

File hashes

Hashes for sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_ppc64le.whl
Algorithm Hash digest
SHA256 a43071b2360de4f436614d8d482267dc7516ecb8438665f7fc82fc0c940abfd5
MD5 27edaf011259aa94038fe63a4dbc8812
BLAKE2b-256 1d7a2136d5b8683850e878e6d2a81f4fcae52772c4620c8e74063abebb68aa65

File details

Details for the file sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_armv7l.whl.

File hashes

Hashes for sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_armv7l.whl
Algorithm Hash digest
SHA256 daa8d87975e3d62a30aadc44087aa20e551030bad7e09558ac65332306c64e77
MD5 87909cd1c21618bcc79561f686d0e09b
BLAKE2b-256 678e7d9be0ecebd95ef6912c79cab033ee019d1e68b738aa0367167a05573caf

File details

Details for the file sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for sql2arrow-0.1.0-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 139eabd95fa3d707ce00f45842a2f66e8d05ae4081854cd743148c8338f23838
MD5 4b8da5804f84aecbab99aaa534fb8748
BLAKE2b-256 b9e4febe67306a4dbf01887d1459961db6cbfec3e9d5b83f64d4caa7e0cd2d26

File details

Details for the file sql2arrow-0.1.0-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for sql2arrow-0.1.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ca1ee249b82b29f03781ed00783aa5072bfb354bfe61d6fe6831ae8434099668
MD5 3764a56a68e5f1cc46efc6c19552ad64
BLAKE2b-256 6dd9487c09a1ea313f8df0d22d77d90604696c6c4f3b1fb3fd6040b8b6062266
