SQL2Arrow
This is a Python library that provides convenient and high-performance methods to parse INSERT SQL statements into Arrow arrays. It's very useful for analyzing data dumped by mysqldump or other tools.
How to use
Installation
Install the latest version of SQL2Arrow with:
pip install sql2arrow
Parsing a SQL string
import sql2arrow
sql_str = '''
INSERT INTO `region` VALUES
('', '', '2023-01-31 18:00:48', '2023-01-31 18:00:48', ''),
('1541947646568607746', 'region name', '2022-06-29 08:52:21', '2022-06-29 08:52:21', 'D99'),
('1541947680890597378', 'region name1', '2022-06-29 08:52:29', '2022-06-29 08:52:29', 'D98'),
('620422117205', 'region name7', '2021-10-25 18:23:48', '2021-10-25 18:23:48', 'D620422117');
'''
columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8())
]
arrow_data = sql2arrow.parse_sql(sql_str, columns)
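To make the row-to-column conversion concrete, here is a minimal sketch in plain Python. It does not use SQL2Arrow and is not how the library is implemented; plain lists stand in for Arrow arrays, and the `rows` and `names` values are illustrative, taken from the sample statement above.

```python
# Rows as they appear in the VALUES clause (a subset of the columns, for brevity).
rows = [
    ("1541947646568607746", "region name", "D99"),
    ("1541947680890597378", "region name1", "D98"),
]
names = ["region_code", "region_name", "parent_region_code"]

# Transpose the row tuples so that each column becomes one contiguous list,
# which is the layout an Arrow array gives each column.
columns_by_name = {name: list(values) for name, values in zip(names, zip(*rows))}

for name, values in columns_by_name.items():
    print(name, values)
```

Each entry of `columns_by_name` corresponds to one declared `Column`, which is why the column list passed to the parser has to match the order of the values in the INSERT statement.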
Parsing SQL files
import sql2arrow
sql_paths = [
    "region.sql_0.gz", "region.sql_1.gz", "region.sql_2.gz", "region.sql_3.gz",
    "region.sql_4.gz", "region.sql_5.gz", "region.sql_6.gz"
]
columns = [
    sql2arrow.Column("region_code", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("region_name", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("create_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("update_time", sql2arrow.ArrowTypes.utf8()),
    sql2arrow.Column("parent_region_code", sql2arrow.ArrowTypes.utf8())
]
partition_func_spec = sql2arrow.partition.IcebergPartitionFuncSpec()
partition_func_spec.add_partition("region_code", sql2arrow.partition.IcebergTransforms.bucket(30))
it = sql2arrow.SQLFile2ArrowIter(
    sql_paths,
    columns,
    4,
    1000,
    sql2arrow.CompressionType.SNAPPY,
    sql2arrow.Dialect.MYSQL,
    partition_func_spec
)
for arr in it:
    print(arr)
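The `bucket(30)` transform above is named after the Apache Iceberg bucket transform. As a rough illustration of what that transform computes for string values, the sketch below follows the Iceberg specification: hash the UTF-8 bytes with 32-bit Murmur3 (seed 0), clear the sign bit, and take the result modulo the bucket count. Whether SQL2Arrow matches the specification in every detail is an assumption here; `murmur3_x86_32` and `iceberg_bucket` are hypothetical helper names and are not part of the library.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 hash (x86 variant), returned as an unsigned integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF
    h = seed & mask
    n = len(data)
    full = n & ~3  # length of the part made of complete 4-byte blocks

    def mix(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotate left by 15
        return (k * c2) & mask

    for i in range(0, full, 4):
        h ^= mix(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & mask

    tail = data[full:]
    if tail:
        h ^= mix(int.from_bytes(tail, "little"))

    # Finalization: fold in the length and force avalanche of the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def iceberg_bucket(value: str, num_buckets: int) -> int:
    """Bucket index for a string value, following the Iceberg bucket definition."""
    hashed = murmur3_x86_32(value.encode("utf-8"))
    return (hashed & 0x7FFFFFFF) % num_buckets


for code in ["1541947646568607746", "1541947680890597378", "620422117205"]:
    print(code, iceberg_bucket(code, 30))
```

Because the bucket index depends only on the value and the bucket count, rows with the same `region_code` always land in the same partition, regardless of which input file they came from.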
arro3
SQL2Arrow uses arro3 as its default Python library for Apache Arrow. Thanks to the Arrow PyCapsule Interface, arro3's Array data can be passed without copying to other libraries that implement the interface, including PyArrow, Polars (v1.2+), pandas (v2.2+), nanoarrow, and others.
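# Continuing from the examples above; `arrs` is assumed to hold the results yielded by `it`
import pyarrow as pa
names = ["region_code", "region_name", "create_time", "update_time", "parent_region_code"]
tables = [pa.Table.from_arrays(a, names=names) for a in arrs]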
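The Arrow PyCapsule Interface works through a small set of dunder methods (`__arrow_c_schema__`, `__arrow_c_array__`, `__arrow_c_stream__`) that a consuming library looks for on the object it receives. The following sketch, which uses only the standard library, shows one way to check which of these an object exposes. The helper `arrow_capsule_exports` and the `FakeArray` class are hypothetical and exist only for illustration; they are not part of SQL2Arrow or arro3.

```python
# Method names defined by the Arrow PyCapsule Interface.
ARROW_PYCAPSULE_METHODS = (
    "__arrow_c_schema__",
    "__arrow_c_array__",
    "__arrow_c_stream__",
)


def arrow_capsule_exports(obj):
    """Return the PyCapsule Interface methods that `obj` implements."""
    return [m for m in ARROW_PYCAPSULE_METHODS if callable(getattr(obj, m, None))]


class FakeArray:
    """Stand-in for an array-like object; a real one would return capsules."""

    def __arrow_c_array__(self, requested_schema=None):
        raise NotImplementedError("illustration only")


print(arrow_capsule_exports(FakeArray()))
print(arrow_capsule_exports(object()))
```

A check like this can help when debugging an interop problem: if the object handed to another library exposes none of these methods, that library cannot import it through the PyCapsule Interface.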
Limitations
Dialect
It currently supports only MySQL and PostgreSQL INSERT statements.