ConnectorX

Load data from databases to dataframes, the fastest way.

ConnectorX enables you to load data from databases into Python in the fastest and most memory-efficient way.

What you need is one line of code:

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem")

Optionally, you can accelerate the data loading using parallelism by specifying a partition column.

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem", partition_on="l_orderkey", partition_num=10)

The function partitions the query by evenly splitting the range of the specified column into the requested number of partitions. ConnectorX assigns one thread to each partition to load and write data in parallel. Currently, partitioning is supported on numerical columns (which cannot contain NULL) for SPJA (select-project-join-aggregate) queries.
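As a rough illustration of what "evenly splitting the specified column" can mean, the sketch below divides an integer value range into contiguous half-open ranges. The function name split_range and the half-open convention are assumptions made for this example; they are not ConnectorX's actual implementation.

```python
from typing import List, Tuple


def split_range(lo: int, hi: int, partition_num: int) -> List[Tuple[int, int]]:
    """Split the inclusive integer range [lo, hi] into half-open [start, end) chunks.

    Hypothetical helper for illustration only, not part of the ConnectorX API.
    """
    if partition_num < 1:
        raise ValueError("partition_num must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")

    total = hi - lo + 1  # number of distinct integer values to cover
    step, extra = divmod(total, partition_num)

    bounds = []
    start = lo
    for i in range(partition_num):
        # The first `extra` partitions take one additional value each.
        end = start + step + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


print(split_range(1, 10, 3))  # [(1, 5), (5, 8), (8, 11)]
```

Note that a split like this balances the value range, not the row count, so a heavily skewed partition column would probably give partitions of uneven size.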

Installation

pip install connectorx

See here for how to build the Python wheel from source.

Performance

We compared different solutions in Python that provide a read_sql function by loading a 10x TPC-H lineitem table (8.6GB) from Postgres into a DataFrame, using 4-core parallelism.

Time chart, lower is better.

Memory consumption chart, lower is better.

In conclusion, ConnectorX uses up to 3x less memory and 21x less time than the other solutions (3x less memory and 13x less time compared with Pandas). More details here.

How does ConnectorX achieve lightning speed while keeping the memory footprint low?

We observe that existing solutions copy the data multiple times, to a greater or lesser extent, while downloading it. Additionally, implementing a data-intensive application in Python brings additional cost.

ConnectorX is written in Rust and follows the "zero-copy" principle. This allows it to make full use of the CPU by being cache- and branch-predictor-friendly. Moreover, the architecture of ConnectorX ensures that the data is copied exactly once, directly from the source to the destination.

How does ConnectorX download the data?

Upon receiving a query, e.g. SELECT * FROM lineitem, ConnectorX first issues a LIMIT 1 query (SELECT * FROM lineitem LIMIT 1) to get the schema of the result set.

Then, if partition_on is specified, ConnectorX issues SELECT MIN($partition_on), MAX($partition_on) FROM (SELECT * FROM lineitem) to determine the range of the partition column. The original query is then split into partitions based on the min/max information, e.g. SELECT * FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000. Next, ConnectorX runs a count query to get the size of each partition, e.g. SELECT COUNT(*) FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000. If no partition column is specified, the count query is SELECT COUNT(*) FROM (SELECT * FROM lineitem).
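The query sequence described above can be sketched as plain string construction. This is only an illustration: plan_queries is a hypothetical helper, and the half-open predicate (>= start AND < end) is an assumption chosen so that adjacent partitions neither overlap nor skip boundary values. The exact SQL that ConnectorX generates may differ, and some databases require an alias on the subquery.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def plan_queries(
    query: str,
    partition_on: Optional[str] = None,
    bounds: Sequence[Tuple[int, int]] = (),
) -> Dict[str, object]:
    """Build the schema, range, partition and count queries for `query`.

    Hypothetical helper for illustration only, not part of the ConnectorX API.
    """
    plan: Dict[str, object] = {
        "schema": f"{query} LIMIT 1",  # one row is enough to learn the schema
        "range": None,
        "partitions": [],
        "counts": [],
    }

    if partition_on is None:
        # Without a partition column the original query runs as a single unit.
        plan["partitions"] = [query]
        plan["counts"] = [f"SELECT COUNT(*) FROM ({query})"]
        return plan

    plan["range"] = f"SELECT MIN({partition_on}), MAX({partition_on}) FROM ({query})"

    partitions: List[str] = []
    counts: List[str] = []
    for start, end in bounds:
        # Assumed half-open range so that no boundary row is read twice or lost.
        where = f"{partition_on} >= {start} AND {partition_on} < {end}"
        partitions.append(f"SELECT * FROM ({query}) WHERE {where}")
        counts.append(f"SELECT COUNT(*) FROM ({query}) WHERE {where}")
    plan["partitions"] = partitions
    plan["counts"] = counts
    return plan


plan = plan_queries("SELECT * FROM lineitem", "l_orderkey", [(0, 10000), (10000, 20001)])
for sql in plan["partitions"]:
    print(sql)
```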

Finally, ConnectorX will use the schema info as well as the count info to allocate memory and download data by executing the queries normally.

Once downloading begins, one thread is started for each partition so that the data is downloaded in parallel at the partition level. Each thread issues the query for its partition to the database and then writes the returned data to the destination row-wise or column-wise (depending on the database) in a streaming fashion.
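The allocate-then-fill pattern from the last two paragraphs can be imitated with the Python standard library. The sketch below is not how ConnectorX is implemented (it is written in Rust and writes into dataframe memory). It uses SQLite, a plain Python list as the destination, and hypothetical names (load_partitioned, demo) purely to show one thread per partition writing into a preallocated buffer.

```python
import os
import sqlite3
import tempfile
import threading


def load_partitioned(db_path, table, column, bounds):
    """Load `table` in parallel, one thread per [start, end) range of `column`."""
    where = f"{column} >= ? AND {column} < ?"

    # Count the rows of every partition so the destination can be allocated up front.
    conn = sqlite3.connect(db_path)
    try:
        counts = [
            conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {where}", b).fetchone()[0]
            for b in bounds
        ]
    finally:
        conn.close()

    # offsets[i] is the index in the destination where partition i starts.
    offsets = [0]
    for count in counts:
        offsets.append(offsets[-1] + count)
    dest = [None] * offsets[-1]  # preallocated destination buffer

    def worker(i):
        # Each thread opens its own connection and writes only to its own slice.
        conn = sqlite3.connect(db_path)
        try:
            cursor = conn.execute(f"SELECT * FROM {table} WHERE {where}", bounds[i])
            pos = offsets[i]
            for row in cursor:  # rows are consumed as a stream from the cursor
                dest[pos] = row
                pos += 1
        finally:
            conn.close()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(bounds))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dest


def demo():
    """Create a small throwaway table and load it in three partitions."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE lineitem (l_orderkey INTEGER NOT NULL, l_comment TEXT)")
        conn.executemany(
            "INSERT INTO lineitem VALUES (?, ?)",
            [(k, f"row {k}") for k in range(1, 11)],
        )
        conn.commit()
        conn.close()
        return load_partitioned(path, "lineitem", "l_orderkey", [(1, 5), (5, 8), (8, 11)])
    finally:
        os.remove(path)


rows = demo()
print(len(rows))
```

Because the partition sizes are known before any data arrives, each thread can write into a disjoint slice of the destination without locks or a final concatenation step.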

Supported Sources & Destinations

Example connection strings, supported protocols and data types for each data source can be found here.

For more planned data sources, please check out our discussion.

Sources

  • Postgres
  • MySQL
  • MariaDB (through mysql protocol)
  • SQLite
  • Redshift (through postgres protocol)
  • ClickHouse (through mysql protocol)
  • SQL Server
  • Azure SQL Database (through mssql protocol)
  • Oracle
  • BigQuery
  • Trino
  • ODBC (WIP)
  • ...

Destinations

  • Pandas
  • PyArrow
  • Modin (through Pandas)
  • Dask (through Pandas)
  • Polars (through PyArrow)

Documentation

Doc: https://sfu-db.github.io/connector-x/intro.html

Rust docs: stable, nightly

Next Plan

Check out our discussion to participate in deciding our next plan!

Historical Benchmark Results

https://sfu-db.github.io/connector-x/dev/bench/

Developer's Guide

Please see Developer's Guide for information about developing ConnectorX.

Support

You are always welcome to:

  1. Ask questions and propose new ideas in our GitHub discussion.
  2. Ask questions on Stack Overflow. Make sure to tag them with #connectorx.

Organizations and Projects using ConnectorX

To add your project or organization here, reply to our post here.

Citing ConnectorX

If you use ConnectorX, please consider citing the following paper:

Xiaoying Wang, Weiyuan Wu, Jinze Wu, Yizhou Chen, Nick Zrymiak, Changbo Qu, Lampros Flokas, George Chow, Jiannan Wang, Tianzheng Wang, Eugene Wu, Qingqing Zhou. ConnectorX: Accelerating Data Loading From Databases to Dataframes. VLDB 2022.

BibTeX entry:

@article{connectorx2022,
  author    = {Xiaoying Wang and Weiyuan Wu and Jinze Wu and Yizhou Chen and Nick Zrymiak and Changbo Qu and Lampros Flokas and George Chow and Jiannan Wang and Tianzheng Wang and Eugene Wu and Qingqing Zhou},
  title     = {ConnectorX: Accelerating Data Loading From Databases to Dataframes},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {15},
  number    = {11},
  pages     = {2994--3003},
  year      = {2022},
  url       = {https://www.vldb.org/pvldb/vol15/p2994-wang.pdf},
}

Contributors

  • wangxiaoying (Xiaoying Wang)
  • dovahcrow (Weiyuan Wu)
  • Wukkkinz-0725
  • Yizhou150 (Yizhou)
  • zen-xu (ZhengYu, Xu)
  • wseaton (Will Eaton)
  • AnatolyBuga (Anatoly Bugakov)
  • Jordan-M-Young (Jordan M. Young)
  • domnikl (Dominik Liebler)
  • auyer (Rafael Passos)
  • jinzew
  • gruuya (Marko Grujic)
  • alswang18 (Alec Wang)
  • lBilali (Lulzim Bilali)
  • ritchie46 (Ritchie Vink)
  • houqp (QP Hou)
  • wKollendorf
  • glennpierce (Glenn Pierce)
  • jorgecarleitao (Jorge Leitao)
  • quambene
  • CBQu (CbQu)
  • tschm (Thomas Schmelzer)
  • maxb2 (Matthew Anderson)
  • therealhieu (Hieu Minh Nguyen)
  • FerriLuli (FerriLuli)
  • alexander-beedie (Alexander Beedie)
  • zzzdong
  • venkashank
  • surister (Ivan)
  • phanindra-ramesh
  • messense (Messense)
  • kotval (Kotval)
  • albcunha
  • rursprung (Ralph Ursprung)
  • MatsMoll (Mats Eikeland Mollestad)
  • marianoguerra (Mariano Guerra)
  • kevinheavey (Kevin Heavey)
  • kayhoogland (Kay Hoogland)
  • deepsourcebot (DeepSource Bot)
  • AndrewJackson2020 (Andrew Jackson)
  • Cabbagec (Brandon)
  • Amar1729 (Amar Paul)
  • aljazerzen (Aljaž Mur Eržen)
