ConnectorX

Load data from databases to dataframes, the fastest way.

ConnectorX enables you to load data from databases into Python in the fastest and most memory-efficient way.

What you need is one line of code:

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem")

Optionally, you can accelerate the data loading using parallelism by specifying a partition column.

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem", partition_on="l_orderkey", partition_num=10)

The function partitions the query by splitting the value range of the specified column evenly into the requested number of partitions. ConnectorX assigns one thread to each partition to load and write data in parallel. Currently, partitioning is supported on numerical columns (which cannot contain NULL) for SPJA (select-project-join-aggregate) queries.
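The even split described above can be illustrated with a minimal sketch. This is not ConnectorX's actual implementation: the helper names split_range and partition_queries are hypothetical, the column is assumed to hold integers, and the exact boundaries ConnectorX generates may differ.

```python
def split_range(lo: int, hi: int, num_partitions: int) -> list[tuple[int, int]]:
    """Split the closed interval [lo, hi] into contiguous half-open ranges [start, end)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    span = hi - lo + 1  # number of integer values to cover
    size, extra = divmod(span, num_partitions)
    bounds = []
    start = lo
    for i in range(num_partitions):
        # The first `extra` partitions take one additional value each.
        end = start + size + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def partition_queries(query: str, column: str, lo: int, hi: int, num_partitions: int) -> list[str]:
    """Wrap `query` in one range-restricted subquery per partition (illustrative SQL)."""
    return [
        f"SELECT * FROM ({query}) AS t WHERE {column} >= {start} AND {column} < {end}"
        for start, end in split_range(lo, hi, num_partitions)
    ]


for sql in partition_queries("SELECT * FROM lineitem", "l_orderkey", 1, 100, 4):
    print(sql)
```

Half-open ranges are used here so that every value falls into exactly one partition; if there are more partitions than distinct values, some ranges are simply empty.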

Experimental: We now provide federated query support. You can write a single query to join tables from two or more databases!

import connectorx as cx
db1 = "postgresql://username1:password1@server1:port1/database1"
db2 = "postgresql://username2:password2@server2:port2/database2"
cx.read_sql({"db1": db1, "db2": db2}, "SELECT * FROM db1.nation n, db2.region r where n.n_regionkey = r.r_regionkey")

By default, we push down all joins from the same data source. More details for setup and configuration can be found here.

Check out more detailed usage and examples here. A general introduction of the project can be found in this blog post.

Installation

pip install connectorx

See here for how to build the Python wheel from source.

Performance

We compared different solutions in Python that provide a read_sql function by loading a 10x TPC-H lineitem table (8.6GB) from Postgres into a DataFrame, using 4 cores in parallel.

Time chart, lower is better.

Memory consumption chart, lower is better.

In conclusion, ConnectorX uses up to 3x less memory and 21x less time (3x less memory and 13x less time compared with Pandas). More details here.

How does ConnectorX achieve lightning speed while keeping the memory footprint low?

We observe that existing solutions copy the data multiple times, to varying degrees, when downloading it. Additionally, implementing a data-intensive application in Python brings additional cost.

ConnectorX is written in Rust and follows the "zero-copy" principle. This allows it to make full use of the CPU by being cache and branch predictor friendly. Moreover, the architecture of ConnectorX ensures the data is copied exactly once, directly from the source to the destination.

How does ConnectorX download the data?

Upon receiving the query, e.g. SELECT * FROM lineitem, ConnectorX will first get the schema of the result set. Depending on the data source, this process may involve issuing a LIMIT 1 query such as SELECT * FROM lineitem LIMIT 1.
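As a rough illustration of schema discovery through a LIMIT 1 query, the sketch below uses Python's built-in sqlite3 module on a toy table. It is not how ConnectorX is implemented: the function result_schema, the table layout, and the choice to wrap the query in a subquery (rather than append LIMIT 1 directly) are assumptions made for the example.

```python
import sqlite3


def result_schema(conn, query):
    """Return (column name, Python type name) pairs for the result set of `query`.

    Only one row is fetched, so the cost does not grow with the table size.
    """
    cursor = conn.execute(f"SELECT * FROM ({query}) LIMIT 1")
    names = [col[0] for col in cursor.description]
    row = cursor.fetchone()
    if row is None:
        # Empty result set: the names are known, the types are not.
        return [(name, None) for name in names]
    return [(name, type(value).__name__) for name, value in zip(names, row)]


# Toy table standing in for lineitem (hypothetical columns and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (l_orderkey INTEGER, l_quantity REAL, l_comment TEXT)")
conn.execute("INSERT INTO lineitem VALUES (1, 17.0, 'first row')")

schema = result_schema(conn, "SELECT * FROM lineitem")
print(schema)
```

Wrapping the query keeps the sketch valid even when the user's query already ends in its own LIMIT or ORDER BY clause.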

Then, if partition_on is specified, ConnectorX will issue SELECT MIN($partition_on), MAX($partition_on) FROM (SELECT * FROM lineitem) to determine the range of the partition column. After that, the original query is split into partitions based on the min/max information, e.g. SELECT * FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000. ConnectorX will then run a count query to get the size of each partition (e.g. SELECT COUNT(*) FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000). If partition_on is not specified, the count query will be SELECT COUNT(*) FROM (SELECT * FROM lineitem).

Finally, ConnectorX will use the schema info as well as the count info to allocate memory and download data by executing the queries normally.
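To make the allocation step concrete, here is a small sketch of how schema and count information could be turned into a preallocated destination plus one write offset per partition. The function plan_allocation is hypothetical, and plain Python lists stand in for the typed buffers a real destination would use.

```python
from itertools import accumulate


def plan_allocation(column_names, partition_counts):
    """Allocate one full-length column per name and compute each partition's start row."""
    total = sum(partition_counts)
    # Running sum shifted by one: partition i starts where partition i-1 ends.
    offsets = [0, *accumulate(partition_counts)][:-1]
    columns = {name: [None] * total for name in column_names}
    return columns, offsets


columns, offsets = plan_allocation(["l_orderkey", "l_quantity"], [3, 2, 4])
print(offsets, len(columns["l_orderkey"]))
```

Because every partition knows its offset in advance, partitions can write into disjoint slices of the same buffer without coordinating with each other.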

Once the download begins, there will be one thread for each partition so that the data is downloaded in parallel at the partition level. Each thread will issue the query of its partition to the database and then write the returned data to the destination row-wise or column-wise (depending on the database) in a streaming fashion.

Supported Sources & Destinations

Example connection strings, supported protocols, and data types for each data source can be found here.
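Connection strings follow the URL shape used in the examples above (protocol://username:password@server:port/database). A minimal sketch of assembling one is shown below; build_conn is a hypothetical helper, and percent-encoding the credentials is standard URL practice that is assumed, not confirmed here, to be accepted by every ConnectorX source.

```python
from urllib.parse import quote


def build_conn(protocol, username, password, server, port, database):
    """Assemble a connection URL, percent-encoding reserved characters in the credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{protocol}://{user}:{pwd}@{server}:{port}/{database}"


# Hypothetical credentials containing characters that are reserved in URLs.
conn_str = build_conn("postgresql", "user", "p@ss/w:rd", "localhost", 5432, "tpch")
print(conn_str)
```

Without the encoding step, a password containing @, : or / would make the URL ambiguous to parse.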

For more planned data sources, please check out our discussion.

Sources

  • Postgres
  • MySQL
  • MariaDB (through MySQL protocol)
  • SQLite
  • Redshift (through Postgres protocol)
  • ClickHouse (through MySQL protocol)
  • SQL Server
  • Azure SQL Database (through mssql protocol)
  • Oracle
  • BigQuery
  • Trino
  • ODBC (WIP)
  • ...

Destinations

  • Pandas
  • PyArrow
  • Modin (through Pandas)
  • Dask (through Pandas)
  • Polars (through PyArrow)

Documentation

Doc: https://sfu-db.github.io/connector-x/intro.html

Rust docs: stable, nightly

Next Plan

Check out our discussion to help decide our next plan!

Historical Benchmark Results

https://sfu-db.github.io/connector-x/dev/bench/

Developer's Guide

Please see Developer's Guide for information about developing ConnectorX.

Support

You are always welcome to:

  1. Ask questions and propose new ideas in our GitHub discussion.
  2. Ask questions on Stack Overflow. Make sure to attach the #connectorx tag.

Organizations and Projects using ConnectorX

To add your project or organization here, reply to our post here.

Citing ConnectorX

If you use ConnectorX, please consider citing the following paper:

Xiaoying Wang, Weiyuan Wu, Jinze Wu, Yizhou Chen, Nick Zrymiak, Changbo Qu, Lampros Flokas, George Chow, Jiannan Wang, Tianzheng Wang, Eugene Wu, Qingqing Zhou. ConnectorX: Accelerating Data Loading From Databases to Dataframes. VLDB 2022.

BibTeX entry:

@article{connectorx2022,
  author    = {Xiaoying Wang and Weiyuan Wu and Jinze Wu and Yizhou Chen and Nick Zrymiak and Changbo Qu and Lampros Flokas and George Chow and Jiannan Wang and Tianzheng Wang and Eugene Wu and Qingqing Zhou},
  title     = {ConnectorX: Accelerating Data Loading From Databases to Dataframes},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {15},
  number    = {11},
  pages     = {2994--3003},
  year      = {2022},
  url       = {https://www.vldb.org/pvldb/vol15/p2994-wang.pdf},
}

Contributors

  • wangxiaoying: Xiaoying Wang
  • dovahcrow: Weiyuan Wu
  • Wukkkinz-0725
  • Yizhou150: Yizhou
  • EricFecteau
  • zen-xu: ZhengYu, Xu
  • pangjunrong: Pang Jun Rong (Jayden)
  • domnikl: Dominik Liebler
  • wseaton: Will Eaton
  • AnatolyBuga: Anatoly Bugakov
  • Jordan-M-Young: Jordan M. Young
  • jsjasonseba: Jason
  • auyer: Rafael Passos
  • gruuya: Marko Grujic
  • jinzew
  • alswang18: Alec Wang
  • lBilali: Lulzim Bilali
  • ritchie46: Ritchie Vink
  • houqp: QP Hou
  • wKollendorf
  • glennpierce: Glenn Pierce
  • jorgecarleitao: Jorge Leitao
  • quambene
  • CBQu: CbQu
  • tschm: Thomas Schmelzer
  • maxb2: Matthew Anderson
  • JakkuSakura: Jakku Sakura
  • therealhieu: Hieu Minh Nguyen
  • FerriLuli
  • alexander-beedie: Alexander Beedie
  • zzzdong
  • zemelLeong: zemel leong
  • venkashank
  • tvandelooij
  • surister: Ivan
  • phanindra-ramesh
  • messense: Messense
  • kotval: Kotval
  • albcunha
  • rursprung: Ralph Ursprung
  • MatsMoll: Mats Eikeland Mollestad
  • marianoguerra: Mariano Guerra
  • kevinheavey: Kevin Heavey
  • kayhoogland: Kay Hoogland
  • DeflateAwning
  • deepsourcebot: DeepSource Bot
  • bealdav: David Beal
  • AndrewJackson2020: Andrew Jackson
  • Cabbagec: Brandon
  • Amar1729: Amar Paul
  • aljazerzen: Aljaž Mur Eržen
  • aimtsou: Aimilios Tsouvelekakis

Built Distributions

  • connectorx-0.4.3-cp313-none-win_amd64.whl (32.9 MB): CPython 3.13, Windows x86-64
  • connectorx-0.4.3-cp313-cp313-manylinux_2_35_aarch64.whl (38.9 MB): CPython 3.13, manylinux glibc 2.35+ ARM64
  • connectorx-0.4.3-cp313-cp313-manylinux_2_28_x86_64.whl (41.7 MB): CPython 3.13, manylinux glibc 2.28+ x86-64
  • connectorx-0.4.3-cp313-cp313-macosx_11_0_arm64.whl (34.5 MB): CPython 3.13, macOS 11.0+ ARM64
  • connectorx-0.4.3-cp313-cp313-macosx_10_7_x86_64.whl (36.3 MB): CPython 3.13, macOS 10.7+ x86-64
  • connectorx-0.4.3-cp312-none-win_amd64.whl (32.9 MB): CPython 3.12, Windows x86-64
  • connectorx-0.4.3-cp312-cp312-manylinux_2_35_aarch64.whl (38.9 MB): CPython 3.12, manylinux glibc 2.35+ ARM64
  • connectorx-0.4.3-cp312-cp312-manylinux_2_28_x86_64.whl (41.7 MB): CPython 3.12, manylinux glibc 2.28+ x86-64
  • connectorx-0.4.3-cp312-cp312-macosx_11_0_arm64.whl (34.5 MB): CPython 3.12, macOS 11.0+ ARM64
  • connectorx-0.4.3-cp312-cp312-macosx_10_7_x86_64.whl (36.3 MB): CPython 3.12, macOS 10.7+ x86-64
  • connectorx-0.4.3-cp311-none-win_amd64.whl (32.9 MB): CPython 3.11, Windows x86-64
  • connectorx-0.4.3-cp311-cp311-manylinux_2_35_aarch64.whl (38.9 MB): CPython 3.11, manylinux glibc 2.35+ ARM64
  • connectorx-0.4.3-cp311-cp311-manylinux_2_28_x86_64.whl (41.7 MB): CPython 3.11, manylinux glibc 2.28+ x86-64
  • connectorx-0.4.3-cp311-cp311-macosx_11_0_arm64.whl (34.5 MB): CPython 3.11, macOS 11.0+ ARM64
  • connectorx-0.4.3-cp311-cp311-macosx_10_7_x86_64.whl (36.3 MB): CPython 3.11, macOS 10.7+ x86-64
  • connectorx-0.4.3-cp310-none-win_amd64.whl (32.9 MB): CPython 3.10, Windows x86-64
  • connectorx-0.4.3-cp310-cp310-manylinux_2_35_aarch64.whl (38.9 MB): CPython 3.10, manylinux glibc 2.35+ ARM64
  • connectorx-0.4.3-cp310-cp310-manylinux_2_28_x86_64.whl (41.7 MB): CPython 3.10, manylinux glibc 2.28+ x86-64
  • connectorx-0.4.3-cp310-cp310-macosx_11_0_arm64.whl (34.5 MB): CPython 3.10, macOS 11.0+ ARM64
  • connectorx-0.4.3-cp310-cp310-macosx_10_7_x86_64.whl (36.3 MB): CPython 3.10, macOS 10.7+ x86-64
