ConnectorX

Load data from databases to dataframes, the fastest way.

ConnectorX enables you to load data from databases into Python in the fastest and most memory-efficient way.

What you need is one line of code:

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem")

Optionally, you can accelerate the data loading using parallelism by specifying a partition column.

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem", partition_on="l_orderkey", partition_num=10)

The function will partition the query by evenly splitting the specified column into the requested number of partitions. ConnectorX will assign one thread to each partition to load and write data in parallel. Currently, we support partitioning on numerical columns (which cannot contain NULL) for SPJA (select-project-join-aggregate) queries.
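The even split can be sketched in plain Python. This is an illustration under stated assumptions, not ConnectorX's implementation: the helper `split_range` is hypothetical, the partition column is assumed to hold integers whose MIN and MAX are already known, and the boundary convention (half-open ranges, with the last one ending at MAX + 1) is a choice made for this sketch.

```python
def split_range(lo: int, hi: int, num: int) -> list[tuple[int, int]]:
    """Hypothetical helper: split the integers in [lo, hi] into `num` ranges.

    Each range is returned as (start, end), meaning start <= x < end.
    The last range ends at hi + 1, so hi itself is covered.
    """
    if num < 1:
        raise ValueError("num must be >= 1")
    total = hi - lo + 1              # number of integer values in [lo, hi]
    step, extra = divmod(total, num)
    ranges = []
    start = lo
    for i in range(num):
        # The first `extra` partitions take one additional value each.
        size = step + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Example: l_orderkey values from 1 to 100, split into 4 partitions.
print(split_range(1, 100, 4))
```

Half-open ranges guarantee that every value falls into exactly one partition; overlapping boundaries would load some rows twice, and gaps would silently drop them.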

Experimental: We now provide federated query support, so you can write a single query that joins tables from two or more databases!

import connectorx as cx
db1 = "postgresql://username1:password1@server1:port1/database1"
db2 = "postgresql://username2:password2@server2:port2/database2"
cx.read_sql({"db1": db1, "db2": db2}, "SELECT * FROM db1.nation n, db2.region r where n.n_regionkey = r.r_regionkey")

By default, we push down all joins from the same data source. More details on setup and configuration can be found here.

Check out more detailed usage and examples here. A general introduction of the project can be found in this blog post.

Installation

pip install connectorx

Check out here to see how to build the Python wheel from source.

Performance

We compared different Python solutions that provide a read_sql function by loading a 10x TPC-H lineitem table (8.6GB) from Postgres into a DataFrame, with 4 cores of parallelism.

Time chart (lower is better).

Memory consumption chart (lower is better).

In conclusion, ConnectorX uses up to 3x less memory and 21x less time (3x less memory and 13x less time compared with Pandas). More on this here.

How does ConnectorX achieve a lightning speed while keeping the memory footprint low?

We observe that existing solutions, to varying degrees, copy the data multiple times while downloading it. In addition, implementing a data-intensive application in Python adds extra overhead.

ConnectorX is written in Rust and follows the "zero-copy" principle. This allows it to make full use of the CPU by being cache- and branch-predictor-friendly. Moreover, the architecture of ConnectorX ensures that the data is copied exactly once, directly from the source to the destination.
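The zero-copy idea can be illustrated in Python with `memoryview`: slicing a `memoryview` gives a view onto the existing buffer, while `bytes(...)` of a slice allocates a separate copy. This is only an analogy for the principle; it says nothing about how ConnectorX's Rust code is structured internally.

```python
# A source buffer, standing in for bytes received from the database.
source = bytearray(b"lineitem-row-0001")

# bytes() of a slice allocates a new object holding a copy of the data.
copied = bytes(source[0:8])

# Slicing a memoryview creates a view on the same memory: no copy is made.
view = memoryview(source)[0:8]

# Mutate the original buffer in place (same length, so no resize).
source[0:4] = b"LINE"

print(copied)          # the copy still holds the old bytes
print(view.tobytes())  # the view reflects the change
```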

How does ConnectorX download the data?

Upon receiving a query, e.g. SELECT * FROM lineitem, ConnectorX first issues a LIMIT 1 query, SELECT * FROM lineitem LIMIT 1, to get the schema of the result set.
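A minimal sketch of the schema step, using Python's built-in `sqlite3` module with a small in-memory table as a stand-in for the real database (an assumption for illustration; ConnectorX does this natively through each database's client). Appending `LIMIT 1` follows the example above, and the column names come from the cursor's `description` attribute.

```python
import sqlite3

# Stand-in source database: an in-memory SQLite table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem (l_orderkey INTEGER, l_quantity REAL, l_comment TEXT)"
)
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?)",
    [(1, 17.0, "a"), (2, 36.0, "b"), (3, 8.0, "c")],
)

query = "SELECT * FROM lineitem"
cur = conn.execute(f"{query} LIMIT 1")   # fetch at most one row
row = cur.fetchone()

# cursor.description has one 7-tuple per column; item 0 is the column name.
columns = [d[0] for d in cur.description]
# Simplification for SQLite: infer each type from the first row's values.
types = [type(v).__name__ for v in row]
print(list(zip(columns, types)))
```

SQLite is dynamically typed, so inferring types from the first row's Python values is a simplification made for this sketch; typed databases such as Postgres report column types in the result metadata.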

Then, if partition_on is specified, ConnectorX issues SELECT MIN($partition_on), MAX($partition_on) FROM (SELECT * FROM lineitem) to determine the range of the partition column. After that, the original query is split into partitions based on the min/max information, e.g. SELECT * FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000. ConnectorX then runs a count query to get the size of each partition (e.g. SELECT COUNT(*) FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000). If partition_on is not specified, the count query is simply SELECT COUNT(*) FROM (SELECT * FROM lineitem).
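The metadata queries described above can be reproduced against an in-memory SQLite table with Python's `sqlite3` module. This is a sketch under assumptions: the table contents, the partition count, and the boundary convention (`>=` start, `<` end, with the last range ending at MAX + 1) are choices made here and may differ from the exact SQL ConnectorX generates. Building SQL with f-strings is acceptable only because every value comes from the sketch itself, never from user input.

```python
import sqlite3

# Stand-in source database: an in-memory SQLite table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (l_orderkey INTEGER, l_quantity REAL)")
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?)",
    [(k, float(k % 7)) for k in range(1, 101)],
)

query = "SELECT * FROM lineitem"
partition_on = "l_orderkey"
partition_num = 4

# Step 1: find the range of the partition column.
lo, hi = conn.execute(
    f"SELECT MIN({partition_on}), MAX({partition_on}) FROM ({query})"
).fetchone()

# Step 2: one range predicate per partition, as half-open [start, end).
# Ceiling division; very small ranges may yield fewer than partition_num ranges.
step = (hi - lo + partition_num) // partition_num
conds = [
    f"{partition_on} >= {start} AND {partition_on} < {min(start + step, hi + 1)}"
    for start in range(lo, hi + 1, step)
]
partition_queries = [f"SELECT * FROM ({query}) WHERE {c}" for c in conds]

# Step 3: a count query per partition gives each partition's size.
counts = [
    conn.execute(f"SELECT COUNT(*) FROM ({query}) WHERE {c}").fetchone()[0]
    for c in conds
]
print(counts)
```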

Finally, ConnectorX uses the schema and count information to allocate memory, and then downloads the data by executing the queries normally.
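The allocate-then-fill pattern can be sketched with `sqlite3` and plain Python lists as a stand-in destination (an assumption for illustration; a real destination would typically be typed column buffers such as NumPy or Arrow arrays). The row count from the `COUNT(*)` query sizes each column exactly once, and each fetched value is written straight into its final position.

```python
import sqlite3

# Stand-in source database: an in-memory SQLite table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (l_orderkey INTEGER, l_quantity REAL)")
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?)",
    [(k, k * 1.5) for k in range(1, 11)],
)

query = "SELECT * FROM lineitem"

# Schema from the LIMIT 1 query, row count from the COUNT query.
columns = [d[0] for d in conn.execute(f"{query} LIMIT 1").description]
n_rows = conn.execute(f"SELECT COUNT(*) FROM ({query})").fetchone()[0]

# Allocate every destination column once, at its final size.
dest = {name: [None] * n_rows for name in columns}

# Stream the result and write each value directly into its final slot,
# with no intermediate list of rows.
for i, row in enumerate(conn.execute(query)):
    for name, value in zip(columns, row):
        dest[name][i] = value

print(n_rows, columns)
```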

Once the download begins, there is one thread for each partition, so the data is downloaded in parallel at the partition level. Each thread issues the query for its partition to the database and then writes the returned data to the destination row-wise or column-wise (depending on the database) in a streaming fashion.
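The partition-level parallelism can be sketched with `concurrent.futures.ThreadPoolExecutor`. Everything here is a stand-in: `SOURCE` is a hypothetical Python list playing the role of the database, and `load_partition` is a hypothetical helper playing the role of one partition query. The point of this sketch is the memory layout: each thread writes only to its own slice of the preallocated destination, starting at an offset equal to the total row count of the partitions before it, so threads never overlap and need no locking. In CPython the GIL keeps this pure-Python loop from running truly in parallel, so the sketch shows the layout rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate

# Hypothetical source: rows of (l_orderkey, l_quantity).
SOURCE = [(k, float(k % 7)) for k in range(1, 101)]

# Partition ranges [start, end) and their row counts (computed directly
# here; in the text they come from the MIN/MAX and COUNT queries).
ranges = [(1, 26), (26, 51), (51, 76), (76, 101)]
counts = [sum(1 for k, _ in SOURCE if s <= k < e) for s, e in ranges]

# Each partition's slice starts after all rows of the preceding partitions.
offsets = [0, *accumulate(counts)][:-1]
total = sum(counts)
dest_key = [None] * total
dest_qty = [None] * total


def load_partition(start, end, offset):
    """Hypothetical helper: stream one partition into its destination slice."""
    i = offset
    for key, qty in SOURCE:          # stands in for running the partition query
        if start <= key < end:
            dest_key[i] = key
            dest_qty[i] = qty
            i += 1
    return i - offset                # rows written by this thread


# One worker thread per partition.
with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
    written = list(pool.map(
        load_partition,
        [s for s, _ in ranges],
        [e for _, e in ranges],
        offsets,
    ))

print(written)
```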

Supported Sources & Destinations

Example connection strings, supported protocols, and data types for each data source can be found here.

For more planned data sources, please check out our discussion.

Sources

  • Postgres
  • MySQL
  • MariaDB (through MySQL protocol)
  • SQLite
  • Redshift (through Postgres protocol)
  • ClickHouse (through MySQL protocol)
  • SQL Server
  • Azure SQL Database (through MSSQL protocol)
  • Oracle
  • BigQuery
  • Trino
  • ODBC (WIP)
  • ...

Destinations

  • Pandas
  • PyArrow
  • Modin (through Pandas)
  • Dask (through Pandas)
  • Polars (through PyArrow)

Documentation

Doc: https://sfu-db.github.io/connector-x/intro.html
Rust docs: stable, nightly

Next Plan

Check out our discussion to participate in deciding our next plan!

Historical Benchmark Results

https://sfu-db.github.io/connector-x/dev/bench/

Developer's Guide

Please see Developer's Guide for information about developing ConnectorX.

Support

You are always welcome to:

  1. Ask questions & propose new ideas in our GitHub discussion.
  2. Ask questions on Stack Overflow. Make sure to tag them with #connectorx.

Organizations and Projects using ConnectorX

To add your project/organization here, reply to our post here.

Citing ConnectorX

If you use ConnectorX, please consider citing the following paper:

Xiaoying Wang, Weiyuan Wu, Jinze Wu, Yizhou Chen, Nick Zrymiak, Changbo Qu, Lampros Flokas, George Chow, Jiannan Wang, Tianzheng Wang, Eugene Wu, Qingqing Zhou. ConnectorX: Accelerating Data Loading From Databases to Dataframes. VLDB 2022.

BibTeX entry:

@article{connectorx2022,
  author    = {Xiaoying Wang and Weiyuan Wu and Jinze Wu and Yizhou Chen and Nick Zrymiak and Changbo Qu and Lampros Flokas and George Chow and Jiannan Wang and Tianzheng Wang and Eugene Wu and Qingqing Zhou},
  title     = {ConnectorX: Accelerating Data Loading From Databases to Dataframes},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {15},
  number    = {11},
  pages     = {2994--3003},
  year      = {2022},
  url       = {https://www.vldb.org/pvldb/vol15/p2994-wang.pdf},
}

Contributors

  • wangxiaoying: Xiaoying Wang
  • dovahcrow: Weiyuan Wu
  • Wukkkinz-0725
  • Yizhou150: Yizhou
  • zen-xu: ZhengYu, Xu
  • wseaton: Will Eaton
  • AnatolyBuga: Anatoly Bugakov
  • Jordan-M-Young: Jordan M. Young
  • domnikl: Dominik Liebler
  • auyer: Rafael Passos
  • pangjunrong: Pang Jun Rong (Jayden)
  • gruuya: Marko Grujic
  • jinzew
  • alswang18: Alec Wang
  • lBilali: Lulzim Bilali
  • ritchie46: Ritchie Vink
  • houqp: QP Hou
  • wKollendorf
  • CBQu: CbQu
  • quambene
  • jorgecarleitao: Jorge Leitao
  • glennpierce: Glenn Pierce
  • alexander-beedie: Alexander Beedie
  • FerriLuli
  • therealhieu: Hieu Minh Nguyen
  • maxb2: Matthew Anderson
  • tschm: Thomas Schmelzer
  • MatsMoll: Mats Eikeland Mollestad
  • rursprung: Ralph Ursprung
  • albcunha
  • kotval: Kotval
  • messense: Messense
  • phanindra-ramesh
  • surister: Ivan
  • venkashank
  • zemelLeong: zemel leong
  • zzzdong
  • marianoguerra: Mariano Guerra
  • kevinheavey: Kevin Heavey
  • kayhoogland: Kay Hoogland
  • deepsourcebot: DeepSource Bot
  • AndrewJackson2020: Andrew Jackson
  • Cabbagec: Brandon
  • Amar1729: Amar Paul
  • aljazerzen: Aljaž Mur Eržen
  • aimtsou: Aimilios Tsouvelekakis

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • connectorx-0.4.0-cp313-none-win_amd64.whl (25.3 MB): CPython 3.13, Windows x86-64
  • connectorx-0.4.0-cp313-cp313-manylinux_2_28_x86_64.whl (32.0 MB): CPython 3.13, manylinux glibc 2.28+ x86-64
  • connectorx-0.4.0-cp313-cp313-macosx_11_0_arm64.whl (25.9 MB): CPython 3.13, macOS 11.0+ ARM64
  • connectorx-0.4.0-cp313-cp313-macosx_10_12_x86_64.whl (27.0 MB): CPython 3.13, macOS 10.12+ x86-64
  • connectorx-0.4.0-cp312-none-win_amd64.whl (25.4 MB): CPython 3.12, Windows x86-64
  • connectorx-0.4.0-cp312-cp312-manylinux_2_28_x86_64.whl (32.0 MB): CPython 3.12, manylinux glibc 2.28+ x86-64
  • connectorx-0.4.0-cp312-cp312-macosx_11_0_arm64.whl (25.9 MB): CPython 3.12, macOS 11.0+ ARM64
  • connectorx-0.4.0-cp312-cp312-macosx_10_12_x86_64.whl (27.0 MB): CPython 3.12, macOS 10.12+ x86-64
  • connectorx-0.4.0-cp311-none-win_amd64.whl (25.3 MB): CPython 3.11, Windows x86-64
  • connectorx-0.4.0-cp311-cp311-manylinux_2_28_x86_64.whl (32.0 MB): CPython 3.11, manylinux glibc 2.28+ x86-64
  • connectorx-0.4.0-cp311-cp311-macosx_11_0_arm64.whl (25.9 MB): CPython 3.11, macOS 11.0+ ARM64
  • connectorx-0.4.0-cp311-cp311-macosx_10_12_x86_64.whl (27.0 MB): CPython 3.11, macOS 10.12+ x86-64
  • connectorx-0.4.0-cp310-none-win_amd64.whl (25.3 MB): CPython 3.10, Windows x86-64
  • connectorx-0.4.0-cp310-cp310-manylinux_2_28_x86_64.whl (32.0 MB): CPython 3.10, manylinux glibc 2.28+ x86-64
  • connectorx-0.4.0-cp310-cp310-macosx_11_0_arm64.whl (25.9 MB): CPython 3.10, macOS 11.0+ ARM64
  • connectorx-0.4.0-cp310-cp310-macosx_10_12_x86_64.whl (27.0 MB): CPython 3.10, macOS 10.12+ x86-64

