ConnectorX

Load data from databases to dataframes, the fastest way.

ConnectorX enables you to load data from databases into Python in the fastest and most memory efficient way.

What you need is one line of code:

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem")
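If the username or password contains characters that double as URI delimiters (such as `@`, `:` or `/`), the connection string usually needs percent-encoding first. The sketch below uses only the standard library; `build_postgres_uri` is a hypothetical helper for illustration and is not part of ConnectorX.

```python
from urllib.parse import quote_plus


def build_postgres_uri(username, password, server, port, database):
    """Assemble a postgresql:// URI, percent-encoding the credentials.

    Hypothetical helper, not a ConnectorX function.
    """
    # quote_plus escapes characters such as '@', ':' and '/' that would
    # otherwise be read as URI delimiters.
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"postgresql://{user}:{secret}@{server}:{port}/{database}"


uri = build_postgres_uri("username", "p@ss:word/1", "server", 5432, "database")
print(uri)
```

The resulting string can then be passed as the first argument of `cx.read_sql`.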

Optionally, you can accelerate the data loading using parallelism by specifying a partition column.

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem", partition_on="l_orderkey", partition_num=10)

The function partitions the query by splitting the range of the specified column evenly into the requested number of partitions. ConnectorX assigns one thread to each partition to load and write data in parallel. Currently, partitioning is supported on numerical columns (which cannot contain NULL) for SPJA (select-project-join-aggregate) queries.
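As a rough illustration of what "evenly splitting" a numerical column can mean, the sketch below divides an integer range into contiguous chunks. The function name `split_range` and the half-open bounds are assumptions made for this example; ConnectorX's internal splitting logic may differ.

```python
def split_range(lo, hi, partition_num):
    """Split the closed integer range [lo, hi] into contiguous chunks.

    Returns (start, end) pairs; each chunk covers start <= x < end.
    Illustrative only, not ConnectorX's actual implementation.
    """
    if partition_num < 1:
        raise ValueError("partition_num must be at least 1")
    span = hi - lo + 1
    step, extra = divmod(span, partition_num)
    bounds = []
    start = lo
    for i in range(partition_num):
        # The first `extra` chunks take one additional value so that
        # the chunk sizes differ by at most one.
        end = start + step + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


bounds = split_range(1, 100, 4)
print(bounds)
```

Each pair can then be turned into a range predicate on the partition column, one per thread.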

Installation

pip install connectorx

Check out here to see how to build the Python wheel from source.

Performance

We compared different solutions in Python that provide the read_sql function by loading a 10x TPC-H lineitem table (8.6GB) from Postgres into a DataFrame, with 4-core parallelism.

Time chart, lower is better.

Memory consumption chart, lower is better.

In conclusion, ConnectorX uses up to 3x less memory and 21x less time than the other solutions (3x less memory and 13x less time compared with Pandas). More here.

How does ConnectorX achieve lightning speed while keeping the memory footprint low?

We observe that existing solutions tend to copy the data multiple times while downloading it. Additionally, implementing a data-intensive application in Python brings additional cost.

ConnectorX is written in Rust and follows the "zero-copy" principle. This allows it to make full use of the CPU by being cache and branch predictor friendly. Moreover, the architecture of ConnectorX ensures the data is copied exactly once, directly from the source to the destination.

How does ConnectorX download the data?

Upon receiving the query, e.g. SELECT * FROM lineitem, ConnectorX will first issue a LIMIT 1 query, SELECT * FROM lineitem LIMIT 1, to get the schema of the result set.
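The idea of probing the schema with a LIMIT 1 query can be tried with Python's built-in `sqlite3` module. This is only a sketch: the in-memory table and its three columns are made up for the example, and inferring types from the sample row's Python values is a simplification rather than how ConnectorX reads type metadata.

```python
import sqlite3

# Made-up in-memory table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem (l_orderkey INTEGER, l_quantity REAL, l_comment TEXT)"
)
conn.execute("INSERT INTO lineitem VALUES (1, 17.0, 'first row')")

query = "SELECT * FROM lineitem"
cursor = conn.execute(f"{query} LIMIT 1")
row = cursor.fetchone()  # would be None for an empty table

# cursor.description holds one tuple per result column; item 0 is the name.
names = [col[0] for col in cursor.description]
# Simplification: derive a type label from the sample row's Python values.
schema = {name: type(value).__name__ for name, value in zip(names, row)}
print(schema)
```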

Then, if partition_on is specified, ConnectorX will issue SELECT MIN($partition_on), MAX($partition_on) FROM (SELECT * FROM lineitem) to know the range of the partition column. After that, the original query is split into partitions based on the min/max information, e.g. SELECT * FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000. ConnectorX will then run a count query to get the partition size (e.g. SELECT COUNT(*) FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000). If the partition is not specified, the count query will be SELECT COUNT(*) FROM (SELECT * FROM lineitem).
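The query rewriting described above can be sketched with `sqlite3` from the standard library. The helper `plan_queries`, the sample table, and the half-open predicates (`>=` and `<`) are assumptions chosen so that every row falls into exactly one partition; the exact bounds ConnectorX generates may differ.

```python
import sqlite3


def plan_queries(conn, query, partition_on, partition_num):
    """Return (partition_query, count_query) pairs for `query`.

    Hypothetical helper that mirrors the steps described in the text.
    """
    lo, hi = conn.execute(
        f"SELECT MIN({partition_on}), MAX({partition_on}) FROM ({query})"
    ).fetchone()
    width = -(-(hi - lo + 1) // partition_num)  # ceiling division
    plans = []
    for i in range(partition_num):
        start = lo + i * width
        end = min(start + width, hi + 1)
        cond = f"{partition_on} >= {start} AND {partition_on} < {end}"
        part_query = f"SELECT * FROM ({query}) WHERE {cond}"
        count_query = f"SELECT COUNT(*) FROM ({query}) WHERE {cond}"
        plans.append((part_query, count_query))
    return plans


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (l_orderkey INTEGER)")
conn.executemany("INSERT INTO lineitem VALUES (?)", [(k,) for k in range(1, 11)])

plans = plan_queries(conn, "SELECT * FROM lineitem", "l_orderkey", 4)
counts = [conn.execute(count_q).fetchone()[0] for _, count_q in plans]
print(counts)
```

The per-partition counts add up to the size of the full result set, which is what makes it possible to allocate the destination up front.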

Finally, ConnectorX will use the schema info as well as the count info to allocate memory and download data by executing the queries normally.

Once the downloading begins, there will be one thread for each partition so that the data is downloaded in parallel at the partition level. Each thread will issue the query of the corresponding partition to the database and then write the returned data to the destination row-wise or column-wise (depending on the database) in a streaming fashion.
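The combination of preallocation and per-partition threads can be illustrated with a small simulation. Everything here is an assumption for the sake of the example: `SOURCE` stands in for the database, `fetch_partition` stands in for a partition query, and Python lists stand in for the destination buffers. ConnectorX itself does this in Rust, so the sketch shows only the layout of the work, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a database table: 10 rows keyed 1..10.
SOURCE = [(k, k * 1.5) for k in range(1, 11)]


def fetch_partition(start, end):
    """Simulate streaming the rows of one partition (start <= key < end)."""
    for row in SOURCE:
        if start <= row[0] < end:
            yield row


def load(bounds):
    # Count step: the size of each partition is known before loading.
    counts = [sum(1 for _ in fetch_partition(s, e)) for s, e in bounds]
    total = sum(counts)
    # Allocate the column buffers once, sized for the whole result.
    keys = [None] * total
    values = [None] * total
    # Each partition writes into its own disjoint slice of the buffers.
    offsets = [0]
    for c in counts[:-1]:
        offsets.append(offsets[-1] + c)

    def worker(task):
        (start, end), offset = task
        for i, (k, v) in enumerate(fetch_partition(start, end)):
            keys[offset + i] = k
            values[offset + i] = v

    # One thread per partition, as described in the text.
    with ThreadPoolExecutor(max_workers=len(bounds)) as pool:
        list(pool.map(worker, zip(bounds, offsets)))
    return keys, values


keys, values = load([(1, 4), (4, 7), (7, 11)])
print(keys)
```

Because every partition owns a fixed slice, the threads never write to the same position and no merge step is needed afterwards.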

Supported Sources & Destinations

Example connection strings, supported protocols and data types for each data source can be found here.

For more planned data sources, please check out our discussion.

Sources

  • Postgres
  • MySQL
  • MariaDB (through MySQL protocol)
  • SQLite
  • Redshift (through Postgres protocol)
  • ClickHouse (through MySQL protocol)
  • SQL Server
  • Azure SQL Database (through mssql protocol)
  • Oracle
  • BigQuery
  • Trino (available from v0.3.3)
  • ODBC (WIP)
  • ...

Destinations

  • Pandas
  • PyArrow
  • Modin (through Pandas)
  • Dask (through Pandas)
  • Polars (through PyArrow)

Documentation

Doc: https://sfu-db.github.io/connector-x/intro.html

Rust docs: stable nightly

Next Plan

Check out our discussion to participate in deciding our next plan!

Historical Benchmark Results

https://sfu-db.github.io/connector-x/dev/bench/

Developer's Guide

Please see Developer's Guide for information about developing ConnectorX.

Support

You are always welcome to:

  1. Ask questions and propose new ideas in our GitHub discussion.
  2. Ask questions on Stack Overflow. Make sure to attach the #connectorx tag.

Organizations and Projects using ConnectorX

To add your project or organization here, reply to our post here.

Citing ConnectorX

If you use ConnectorX, please consider citing the following paper:

Xiaoying Wang, Weiyuan Wu, Jinze Wu, Yizhou Chen, Nick Zrymiak, Changbo Qu, Lampros Flokas, George Chow, Jiannan Wang, Tianzheng Wang, Eugene Wu, Qingqing Zhou. ConnectorX: Accelerating Data Loading From Databases to Dataframes. VLDB 2022.

BibTeX entry:

@article{connectorx2022,
  author    = {Xiaoying Wang and Weiyuan Wu and Jinze Wu and Yizhou Chen and Nick Zrymiak and Changbo Qu and Lampros Flokas and George Chow and Jiannan Wang and Tianzheng Wang and Eugene Wu and Qingqing Zhou},
  title     = {ConnectorX: Accelerating Data Loading From Databases to Dataframes},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {15},
  number    = {11},
  pages     = {2994--3003},
  year      = {2022},
  url       = {https://www.vldb.org/pvldb/vol15/p2994-wang.pdf},
}

Contributors

  • wangxiaoying (Xiaoying Wang)
  • dovahcrow (Weiyuan Wu)
  • Wukkkinz-0725
  • Yizhou150 (Yizhou)
  • zen-xu (ZhengYu, Xu)
  • wseaton (Will Eaton)
  • AnatolyBuga (Anatoly Bugakov)
  • Jordan-M-Young (Jordan M. Young)
  • domnikl (Dominik Liebler)
  • auyer (Rafael Passos)
  • jinzew
  • gruuya (Marko Grujic)
  • alswang18 (Alec Wang)
  • lBilali (Lulzim Bilali)
  • ritchie46 (Ritchie Vink)
  • houqp (QP Hou)
  • wKollendorf
  • glennpierce (Glenn Pierce)
  • jorgecarleitao (Jorge Leitao)
  • quambene
  • CBQu (CbQu)
  • tschm (Thomas Schmelzer)
  • maxb2 (Matthew Anderson)
  • therealhieu (Hieu Minh Nguyen)
  • FerriLuli (FerriLuli)
  • alexander-beedie (Alexander Beedie)
  • zzzdong
  • venkashank
  • surister (Ivan)
  • phanindra-ramesh
  • messense (Messense)
  • kotval (Kotval)
  • albcunha
  • rursprung (Ralph Ursprung)
  • MatsMoll (Mats Eikeland Mollestad)
  • marianoguerra (Mariano Guerra)
  • kevinheavey (Kevin Heavey)
  • kayhoogland (Kay Hoogland)
  • deepsourcebot (DeepSource Bot)
  • AndrewJackson2020 (Andrew Jackson)
  • Cabbagec (Brandon)
  • Amar1729 (Amar Paul)
  • aljazerzen (Aljaž Mur Eržen)
