ConnectorX

Load data from databases to dataframes, the fastest way.

ConnectorX enables you to load data from databases into Python in the fastest and most memory efficient way.

What you need is one line of code:

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem")
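Credentials that contain characters such as "@", ":" or "/" can make a connection URL ambiguous. The helper below is a minimal sketch that percent-encodes the username and password before assembling the URL. The name postgres_url is hypothetical (it is not part of ConnectorX), and the sketch assumes ConnectorX decodes percent-encoded credentials the way standard URL parsers do.

```python
from urllib.parse import quote


def postgres_url(user, password, host, port, database):
    """Build a PostgreSQL connection URL with percent-encoded credentials.

    Hypothetical helper for illustration; not part of the ConnectorX API.
    """
    # safe="" forces reserved characters such as "@", ":" and "/" to be encoded.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"postgresql://{user_enc}:{password_enc}@{host}:{port}/{database}"


conn = postgres_url("alice", "p@ss:word/1", "localhost", 5432, "tpch")
```

The resulting string can be passed as the first argument of cx.read_sql.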

Optionally, you can accelerate the data loading using parallelism by specifying a partition column.

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem", partition_on="l_orderkey", partition_num=10)

The function will partition the query by evenly splitting the value range of the specified column into the given number of partitions. ConnectorX will assign one thread to each partition to load and write data in parallel. Currently, partitioning is supported on numerical columns (which cannot contain NULL) for SPJA queries.
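To make "evenly splitting" concrete, here is a minimal sketch of how an integer column's [min, max] range could be cut into contiguous sub-ranges. This is an illustration only, not ConnectorX's actual implementation: the function name split_range and the choice of half-open [start, end) intervals are assumptions.

```python
def split_range(lo, hi, num):
    """Split the inclusive integer range [lo, hi] into at most `num`
    contiguous half-open ranges [start, end) of near-equal width."""
    width = hi + 1 - lo          # number of distinct integer values covered
    size = -(-width // num)      # ceiling division: width of each partition
    bounds = []
    for i in range(num):
        start = lo + i * size
        if start > hi:
            break                # more partitions requested than values
        bounds.append((start, min(start + size, hi + 1)))
    return bounds


print(split_range(1, 100, 4))
```

Every value between the minimum and the maximum falls into exactly one range, so no row is loaded twice or skipped. Skewed data can still produce partitions with very different row counts, because only the value range is split evenly.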

Experimental: We now provide federated query support (PostgreSQL only; partitioning is not supported for now). You can write a single query to join tables from two or more databases! (JRE >= 1.8 is required.)

import connectorx as cx

db1 = "postgresql://username1:password1@server1:port1/database1"
db2 = "postgresql://username2:password2@server2:port2/database2"

cx.read_sql({"db1": db1, "db2": db2}, "SELECT * FROM db1.nation n, db2.region r where n.n_regionkey = r.r_regionkey")
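Because the federated path needs a Java runtime, it can be useful to check for one before issuing such a query. The sketch below looks for a java executable on the PATH and parses the version banner. Both function names are hypothetical, and the banner format handled here (for example: java version "1.8.0_292" or openjdk version "17.0.2") is an assumption about common JVM distributions.

```python
import re
import shutil
import subprocess


def parse_java_version(banner):
    """Extract (major, minor) from a `java -version` banner, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def jre_is_recent_enough(minimum=(1, 8)):
    """Return True if a `java` on PATH reports a version >= `minimum`."""
    java = shutil.which("java")
    if java is None:
        return False
    # Common JVMs write the version banner to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    version = parse_java_version(result.stderr or result.stdout)
    return version is not None and version >= minimum
```

Tuple comparison handles both numbering schemes: (1, 8) for Java 8 and (11, 0) or (17, 0) for later releases all compare as at least (1, 8).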

Check out more detailed usage and examples here. A general introduction to the project can be found in this blog post.

Installation

pip install connectorx

See here for how to build the Python wheel from source.

Performance

We compared different solutions in Python that provide a read_sql function by loading a 10x TPC-H lineitem table (8.6 GB) from Postgres into a DataFrame with 4-core parallelism.

[Figure: time chart, lower is better.]

[Figure: memory consumption chart, lower is better.]

In conclusion, ConnectorX uses up to 3x less memory and 21x less time (3x less memory and 13x less time compared with Pandas). More details here.

How does ConnectorX achieve lightning speed while keeping the memory footprint low?

We observe that existing solutions copy the data multiple times, to varying degrees, when downloading it. Additionally, implementing a data-intensive application in Python brings additional cost.

ConnectorX is written in Rust and follows the "zero-copy" principle. This allows it to make full use of the CPU by being cache- and branch-predictor-friendly. Moreover, the architecture of ConnectorX ensures the data will be copied exactly once, directly from the source to the destination.

How does ConnectorX download the data?

Upon receiving the query, e.g. SELECT * FROM lineitem, ConnectorX will first issue a LIMIT 1 query, SELECT * FROM lineitem LIMIT 1, to get the schema of the result set.

Then, if partition_on is specified, ConnectorX will issue SELECT MIN($partition_on), MAX($partition_on) FROM (SELECT * FROM lineitem) to learn the range of the partition column. After that, the original query is split into partitions based on the min/max information, e.g. SELECT * FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000. ConnectorX will then run a count query to get the partition size (e.g. SELECT COUNT(*) FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000). If partition_on is not specified, the count query will be SELECT COUNT(*) FROM (SELECT * FROM lineitem).
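The sequence of metadata queries described above can be sketched as plain string construction. This is an illustration under stated assumptions, not ConnectorX's real query generator: the function name plan_queries, the subquery alias "t" (added because some databases require one), and the half-open bounds (>= lower, < upper) are all choices made for this sketch.

```python
def plan_queries(query, partition_on, bounds):
    """Build the schema, min/max, partition and count queries for `query`.

    `bounds` is a list of (lower, upper) pairs, each treated as the
    half-open interval lower <= value < upper.
    """
    sub = f"({query}) AS t"  # alias "t" is an assumption of this sketch
    plan = {
        "schema": f"SELECT * FROM {sub} LIMIT 1",
        "min_max": f"SELECT MIN({partition_on}), MAX({partition_on}) FROM {sub}",
        "partitions": [],
        "counts": [],
    }
    for lower, upper in bounds:
        where = f"{partition_on} >= {lower} AND {partition_on} < {upper}"
        # One data query and one count query per partition.
        plan["partitions"].append(f"SELECT * FROM {sub} WHERE {where}")
        plan["counts"].append(f"SELECT COUNT(*) FROM {sub} WHERE {where}")
    return plan


plan = plan_queries("SELECT * FROM lineitem", "l_orderkey", [(0, 10000), (10000, 20001)])
for sql in plan["counts"]:
    print(sql)
```

Wrapping the user query in a subquery keeps the generated WHERE clause independent of whatever filters the original query already contains.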

Finally, ConnectorX will use the schema info as well as the count info to allocate memory and download data by executing the queries normally.

Once the downloading begins, there will be one thread for each partition so that the data are downloaded in parallel at the partition level. Each thread will issue the query of its corresponding partition to the database and then write the returned data to the destination row-wise or column-wise (depending on the database) in a streaming fashion.
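The "count, allocate once, then fill in parallel" idea can be demonstrated with the standard library alone. The sketch below uses a temporary SQLite file and a Python list as the destination; it is only an analogy for the approach described above, not ConnectorX's Rust implementation, and all function names are hypothetical.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_demo_db(path, n_rows):
    """Create a small table so the sketch is runnable anywhere."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE lineitem (l_orderkey INTEGER NOT NULL, l_quantity REAL)")
    conn.executemany("INSERT INTO lineitem VALUES (?, ?)",
                     [(i, i * 0.5) for i in range(n_rows)])
    conn.commit()
    conn.close()


def load_partitioned(path, bounds):
    """Load each [lower, upper) range in its own thread into one buffer."""
    where = "l_orderkey >= ? AND l_orderkey < ?"
    conn = sqlite3.connect(path)
    # One count query per partition tells us how much space it needs.
    counts = [conn.execute(f"SELECT COUNT(*) FROM lineitem WHERE {where}", b).fetchone()[0]
              for b in bounds]
    conn.close()

    # Starting offset of each partition inside the shared destination.
    offsets = [sum(counts[:i]) for i in range(len(counts))]
    dest = [None] * sum(counts)  # allocated once, up front

    def worker(i):
        # Each thread opens its own connection and streams rows into its slice.
        c = sqlite3.connect(path)
        sql = f"SELECT l_orderkey, l_quantity FROM lineitem WHERE {where} ORDER BY l_orderkey"
        for j, row in enumerate(c.execute(sql, bounds[i])):
            dest[offsets[i] + j] = row
        c.close()

    with ThreadPoolExecutor(max_workers=len(bounds)) as pool:
        list(pool.map(worker, range(len(bounds))))  # list() surfaces worker errors
    return dest


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        make_demo_db(path, 100)
        return load_partitioned(path, [(0, 25), (25, 50), (50, 75), (75, 100)])
```

Because every thread writes only to its own precomputed slice, no locking or final concatenation step is needed, which is the property that lets the data be copied once from source to destination.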

Supported Sources & Destinations

Example connection strings, supported protocols, and data types for each data source can be found here.

For more planned data sources, please check out our discussion.

Sources

  • Postgres
  • MySQL
  • MariaDB (through MySQL protocol)
  • SQLite
  • Redshift (through Postgres protocol)
  • ClickHouse (through MySQL protocol)
  • SQL Server
  • Azure SQL Database (through MSSQL protocol)
  • Oracle
  • BigQuery
  • ODBC (WIP)
  • ...

Destinations

  • Pandas
  • PyArrow
  • Modin (through Pandas)
  • Dask (through Pandas)
  • Polars (through PyArrow)
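The destination is selected per call. The sketch below maps the destination names listed above to values of the return_type argument of cx.read_sql. The exact strings are written from memory of the ConnectorX documentation and should be treated as assumptions to verify against the installed version; the load helper itself is hypothetical.

```python
# Destination name -> assumed value of read_sql's `return_type` argument.
RETURN_TYPES = {
    "Pandas": "pandas",
    "PyArrow": "arrow",
    "Modin": "modin",
    "Dask": "dask",
    "Polars": "polars",
}


def load(conn, query, destination="Pandas"):
    """Run `query` and return the result in the requested destination type."""
    # Imported lazily so this module can be inspected without connectorx installed.
    import connectorx as cx

    return cx.read_sql(conn, query, return_type=RETURN_TYPES[destination])
```

For example, load(conn, "SELECT * FROM lineitem", destination="PyArrow") would request an Arrow table instead of a Pandas DataFrame, given a valid connection string conn.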

Documentation

Doc: https://sfu-db.github.io/connector-x/intro.html

Rust docs: stable nightly

Next Plan

Check out our discussion to participate in deciding our next plan!

Historical Benchmark Results

https://sfu-db.github.io/connector-x/dev/bench/

Developer's Guide

Please see Developer's Guide for information about developing ConnectorX.

Support

You are always welcome to:

  1. Ask questions and propose new ideas in our GitHub discussion.
  2. Ask questions on Stack Overflow. Make sure to attach the #connectorx tag.

Organizations and Projects using ConnectorX

To add your project/organization here, reply to our post here.
