ConnectorX

Load data from databases to DataFrames, the fastest way.

ConnectorX enables you to load data from databases into Python in the fastest and most memory efficient way.

What you need is one line of code:

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem")

Optionally, you can accelerate the data loading using parallelism by specifying a partition column.

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem", partition_on="l_orderkey", partition_num=10)

The function partitions the query by splitting the range of the specified column evenly into the requested number of partitions. ConnectorX assigns one thread to each partition to load and write data in parallel. Currently, partitioning is supported on integer columns for SPJA (select-project-join-aggregate) queries.
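To make the splitting concrete, here is a minimal sketch of how a value range over an integer column could be cut into evenly sized partition queries. It is an illustration only, not ConnectorX's internal implementation: the helper names `partition_bounds` and `split_query`, the subquery wrapping, and the range 1 to 100 are all assumptions made for the example.

```python
from typing import List, Tuple


def partition_bounds(lo: int, hi: int, num: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [lo, hi] into `num` contiguous, near-equal ranges."""
    if num < 1:
        raise ValueError("num must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    total = hi - lo + 1
    num = min(num, total)  # never create empty partitions
    size, extra = divmod(total, num)
    bounds = []
    start = lo
    for i in range(num):
        # The first `extra` partitions take one additional value each.
        end = start + size + (1 if i < extra else 0) - 1
        bounds.append((start, end))
        start = end + 1
    return bounds


def split_query(query: str, column: str, lo: int, hi: int, num: int) -> List[str]:
    """Wrap `query` in one range-filtered subquery per partition (hypothetical helper)."""
    return [
        f"SELECT * FROM ({query}) AS part WHERE {column} BETWEEN {a} AND {b}"
        for a, b in partition_bounds(lo, hi, num)
    ]


# Example: four partitions over an assumed key range of 1..100.
queries = split_query("SELECT * FROM lineitem", "l_orderkey", 1, 100, 4)
for q in queries:
    print(q)
```

Note that in this sketch rows whose partition column is NULL match none of the ranges, so a real implementation would have to handle them separately.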

For more, see Detailed Usage and Examples below.

Installation

pip install connectorx

Performance

We compared different Python solutions that provide a read_sql function by loading a 10x TPC-H lineitem table (8.6GB) from Postgres into a DataFrame, using 4 cores in parallel.

Time chart, lower is better.

[Figure: time chart]

Memory consumption chart, lower is better.

[Figure: memory chart]

In conclusion, ConnectorX uses up to 3x less memory and 11x less time.

How does ConnectorX achieve lightning speed while keeping the memory footprint low?

We observe that existing solutions copy the data multiple times, to varying degrees, while downloading it. Additionally, implementing a data-intensive application in Python adds overhead.

ConnectorX is written in Rust and follows the "zero-copy" principle. This allows it to make full use of the CPU by being cache and branch-predictor friendly. Moreover, the architecture of ConnectorX ensures that the data is copied exactly once, directly from the source to the destination.
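The "copied exactly once" idea can be sketched in plain Python. The code below is not ConnectorX code (which is written in Rust); it only shows the structure, under the assumption that the total row count is known up front. The destination is allocated once, and each worker writes its partition straight into its own disjoint slice, so no per-partition result is built and concatenated afterwards. `fetch_partition` is a made-up stand-in for a database cursor.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List


def fetch_partition(start: int, count: int) -> Iterator[int]:
    """Stand-in for a cursor that streams `count` integers (made-up source)."""
    return iter(range(start, start + count))


def load(partition_sizes: List[int]) -> array:
    total = sum(partition_sizes)
    # Allocate the destination once, sized for the full result (8-byte integers).
    dest = array("q", bytes(8 * total))
    view = memoryview(dest)

    def write(offset: int, count: int) -> None:
        # Each worker owns a disjoint window of the destination and writes
        # every value directly into its final position.
        window = view[offset:offset + count]
        for i, value in enumerate(fetch_partition(offset, count)):
            window[i] = value

    # Starting offset of each partition inside the destination buffer.
    offsets = []
    running = 0
    for size in partition_sizes:
        offsets.append(running)
        running += size

    with ThreadPoolExecutor(max_workers=max(1, len(partition_sizes))) as pool:
        # list() waits for all workers and re-raises any worker exception.
        list(pool.map(write, offsets, partition_sizes))
    return dest


result = load([3, 4, 3])
print(result.tolist())
```

Python threads are used here only to mirror the one-thread-per-partition layout; because of the interpreter lock they do not give the CPU parallelism that a native implementation gets.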

Detailed Usage and Examples

API

connectorx.read_sql(conn: str, query: Union[List[str], str], *, return_type: str = "pandas", protocol: str = "binary", partition_on: Optional[str] = None, partition_range: Optional[Tuple[int, int]] = None, partition_num: Optional[int] = None)

Runs the SQL query and downloads the data from the database into a pandas DataFrame.

Parameters

  • conn(str): Connection string URI. Currently only PostgreSQL is supported.
  • query(string or list of strings): SQL query or list of SQL queries for fetching data.
  • return_type(string, optional(default "pandas")): The return type of this function. Currently only "pandas" is supported.
  • partition_on(string, optional(default None)): The column on which to partition the result.
  • partition_range(tuple of int, optional(default None)): The value range of the partition column.
  • partition_num(int, optional(default None)): The number of partitions to generate.
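As a hedged illustration of how the three partition parameters fit together, the wrapper below passes an explicit `partition_range` to `read_sql`. The wrapper name `load_lineitem`, the choice of 4 partitions, and the range values in the usage comment are assumptions for the example; the parameter names come from the signature above.

```python
from typing import Optional, Tuple


def load_lineitem(conn: str, key_range: Optional[Tuple[int, int]] = None):
    """Load lineitem in 4 partitions on l_orderkey (hypothetical wrapper)."""
    # Imported lazily so this module can be loaded without the package installed.
    import connectorx as cx

    return cx.read_sql(
        conn,
        "SELECT * FROM lineitem",
        partition_on="l_orderkey",
        partition_range=key_range,  # None: the range is queried automatically
        partition_num=4,
    )


# Usage (needs a reachable PostgreSQL server; the range values are made up):
# df = load_lineitem(
#     "postgresql://username:password@server:port/database",
#     key_range=(1, 6000000),
# )
```

Supplying the range yourself presumably saves the extra query that would otherwise be needed to discover it, which can matter when the source query is expensive.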

Examples

  • Read a DataFrame from a SQL query using a single thread

    import connectorx as cx
    
    postgres_url = "postgresql://username:password@server:port/database"
    query = "SELECT * FROM lineitem"
    
    cx.read_sql(postgres_url, query)
    
  • Read a DataFrame in parallel using 10 threads by automatically partitioning the provided SQL query on the partition column (partition_range is queried automatically if not given)

    import connectorx as cx
    
    postgres_url = "postgresql://username:password@server:port/database"
    query = "SELECT * FROM lineitem"
    
    cx.read_sql(postgres_url, query, partition_on="partition_col", partition_num=10)
    
  • Read a DataFrame in parallel using 2 threads by manually providing two partition SQL queries (the schemas of all the query results must be the same)

    import connectorx as cx
    
    postgres_url = "postgresql://username:password@server:port/database"
    queries = ["SELECT * FROM lineitem WHERE partition_col <= 10", "SELECT * FROM lineitem WHERE partition_col > 10"]
    
    cx.read_sql(postgres_url, queries)
    

Next Plan

Check out our discussions to participate in deciding our next plan!


