Skip to main content

Jupancon, connector to several DBs that returns pandas. Magic included.

Project description

jupancon

Database Connectors and SQL magics for Jupyter. jupancon = Jupyter + Pandas + Connectors.

Features

  • Connector to Redshift
    • Using user/pass
    • Using IAM profile
  • Connector to Bigquery (using google profile)
  • Connector to Databricks
  • Optional automatic tunnel setting through an SSH Bastion
  • Querying capabilities
  • IPython kernel magics for querying
  • Always returns Pandas DataFrames
  • Some hidden stuff I rather not document just yet so you don't nuke your Warehouse :) Will do when it's safer to use

Install

pip install jupancon

Configure

Write a ~/.jupancon/config.yml YAML file that looks similar to the following C&P from my actual config file (heavily censored for obvious reasons):

default: my-redshift-cluster

my-redshift-cluster: 
    type: redshift
    host: XXXXXX.XXXXXX.XXXXXXX.redshift.amazonaws.com
    # explicitly setting redshift port (optional)
    port: 5439
    user: XXXXXXXX
    pass: XXXXXXXX
    dbname: XXXXXX


my-redshift-using-iamprofile: 
    type: redshift
    host: XXXXXX.XXXXXX.XXXXXXX.redshift.amazonaws.com
    profile: XXXXXXXXX
    dbname: XXXXXX
    # NOTE: you can choose dbuser and it will be auto-created if it doesn't exist 
    dbuser: XXXXXX
    cluster: XXXXXX


my-gcp:
    type: bigquery
    project: XXXXX-XXXXX-123456

my-databricks:
    type: databricks
    server_hostname: XXXXXX.cloud.databricks.com
    http_path: /sql/XXX/XXXX/XXXXXXXXXX
    # optional
    catalog: XXXXXXX
    token: XXXXXXXXX


my-redshift-behind-sshbastion:
    type: redshift
    use_bastion: true
    bastion_server: censored.bastion.server.com
    bastion_user: XXXXXX
    bastion_host: XXXXXX.XXXXXX.XXXXXX.redshift.amazonaws.com
    host: censored.main.server.com
    user: XXXXXXXX
    pass: XXXXXXXX
    dbname: XXXXXX

Jupancon will also pick environment variables (which have preference over the config.yml).

  • JPC_DB_TYPE: redshift or bigquery
  • JPC_HOST: for example, XXXXXX.XXXXXX.XXXXXX.redshift.amazonaws.com
  • JPC_USER: User name
  • JPC_DB: Database name
  • JPC_PASS: Password
  • JPC_USE_BASTION: true or leave blank
  • JPC_BASTION_SERVER
  • JPC_BASTION_HOST
  • JPC_PROFILE: IAM profile (for IAM connection only)
  • JPC_CLUSTER: Redshift cluster (for IAM connection only)
  • JPC_DBUSER: Redshift user (for IAM connection only)

How to use

This library is developed primarily for usage within Jupyter Lab. It's likely to work in Jupyter Notebook and Ipython, but untested and unsupported at this stage. It also works and is being used in regular scripts, but it obviously loses its magic.

Regular usage

from jupancon import query, list_schemas, list_tables

list_schemas()

list_tables()

query("select * from foo")

Magical usage

from jupancon import load_magics

load_magics()
select * from foo
df = %select * from foo
%%sql

select * 
from foo
where cond = 1
and label = 'my nice label'

Development

Current status: Jupancon has enough basic features that it's worth open sourcing, but the documentation is still lacking.

TODO list

  • list_table("schema") to detect if schema doesn't exist and return error
  • Add query monitoring and cancelling functionality
  • Complete docs (low level stuff, exhaustive features, maybe sphinx/rdd?)
  • Add animated gifs to docs

Features that aren't worth adding right now

  • Autocomplete and autodiscover of databases is possible, but not trivial at all. In addition, I'll like to find a way of not adding any extra configuration. Regardless, not worth it until the TODO list above is tackled. See this project for a successful example.
  • Because of the current architecture of Jupyter Lab, syntax highlighting is not feasible to add (as it was in Jupyter Notebook). This might change in the future. See this git issue for more details.

A note about Unit Testing

I would like to publish decent unit testing, but this library is hard to test because all the databases currently queried for it's development are either tests that cost me money or private (my clients') databases. Any ideas on how to write an open source, non exploitable set of unit tests for Redshift or BigQuery are very welcome.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jupancon-0.3.3.tar.gz (11.2 kB view details)

Uploaded Source

Built Distribution

jupancon-0.3.3-py2.py3-none-any.whl (9.8 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file jupancon-0.3.3.tar.gz.

File metadata

  • Download URL: jupancon-0.3.3.tar.gz
  • Upload date:
  • Size: 11.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.28.2

File hashes

Hashes for jupancon-0.3.3.tar.gz
Algorithm Hash digest
SHA256 f19b642aa0949c3af701e3690445a68001380eba648499d4e8aebbfc3e57cf5f
MD5 6d146b6f98e548bf701dc76574187443
BLAKE2b-256 44c488cdcb3cd06722083d4b18973b8d3a0a6e6a9e0322592b406e2a11c43170

See more details on using hashes here.

File details

Details for the file jupancon-0.3.3-py2.py3-none-any.whl.

File metadata

  • Download URL: jupancon-0.3.3-py2.py3-none-any.whl
  • Upload date:
  • Size: 9.8 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.28.2

File hashes

Hashes for jupancon-0.3.3-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 47ec48dd635e59c6599108f587e885b5ca760a5dd0ffbf13523a87dbfefa458f
MD5 d4af6a631fcc99243054f419f5c719a1
BLAKE2b-256 11f589ede59b06f9c752c0c4e3a7e2cc973264c1404c0cfa1e237fca7d85a7d5

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page