
Python wrapper for GOR's R SDK with Pandas serialization.



GORpyter

  1. Python package (with pandas serialization) that wraps the R SDK of the GOR Query API.
  2. Docker image for JupyterLab (Python & R kernels) with both the Python & R SDK dependencies installed.
  • gp.query() converts the R tibble dataframe into a pandas dataframe on the fly.
  • rpy2 package is used to wrap the gorr R library functions in Python.
  • Jupyter R kernel has tidyverse (tricky install) and gorr (non-CRAN) packages installed.
  • Docker image also includes OpenJDK 1.8 in case users want to install Spark.
TLDR
$ docker pull hashrocketsyntax/gorpyter:augustus
$ docker run -it -p 8888:8888 hashrocketsyntax/gorpyter:augustus

Read the rest of the documentation for complete setup & usage.


1. Docker Environment

LOCAL NOTEBOOK FOLDER

Create a folder on your local machine's desktop to store your notebooks. Keep the output of pwd handy, as we will use it under the volumes key of docker-compose.yml below. You can name the folder whatever you like; we'll call it 'notebooks'.

$ cd ~/Desktop
$ mkdir notebooks
$ cd notebooks
$ pwd
'<PATH_TO_YOUR_NEW_FOLDER>'
DOCKER HARDWARE RESOURCES

To convert large (around 1M rows) R dataframes to pandas dataframes, your Docker environment may need access to more memory. Memory is the most important of the settings below.

  • Stop any running containers.
  • Click on Docker icon in system tray.
  • Navigate to 'Preferences.'
  • Click the 'Resources' or 'Advanced' tab depending on your version of Docker.
  • Set the resources to the following:
CPU:              <keep default, should already be at 4 CPU>
Memory:           <half of what's available in 'About this Mac', 4 or 8 GB>
Swap:             <set to maximum, 4GB>
Disk Image Size:  <keep default>
  • Click 'Apply & Restart'
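
As a rough guide to how much memory a conversion might need, here is a back-of-the-envelope sketch. It assumes a purely numeric table at 8 bytes per value and that two copies of the data (the R tibble and the pandas dataframe) are alive at once. Both are assumptions for illustration, not measurements of gorpyter; string columns typically cost far more.

```python
def estimate_frame_bytes(rows: int, numeric_cols: int, bytes_per_value: int = 8) -> int:
    """Rough size of a purely numeric table: rows x columns x bytes per value."""
    return rows * numeric_cols * bytes_per_value


def estimate_conversion_gib(rows: int, numeric_cols: int, copies: int = 2) -> float:
    """Assume `copies` versions of the data exist at the same time during conversion."""
    total_bytes = estimate_frame_bytes(rows, numeric_cols) * copies
    return total_bytes / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical workload: 1M rows with 20 numeric columns.
    print(f"{estimate_conversion_gib(1_000_000, 20):.2f} GiB")
```

Treat the result as a lower bound when choosing the Memory setting; the Jupyter server, the R runtime, and any other open notebooks need room as well.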
DOCKER IMAGE & MANIFEST

Pull in this pre-built image, which contains a Jupyter environment equipped with R and Python 3.7 kernels as well as the GORpyter dependencies. It's built on top of Jupyter's DockerHub image jupyter/datascience-notebook:2ce7c06a61a1. If you want to customize this image yourself, see Section 3.

$ docker pull hashrocketsyntax/gorpyter:augustus

Create a file named docker-compose.yml and open it with a text editor (nano or SublimeText).

$ touch docker-compose.yml
$ nano docker-compose.yml

Paste the text below into that file. Under the volumes key, paste in the output of pwd from above.

#docker-compose.yml
version: "3"
services:
  jupyter:
    image: "hashrocketsyntax/gorpyter:augustus"
    ports:
      - "8888:8888"
    volumes:
      - <PATH_TO_YOUR_NEW_FOLDER>:/usr/local/share/man/user_notebooks
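
If you would rather not paste the path by hand, the file above can be generated. This is a minimal sketch, not part of gorpyter: `render_compose` is a hypothetical helper that fills the host folder into the same template as an absolute path. It assumes the path contains no characters that need YAML quoting.

```python
from pathlib import Path

# Same content as the docker-compose.yml above, with a placeholder for the host folder.
COMPOSE_TEMPLATE = """\
version: "3"
services:
  jupyter:
    image: "hashrocketsyntax/gorpyter:augustus"
    ports:
      - "8888:8888"
    volumes:
      - {host_dir}:/usr/local/share/man/user_notebooks
"""


def render_compose(host_dir: Path) -> str:
    """Return the compose file text with the host folder filled in as an absolute path."""
    return COMPOSE_TEMPLATE.format(host_dir=host_dir.expanduser().resolve())


if __name__ == "__main__":
    # Path.cwd() plays the role of the `pwd` output from the earlier step.
    print(render_compose(Path.cwd()))
```

Run it from inside your notebooks folder and redirect the output to docker-compose.yml.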

Make sure you are in the same directory as the .yml file and run it like so.

$ docker-compose up

From the console output, grab the URL that looks like this http://127.0.0.1:8888/?token=<YOUR_TOKEN> and paste it into a browser.
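
If you capture the console output in a script, the URL and token can be pulled out programmatically. The sketch below assumes the URL has the shape shown above; `find_jupyter_url` and `token_from_url` are hypothetical helper names, and the sample log line in the usage note is made up for illustration.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Matches URLs shaped like http://127.0.0.1:8888/?token=<YOUR_TOKEN>
URL_PATTERN = re.compile(r"http://127\.0\.0\.1:8888/\S*\?token=\S+")


def find_jupyter_url(console_output: str) -> Optional[str]:
    """Return the first tokenized Jupyter URL in the text, or None if absent."""
    match = URL_PATTERN.search(console_output)
    return match.group(0) if match else None


def token_from_url(url: str) -> Optional[str]:
    """Return the value of the `token` query parameter, or None if missing."""
    values = parse_qs(urlparse(url).query).get("token")
    return values[0] if values else None
```

For example, given a line such as `jupyter_1 | http://127.0.0.1:8888/?token=abc123`, `find_jupyter_url` returns the URL and `token_from_url` returns `abc123`.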


2. JupyterLab Notebooks

TUTORIAL NOTEBOOKS

The Docker environment comes with example notebooks for both the Python and R SDK.

If you are running these notebooks in the pre-built Docker environment, know that only files in the user_notebooks folder will be saved and persisted. In fact, you won't be able to add, remove, copy, or delete files, or save changes to them, outside of the user_notebooks directory.
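
A quick way to check where a file will end up is to test its path against the mounted folder. This sketch assumes the container path from the docker-compose.yml above (/usr/local/share/man/user_notebooks); `will_persist` is a hypothetical helper and does not resolve `..` segments or symlinks.

```python
from pathlib import PurePosixPath

# Container-side mount point taken from the volumes key in docker-compose.yml.
PERSISTED_DIR = PurePosixPath("/usr/local/share/man/user_notebooks")


def will_persist(path: str) -> bool:
    """True if the absolute container path lies in or under the mounted folder."""
    candidate = PurePosixPath(path)
    return candidate == PERSISTED_DIR or PERSISTED_DIR in candidate.parents
```

A notebook saved at /usr/local/share/man/user_notebooks/analysis.ipynb passes the check, while the bundled tutorial notebooks one level up do not.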

#python_sdk_gorpyter.ipynb


pip install gorpyter --upgrade
import gorpyter as gp


gp.setup()
"""
  CHECKLIST
  =============================================

	✓ -- The version of your Jupyter Python environment is '3.7.3'.
	✓ -- The path of the Jupyter R enviroment being accessed by `rpy2` is '/opt/conda/lib/R'.

	✓ -- The Python dependencies of `gorpyter` are installed.
	✓ -- The `tidyverse` R library is installed in your R environment.
	✓ -- The `gorr` R library is installed in your R environment.
	✓ -- Python was able to successfully load `gorr` as a module via `rpy2`.

  =============================================
"""


api_key = "<YOUR_API_KEY>"
project = "<YOUR_PROJECT_NAME>"
conn = gp.connect(api_key, project)


gp.query("<YOUR_GOR_QUERY>", conn)
"""
	nor example -- "nor ./"
	gor example -- "gor -p chr10 #dbsnp# | TOP 100"

	Tested successfully on a 1,000,000 row result.

	Despite being run in Python, interrupting the client's execution
  of this function with `ctrl+c` is still gracefully
  intercepted by the gorr R library, so the server-side
  execution of the query is cleaned up at the same time.
"""
PYTHON PACKAGE
pip install gorpyter --upgrade
  • conda install will not work as this package has not been published to conda-forge.
  • The latest version number is listed at https://pypi.org/project/gorpyter; compare it with the output of pip show gorpyter.
  • Installing gorpyter will also install these dependencies: rpy2>=3.0.5, tzlocal>=2.0.0, pandas>=0.25.0, numpy>=1.17.0.
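
To confirm that an environment meets these minimums, something like the following sketch can help. It relies on `importlib.metadata`, which needs Python 3.8 or newer (newer than the 3.7 kernel in the image), so treat that as an assumption about your environment. The helper names are hypothetical and the version parsing is deliberately simple: it only compares the leading numeric components.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimum versions as listed in the dependency bullet above.
MINIMUMS: Dict[str, str] = {
    "rpy2": "3.0.5",
    "tzlocal": "2.0.0",
    "pandas": "0.25.0",
    "numpy": "1.17.0",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert '1.17.0' to (1, 17, 0); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def at_least(found: str, minimum: str) -> bool:
    """Compare two versions after zero-padding them to the same length."""
    a, b = version_tuple(found), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_minimums(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Map each package to 'missing', 'ok (...)', or 'too old (...)'."""
    report = {}
    for package, minimum in minimums.items():
        found = installed_version(package)
        if found is None:
            report[package] = "missing"
        elif at_least(found, minimum):
            report[package] = f"ok ({found})"
        else:
            report[package] = f"too old ({found} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_minimums().items():
        print(f"{name}: {status}")
```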
GOR QUERY LANGUAGE

http://docs.wuxinextcode.com/gor/basicGORqueries.html
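
Query strings like the notebook's `gor -p chr10 #dbsnp# | TOP 100` example can be assembled from parts before being handed to gp.query(). The helper below is a hypothetical convenience, not part of gorpyter or gorr. It only reproduces the pattern seen in that one example and does no validation of GOR syntax.

```python
from typing import Optional


def build_gor_query(source: str, chrom: Optional[str] = None, top: Optional[int] = None) -> str:
    """Assemble a query of the form 'gor [-p CHROM] SOURCE [| TOP N]'."""
    parts = ["gor"]
    if chrom is not None:
        parts += ["-p", chrom]
    parts.append(source)
    query = " ".join(parts)
    if top is not None:
        if top <= 0:
            raise ValueError("top must be a positive integer")
        query += f" | TOP {top}"
    return query


if __name__ == "__main__":
    print(build_gor_query("#dbsnp#", chrom="chr10", top=100))
```

The resulting string can then be passed along as gp.query(build_gor_query("#dbsnp#", chrom="chr10", top=100), conn). Capping rows with TOP while exploring keeps the R-to-pandas conversion small.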


3. Optional -- Customizing the Docker Image

In order to create your own Docker image based on jupyter/datascience-notebook:latest, follow these instructions.

With these files in the same directory:

  • Dockerfile
  • python_sdk.ipynb
  • r_sdk.ipynb

Run docker build -t your-image-name:your-new-tag . from within that directory.
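
Because the build copies both notebooks into the image, a missing file fails the build partway through. A small pre-flight check, sketched here with a hypothetical `missing_build_files` helper, can catch that before docker build starts.

```python
from pathlib import Path
from typing import List

# The three files the build directory is expected to contain.
REQUIRED_FILES = ("Dockerfile", "python_sdk.ipynb", "r_sdk.ipynb")


def missing_build_files(build_dir: Path) -> List[str]:
    """Return the names of required files that are not present in build_dir."""
    return [name for name in REQUIRED_FILES if not (build_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_build_files(Path.cwd())
    print("ready to build" if not missing else f"missing: {', '.join(missing)}")
```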

Here are the commands contained in the Dockerfile.

#Dockerfile
FROM jupyter/datascience-notebook:latest
LABEL maintainer="layne sadler <lsadler@wuxinextcode.com>"


# ====== PRE SUDO ======
ENV JUPYTER_ENABLE_LAB=yes

# If you run pip as sudo it continually prints errors.
# Tidyverse is already installed, and installing gorpyter installs the correct versions of other Python dependencies.
RUN pip install gorpyter
RUN Rscript -e "install.packages('https://cdn.nextcode.com/public/libraries/gorr_0.2.5.tar.gz', repos = NULL, type = 'source')"
ENV R_HOME=/opt/conda/lib/R

# https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s09.html
# Looks like /usr/local/man is symlinking all R/W toward /usr/local/share/man instead
COPY python_sdk.ipynb /usr/local/share/man
COPY r_sdk.ipynb /usr/local/share/man
ENV NOTEBOOK_DIR=/usr/local/share/man
WORKDIR /usr/local/share/man


# ====== SUDO ======
USER root

# Spark requires Java 8.
# Already running as root here, so sudo is not needed.
RUN apt-get update && apt-get install -y openjdk-8-jdk
# JAVA_HOME should name the JDK directory, not the java binary.
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

# If you COPY files into the same VOLUME that you mount in docker-compose.yml, then those files will disappear at runtime.
# `user_notebooks/` is the folder that gets mapped as a VOLUME to the user's local folder during runtime.
RUN mkdir /usr/local/share/man/user_notebooks
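
Tools such as Spark conventionally expect JAVA_HOME to name a JDK directory that contains bin/java. The sketch below, runnable from a notebook in the built image, checks that convention; `java_home_problem` is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def java_home_problem(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a description of what is wrong with JAVA_HOME, or None if it looks usable."""
    value = env.get("JAVA_HOME")
    if not value:
        return "JAVA_HOME is not set"
    home = Path(value)
    if not home.is_dir():
        return f"JAVA_HOME is not a directory: {home}"
    if not (home / "bin" / "java").is_file():
        return f"no bin/java under JAVA_HOME: {home}"
    return None


if __name__ == "__main__":
    print(java_home_problem() or "JAVA_HOME looks usable")
```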
