Python wrapper for GOR's R SDK with Pandas serialization.
Project description
GORpyter
- Python package (with pandas serialization) that wraps the R SDK of the GOR Query API.
- Docker image for JupyterLab (Python & R kernels) with both the Python & R SDK dependencies installed.
- gp.query() converts the R tibble dataframe into a pandas dataframe on the fly.
- rpy2 package is used to wrap the gorr R library functions in Python.
- Jupyter R kernel has tidyverse (tricky install) and gorr (non-CRAN) packages installed.
- Docker image also includes OpenJDK 1.8 in case users want to install Spark.
TLDR
$ docker pull hashrocketsyntax/gorpyter:augustus
$ docker run -it -p 8888:8888 hashrocketsyntax/gorpyter:augustus
Read the rest of the documentation for complete setup & usage.
1. Docker Environment
LOCAL NOTEBOOK FOLDER
Create a folder on your local machine's desktop where you will store your notebooks. Keep the output of pwd handy as we will use it with the volumes yml key below. You can name the folder whatever you like. We'll call it 'notebooks'.
$ cd ~/Desktop
$ mkdir notebooks
$ cd notebooks
$ pwd
'<PATH_TO_YOUR_NEW_FOLDER>'
DOCKER HARDWARE RESOURCES
In order to convert large (1M rows) R dataframes to Pandas dataframes, your Docker environment may need access to more memory. Memory is the most important setting below.
- Stop any running containers.
- Click on Docker icon in system tray.
- Navigate to 'Preferences.'
- Click the 'Resources' or 'Advanced' tab depending on your version of Docker.
- Set the resources to the following:
  CPU: <keep default, should already be at 4 CPU>
  Memory: <half of what's available in 'About this Mac', 4 or 8 GB>
  Swap: <set to maximum, 4GB>
  Disk Image Size: <keep default>
- Click 'Apply & Restart'
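The memory guideline above (half of the machine's RAM) is simple arithmetic, sketched here for illustration. The function name `suggested_docker_memory_gb` is hypothetical, and the 1 GB floor is an assumption of this sketch rather than a Docker requirement.

```python
def suggested_docker_memory_gb(total_ram_gb: float) -> int:
    """Return half of the host's RAM in whole gigabytes, never less than 1."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    # Floor-divide by two, then apply the assumed 1 GB minimum.
    return max(1, int(total_ram_gb // 2))


if __name__ == "__main__":
    for ram in (8, 16):
        print(f"{ram} GB host: give Docker {suggested_docker_memory_gb(ram)} GB")
```

For the 8 GB and 16 GB machines mentioned above, this gives 4 GB and 8 GB.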
DOCKER IMAGE & MANIFEST
Pull in this pre-built image, which contains a Jupyter environment equipped with R and Python 3.7 kernels as well as the GORpyter dependencies. It's built on top of Jupyter's DockerHub image jupyter/datascience-notebook:2ce7c06a61a1. If you want to customize this image yourself, see Section 3.
$ docker pull hashrocketsyntax/gorpyter:augustus
Create a file named docker-compose.yml and open it with a text editor (nano or SublimeText).
$ touch docker-compose.yml
$ nano docker-compose.yml
Paste the text below into that file. Under the volumes key, paste in the output of pwd from above.
#docker-compose.yml
version: "3"
services:
  jupyter:
    image: "hashrocketsyntax/gorpyter:augustus"
    ports:
      - "8888:8888"
    volumes:
      - <PATH_TO_YOUR_NEW_FOLDER>:/usr/local/share/man/user_notebooks
Make sure you are in the same directory as the .yml file and run it like so.
$ docker-compose up
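If you would rather generate the manifest than paste the path by hand, a small script can fill in the volumes entry. This is a minimal sketch: `render_compose` is a hypothetical helper, not part of gorpyter, and it assumes a POSIX-style absolute path with no characters that need YAML quoting.

```python
from pathlib import Path

# Container-side mount point, taken from the docker-compose.yml above.
CONTAINER_DIR = "/usr/local/share/man/user_notebooks"


def render_compose(host_folder: str,
                   image: str = "hashrocketsyntax/gorpyter:augustus") -> str:
    """Return docker-compose.yml text that mounts host_folder into the container."""
    host = Path(host_folder).expanduser()
    if not host.is_absolute():
        raise ValueError("host_folder must be an absolute path (the output of pwd)")
    lines = [
        'version: "3"',
        "services:",
        "  jupyter:",
        f'    image: "{image}"',
        "    ports:",
        '      - "8888:8888"',
        "    volumes:",
        f"      - {host}:{CONTAINER_DIR}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Prints the manifest; redirect it into docker-compose.yml if it looks right.
    print(render_compose("~/Desktop/notebooks"), end="")
```

Two-space indentation is used throughout because YAML nesting depends on it.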
From the console output, grab the URL that looks like this http://127.0.0.1:8888/?token=<YOUR_TOKEN> and paste it into a browser.
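If the console output is long, the URL can be picked out with a regular expression. The sketch below assumes the URL has exactly the shape shown above and that the token runs until the next whitespace. The helper name and the sample log line are made up for illustration.

```python
import re
from typing import Optional

# Matches the URL shape shown above; the token is everything up to whitespace.
TOKEN_URL = re.compile(r"http://127\.0\.0\.1:8888/\?token=\S+")


def find_token_url(console_output: str) -> Optional[str]:
    """Return the first tokenized Jupyter URL in the text, or None if absent."""
    match = TOKEN_URL.search(console_output)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical log line, not copied from a real run.
    sample = "jupyter_1  |      or http://127.0.0.1:8888/?token=abc123\n"
    print(find_token_url(sample))
```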
2. JupyterLab Notebooks
TUTORIAL NOTEBOOKS
The Docker environment comes with example notebooks for both the Python and R SDK.
If you are running these notebooks in the pre-built Docker environment, know that only files in the user_notebooks folder will be saved (persisted). In fact, you won't be able to add, remove, copy, delete, or save changes to files outside of the user_notebooks directory.
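The Python tutorial notebook begins with gp.setup(), which prints a checklist covering the Python version, the R path, and the installed dependencies. To show what such checks involve, here is a rough stand-alone approximation using only the standard library. It is not gorpyter's implementation: `environment_report` is a hypothetical name, and reading R_HOME from the environment is an assumption.

```python
import importlib.util
import os
import platform


def environment_report(modules=("rpy2", "pandas", "numpy", "tzlocal")):
    """Collect the Python version, R_HOME, and whether each module is importable."""
    report = {
        "python_version": platform.python_version(),
        "r_home": os.environ.get("R_HOME"),  # None when the variable is unset
    }
    for name in modules:
        # find_spec returns None for a top-level module that is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```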
#python_sdk_gorpyter.ipynb
pip install gorpyter --upgrade
import gorpyter as gp
gp.setup()
"""
CHECKLIST
=============================================
✓ -- The version of your Jupyter Python environment is '3.7.3'.
✓ -- The path of the Jupyter R enviroment being accessed by `rpy2` is '/opt/conda/lib/R'.
✓ -- The Python dependencies of `gorpyter` are installed.
✓ -- The `tidyverse` R library is installed in your R environment.
✓ -- The `gorr` R library is installed in your R environment.
✓ -- Python was able to successfully load `gorr` as a module via `rpy2`.
=============================================
"""
api_key = "<YOUR_API_KEY>"
project = "<YOUR_PROJECT_NAME>"
conn = gp.connect(api_key, project)
gp.query("<YOUR_GOR_QUERY>", conn)
"""
nor example -- "nor ./"
gor example -- "gor -p chr10 #dbsnp# | TOP 100"
Tested successfully on a 1,000,000 row result.
Despite being run in Python, interrupting the client's execution
of this function with Ctrl+C is still gracefully intercepted
by the gorr R library, so the server-side execution of the
query is cleaned up at the same time.
"""
PYTHON PACKAGE
pip install gorpyter --upgrade
- conda install will not work as this package has not been published to conda-forge.
- Latest version number can be seen here https://pypi.org/project/gorpyter, as compared to output of pip show gorpyter.
- Installing gorpyter will also install these dependencies: rpy2>=3.0.5, tzlocal>=2.0.0, pandas>=0.25.0, numpy>=1.17.0.
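Comparing version numbers by eye is error-prone (0.6.10 is newer than 0.6.6, even though it sorts earlier as text). The sketch below compares plain dotted versions numerically. The helper names are hypothetical, and it handles only purely numeric versions such as those listed above, not suffixes like rc1.

```python
from typing import Tuple

# Minimum dependency versions as listed above.
MINIMUMS = {"rpy2": "3.0.5", "tzlocal": "2.0.0", "pandas": "0.25.0", "numpy": "1.17.0"}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.17.0' into (1, 17, 0); raises ValueError on non-numeric parts."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version is at least the minimum version."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.17' and '1.17.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    print(meets_minimum("0.25.3", MINIMUMS["pandas"]))
    print(meets_minimum("1.16.4", MINIMUMS["numpy"]))
```

The installed version string can be copied from the output of pip show for each package.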
GOR QUERY LANGUAGE
3. Optional -- Customizing the Docker Image
In order to create your own Docker image based on jupyter/datascience-notebook:latest, follow these instructions.
With these files in the same directory:
- Dockerfile
- python_sdk.ipynb
- r_sdk.ipynb
Run docker build -t your-image-name:your-new-tag . from within that directory.
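Before building, it can help to confirm that the three files listed above are actually present, since the Dockerfile's COPY steps fail without the notebooks. A minimal pre-flight sketch follows. `missing_build_files` and `build_command` are hypothetical helpers, and the script only prints the command rather than running Docker.

```python
from pathlib import Path
from typing import List

# Files the build directory is expected to contain, per the list above.
REQUIRED_FILES = ("Dockerfile", "python_sdk.ipynb", "r_sdk.ipynb")


def missing_build_files(build_dir: str) -> List[str]:
    """Return the required file names that are not present in build_dir."""
    root = Path(build_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def build_command(image: str, tag: str) -> List[str]:
    """Return the docker build invocation as an argument list."""
    return ["docker", "build", "-t", f"{image}:{tag}", "."]


if __name__ == "__main__":
    missing = missing_build_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print(" ".join(build_command("your-image-name", "your-new-tag")))
```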
Here are the commands contained in the Dockerfile.
#Dockerfile
FROM jupyter/datascience-notebook:latest
# MAINTAINER is deprecated in favor of a LABEL.
LABEL maintainer="layne sadler <lsadler@wuxinextcode.com>"
# ====== PRE SUDO ======
ENV JUPYTER_ENABLE_LAB=yes
# If you run pip as sudo it continually prints errors.
# Tidyverse is already installed, and installing gorpyter installs the correct versions of other Python dependencies.
RUN pip install gorpyter
RUN Rscript -e "install.packages('https://cdn.nextcode.com/public/libraries/gorr_0.2.5.tar.gz', repos = NULL, type = 'source')"
ENV R_HOME=/opt/conda/lib/R
# https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s09.html
# Looks like /usr/local/man is symlinking all R/W toward /usr/local/share/man instead
COPY python_sdk.ipynb /usr/local/share/man
COPY r_sdk.ipynb /usr/local/share/man
ENV NOTEBOOK_DIR=/usr/local/share/man
WORKDIR /usr/local/share/man
# ====== SUDO ======
USER root
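# Spark requires Java 8. Already running as root here, so sudo is not needed.
RUN apt-get update && apt-get install -y openjdk-8-jdk
# JAVA_HOME should point at the JDK directory, not at the java binary.
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64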
# If you COPY files into the same VOLUME that you mount in docker-compose.yml, then those files will disappear at runtime.
# `user_notebooks/` is the folder that gets mapped as a VOLUME to the user's local folder during runtime.
RUN mkdir /usr/local/share/man/user_notebooks
File details
Details for the file gorpyter-0.6.6.tar.gz.
File metadata
- Download URL: gorpyter-0.6.6.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.1.0 requests-toolbelt/0.9.1 tqdm/4.33.0 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 72e03afda38cd245bd6593f3580d842c4732e8fd664f981bd69b82ecb7b39701
MD5 | 736335071169608af1e10385244a935c
BLAKE2b-256 | 295ceff368f63108218de5558d462b5a6d2db22117e5b6c60f313ddc45c737ac
File details
Details for the file gorpyter-0.6.6-py3-none-any.whl.
File metadata
- Download URL: gorpyter-0.6.6-py3-none-any.whl
- Upload date:
- Size: 6.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.1.0 requests-toolbelt/0.9.1 tqdm/4.33.0 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4fef891ce5bed1cfe64411b13824dfe24c0f06630fc8e10618bc4747e2e177c8
MD5 | 13e4edbfd06e095f6a723290ab09c6a9
BLAKE2b-256 | 513cf628395d3276b939cb25165db4608bebf5ac215a8511929457ff67d4a87c
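If you download one of these files by hand, you can check it against the published SHA256 digest. The sketch below uses only the standard library. `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the local file path is whatever you saved the download as.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("gorpyter-0.6.6.tar.gz", "<SHA256 from the table above>") should return True for an intact download.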