Python wrapper for GOR's R SDK with Pandas serialization.
GORpyter
- Python package (with pandas serialization) that wraps the R SDK of the GOR Query API.
- Docker image for JupyterLab (Python & R kernels) with both the Python & R SDK dependencies installed.
- gp.query() converts the R tibble dataframe into a pandas dataframe on the fly.
- The rpy2 package is used to wrap the gorr R library functions in Python.
- Jupyter R kernel has tidyverse (tricky install) and gorr (non-CRAN) packages installed.
- Docker image also includes OpenJDK 1.8 in case users want to install Spark.
TLDR
$ docker pull hashrocketsyntax/gorpyter:aug22
$ docker run -it -p 8888:8888 hashrocketsyntax/gorpyter:aug22
Read the rest of the documentation for complete setup & usage.
1. Docker Environment
LOCAL NOTEBOOK FOLDER
Create a folder on your local machine's desktop where you will store your notebooks. Keep the output of pwd handy, as we will use it with the volumes yml key below. You can name the folder whatever you like; we'll call it 'notebooks'.
$ cd ~/Desktop
$ mkdir notebooks
$ cd notebooks
$ pwd
'<PATH_TO_YOUR_NEW_FOLDER>'
DOCKER IMAGE & MANIFEST
Pull this pre-built image, which contains a Jupyter environment equipped with R and Python 3.7 kernels as well as the GORpyter dependencies. It's built on top of Jupyter's latest DockerHub image, jupyter/datascience-notebook:2ce7c06a61a1. If you want to customize this image yourself, see Section 3.
$ docker pull hashrocketsyntax/gorpyter:aug22
Create a file named docker-compose.yml and open it with a text editor (nano or SublimeText).
$ touch docker-compose.yml
$ nano docker-compose.yml
Paste the text below into that file. Under the volumes key, paste in the output of pwd from above.
#docker-compose.yml
version: "3"
services:
  jupyter:
    image: "hashrocketsyntax/gorpyter:aug22"
    ports:
      - "8888:8888"
    volumes:
      - <PATH_TO_YOUR_NEW_FOLDER>:/usr/local/share/man/user_notebooks
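If you would rather not paste the path by hand, the file can be generated. The following is a minimal sketch, not part of gorpyter: the helper names render_compose and write_compose are hypothetical, and it assumes the folder you pass in is the notebooks folder created earlier.

```python
from pathlib import Path

# Same content as the compose file in this section, with a placeholder for the host folder.
COMPOSE_TEMPLATE = """\
#docker-compose.yml
version: "3"
services:
  jupyter:
    image: "hashrocketsyntax/gorpyter:aug22"
    ports:
      - "8888:8888"
    volumes:
      - {host_dir}:/usr/local/share/man/user_notebooks
"""


def render_compose(host_dir):
    """Return the compose file text with the host notebook folder filled in."""
    # resolve() yields an absolute path, roughly what `pwd` prints inside that folder.
    return COMPOSE_TEMPLATE.format(host_dir=Path(host_dir).resolve())


def write_compose(host_dir, target="docker-compose.yml"):
    """Write the rendered compose file into host_dir and return its path."""
    out_path = Path(host_dir) / target
    out_path.write_text(render_compose(host_dir))
    return out_path
```

For example, write_compose(Path.home() / "Desktop" / "notebooks") would place the file in the folder from the earlier step.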
Make sure you are in the same directory as the .yml file and run it like so.
$ docker-compose up
From the console output, grab the URL that looks like http://127.0.0.1:8888/?token=<YOUR_TOKEN> and paste it into a browser.
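If the console output is long, the URL can be picked out programmatically. This is a rough sketch under two assumptions: that the token consists only of letters and digits, and that the URL appears exactly in the form shown above. The function name find_token_url is hypothetical.

```python
import re

# Matches URLs of the form http://127.0.0.1:8888/?token=<YOUR_TOKEN>.
# Assumption: the token contains only letters and digits.
TOKEN_URL = re.compile(r"http://127\.0\.0\.1:8888/\?token=[0-9A-Za-z]+")


def find_token_url(console_output):
    """Return the first Jupyter login URL found in the text, or None."""
    match = TOKEN_URL.search(console_output)
    return match.group(0) if match else None
```

You could feed it text saved from the docker-compose up console and paste the result into a browser.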
2. JupyterLab Notebooks
PYTHON PACKAGE
pip install gorpyter --upgrade
- conda install will not work, as this package has not been published to conda-forge.
- The latest version number can be seen at https://pypi.org/project/gorpyter and compared to the output of pip show gorpyter.
- Installing gorpyter will also install these dependencies: rpy2>=3.0.5, tzlocal>=2.0.0, pandas>=0.25.0, numpy>=1.17.0.
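The version comparison described above can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, the version strings in the usage note are made up, and the comparison assumes purely numeric dotted versions such as 0.2.5.

```python
def parse_pip_show_version(pip_show_output):
    """Extract the value of the 'Version:' line from `pip show` output."""
    for line in pip_show_output.splitlines():
        if line.startswith("Version:"):
            return line.split(":", 1)[1].strip()
    return None


def version_key(version):
    """Turn a purely numeric dotted version such as '0.2.5' into (0, 2, 5)."""
    return tuple(int(part) for part in version.split("."))


def is_outdated(installed, latest):
    """Return True when the installed version is older than the latest one."""
    return version_key(installed) < version_key(latest)
```

Given the text printed by pip show gorpyter and the version number read from the PyPI page, is_outdated(installed, latest) tells you whether an upgrade is worthwhile. Comparing tuples of integers avoids the string-comparison trap where '0.1.10' would sort before '0.1.2'.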
GOR QUERY LANGUAGE
TUTORIAL NOTEBOOK
The Docker environment includes this notebook as a tutorial, which is also available as a gist (http://bit.ly/gp-ipynb).
You'll notice that the notebooks you create in the JupyterLab web interface will be saved to the folder we created on your desktop.
#jupyter_notebook.ipynb
pip install gorpyter --upgrade
import gorpyter as gp
gp.setup()
"""
CHECKLIST
=============================================
✓ -- The version of your Jupyter Python environment is '3.7.3'.
✓ -- The path of the Jupyter R enviroment being accessed by `rpy2` is '/opt/conda/lib/R'.
✓ -- The Python dependencies of `gorpyter` are installed.
✓ -- The `tidyverse` R library is installed in your R environment.
✓ -- The `gorr` R library is installed in your R environment.
✓ -- Python was able to successfully load `gorr` as a module via `rpy2`.
=============================================
"""
project = "<YOUR_PROJECT_NAME>"
api_key = "<YOUR_API_KEY>"
conn = gp.connect(project=project, api_key=api_key)
gp.query(query="<YOUR_GOR_QUERY>", conn=conn)
"""
nor example -- "nor ./"
gor example -- "gor -p chr10 #dbsnp# | TOP 100"
Tested successfully on a 1,000,000 row result.
Despite being run in Python, interrupting the client's execution
of this function with ctrl+c is still gracefully
intercepted by the gorr R library, and thus the server-side
execution of the query is cleaned up as well.
"""
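To avoid saving your API key inside a notebook, the credentials can be read from environment variables instead. This is a minimal sketch: the variable names GOR_PROJECT and GOR_API_KEY and both helper functions are hypothetical, while gp.connect and gp.query are called exactly as in the notebook above.

```python
import os


def load_credentials(environ=None):
    """Read the project name and API key from environment variables.

    GOR_PROJECT and GOR_API_KEY are hypothetical names chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in ("GOR_PROJECT", "GOR_API_KEY") if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return environ["GOR_PROJECT"], environ["GOR_API_KEY"]


def run_query(gor_query, environ=None):
    """Connect and run one query, returning whatever gp.query returns."""
    import gorpyter as gp  # imported lazily so load_credentials works without it

    project, api_key = load_credentials(environ)
    conn = gp.connect(project=project, api_key=api_key)
    return gp.query(query=gor_query, conn=conn)
```

With both variables set in the container, run_query("gor -p chr10 #dbsnp# | TOP 100") would run the gor example from the notebook.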
3. Optional -- Customizing the Docker Image
To create your own Docker image based on jupyter/datascience-notebook:latest, follow these instructions.
With these files in the same directory:
- Dockerfile
- gorpyter_tutorial.ipynb
Run docker build -t your-image-name:your-new-tag . from within that directory.
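Before building, it can help to confirm that both files are actually present. A small sketch follows; the function name missing_build_files is hypothetical and it checks only for the two files listed above.

```python
from pathlib import Path

# The two files this section says must sit in the build directory.
REQUIRED_FILES = ("Dockerfile", "gorpyter_tutorial.ipynb")


def missing_build_files(build_dir):
    """Return the required file names that are absent from build_dir."""
    build_dir = Path(build_dir)
    return [name for name in REQUIRED_FILES if not (build_dir / name).is_file()]
```

An empty list from missing_build_files(".") means the directory is ready for docker build.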
Here are the commands contained in the Dockerfile.
#Dockerfile
FROM jupyter/datascience-notebook:latest
MAINTAINER layne sadler <lsadler@wuxinextcode.com>
# ====== PRE SUDO ======
ENV JUPYTER_ENABLE_LAB=yes
# If you run pip as sudo it continually prints errors.
# Tidyverse is already installed, and installing gorpyter installs the correct versions of other Python dependencies.
RUN pip install gorpyter
RUN Rscript -e "install.packages('https://cdn.nextcode.com/public/libraries/gorr_0.2.5.tar.gz', repos = NULL, type = 'source')"
ENV R_HOME=/opt/conda/lib/R
# https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s09.html
# Looks like /usr/local/man is symlinking all R/W toward /usr/local/share/man instead
COPY gorpyter_tutorial.ipynb /usr/local/share/man
ENV NOTEBOOK_DIR=/usr/local/share/man
WORKDIR /usr/local/share/man
# ====== SUDO ======
USER root
# Spark requires Java 8.
RUN sudo apt-get update && sudo apt-get install openjdk-8-jdk -y
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# If you COPY files into the same VOLUME that you mount in docker-compose.yml, then those files will disappear at runtime.
# `user_notebooks/` is the folder that gets mapped as a VOLUME to the user's local folder during runtime.
RUN mkdir /usr/local/share/man/user_notebooks