ETL pipeline for single-cell RNA-seq data
scp-ingest-pipeline
File Ingest Pipeline for Single Cell Portal
The SCP Ingest Pipeline is an ETL pipeline for single-cell RNA-seq data.
Prerequisites
- Python 3.7+
- Google Cloud Platform project
- A suitable service account (SA) and a MongoDB VM in GCP. The SA needs the roles "Editor", "Genomics Pipelines Runner", and "Storage Object Admin". Broad Institute engineers: see instructions here.
- SAMtools, if using ingest/make_toy_data.py
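The local prerequisites above (Python version and, optionally, SAMtools) can be checked before installing. The following is a minimal sketch, not part of the pipeline itself: the function name `check_prerequisites` and its `need_samtools` parameter are hypothetical, and it does not attempt to verify the GCP project, service account, or MongoDB VM.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum version taken from the prerequisites list


def check_prerequisites(need_samtools=False):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration only; not provided by scp-ingest-pipeline.
    """
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python 3.7+ required, found {found}")
    # SAMtools is only needed when using ingest/make_toy_data.py
    if need_samtools and shutil.which("samtools") is None:
        problems.append("samtools not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites(need_samtools=True):
        print(problem)
```

Pass `need_samtools=True` only if you intend to run ingest/make_toy_data.py; otherwise the SAMtools check is skipped.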
Install
Fetch the code, create and activate a virtual environment, and install dependencies:
git clone git@github.com:broadinstitute/scp-ingest-pipeline.git
cd scp-ingest-pipeline
python3 -m venv env --copies
source env/bin/activate
pip install -r requirements.txt
If you plan to use ingest/make_toy_data.py, also install SAMtools (shown here with Homebrew):
brew install samtools
Now get secrets from Vault to set environment variables needed to write to the database:
export BROAD_USER="<username in your email address>"
export DATABASE_NAME="single_cell_portal_development"
vault login -method=github token=`~/bin/git-vault-token`
# Get username and password
vault read secret/kdux/scp/development/$BROAD_USER/mongo/user
export MONGODB_USERNAME="<username from Vault>"
export MONGODB_PASSWORD="<password from Vault>"
# Get external IP address for host
vault read secret/kdux/scp/development/$BROAD_USER/mongo/hostname
export DATABASE_HOST="<ip from Vault (omit brackets)>"
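Before running anything that writes to the database, it can help to confirm that the variables exported above are actually set. The sketch below assumes the four database-related variables are all required and shows one way they could be combined into a MongoDB connection URI. How the pipeline itself consumes these variables is not shown in this README, so the helper names (`missing_vars`, `mongo_uri`) and the port `27017` (MongoDB's default) are assumptions for illustration.

```python
import os
from urllib.parse import quote_plus

# Variable names come from the export commands above.
REQUIRED_VARS = (
    "DATABASE_NAME",
    "DATABASE_HOST",
    "MONGODB_USERNAME",
    "MONGODB_PASSWORD",
)


def missing_vars(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def mongo_uri(env=os.environ, port=27017):
    """Build a MongoDB URI from the environment (hypothetical helper).

    The port is an assumption: 27017 is MongoDB's default, but the
    development VM may be configured differently.
    """
    missing = missing_vars(env)
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    # Percent-encode credentials so characters such as '@' or ':' stay valid in a URI
    user = quote_plus(env["MONGODB_USERNAME"])
    password = quote_plus(env["MONGODB_PASSWORD"])
    return f"mongodb://{user}:{password}@{env['DATABASE_HOST']}:{port}/{env['DATABASE_NAME']}"
```

Calling `missing_vars()` in a fresh terminal is a quick way to see which exports still need to be run. Avoid printing the result of `mongo_uri()`, since it contains the password.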
Git hooks
After installing Ingest Pipeline, add Git hooks to help ensure code quality:
pre-commit install && pre-commit install -t pre-push
The hooks expect git-secrets to be set up. If you are a Broad Institute employee who has not done this yet, see broadinstitute/single_cell_portal_configs for specific guidance.
Bypass hooks
In rare cases, you might need to skip Git hooks, like so:
- Skip commit hooks:
git commit ... --no-verify
- Skip pre-push hooks:
git push ... --no-verify
Test
After installing:
source env/bin/activate
cd tests; pytest
Use
Run this every time you start a new terminal to work on this project:
source env/bin/activate
See ingest_pipeline.py for usage examples.
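Because the environment must be activated in every new terminal, a script can guard against running with the wrong interpreter. This is a minimal sketch, assuming the environment was created with `python3 -m venv` as in the install steps; the function name `in_virtualenv` is hypothetical and not part of the pipeline.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix points at the base Python installation; outside a venv
    the two are equal.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if not in_virtualenv():
        print("Virtual environment not active; run: source env/bin/activate")
```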