ETL pipeline for single-cell RNA-seq data

scp-ingest-pipeline

File Ingest Pipeline for Single Cell Portal

The SCP Ingest Pipeline is an ETL pipeline for single-cell RNA-seq data.

Prerequisites

  • Python 3.7+
  • Google Cloud Platform project
  • A suitable service account (SA) and a MongoDB VM in GCP. The SA needs the roles "Editor", "Genomics Pipelines Runner", and "Storage Object Admin". Broad Institute engineers: see instructions here.
  • SAMTools, if using ingest/make_toy_data.py
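
The local prerequisites above can be checked before installing. The following is a minimal sketch, not part of the project: the function name check_prerequisites is hypothetical, and it only covers the Python version and the optional SAMTools binary (assumed to be installed as `samtools` on PATH). It does not check the GCP project or service account.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum version from the prerequisites list


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    found = tuple(version_info[:2])
    if found < MIN_PYTHON:
        problems.append(
            "Python %d.%d+ is required, found %d.%d" % (MIN_PYTHON + found)
        )
    # SAMTools is only needed for ingest/make_toy_data.py, so report it as a note
    if which("samtools") is None:
        problems.append(
            "samtools not found on PATH (only needed for ingest/make_toy_data.py)"
        )
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The version and lookup function are passed as parameters so the check can be exercised without changing the local environment.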

Install

Fetch the code, create and activate a virtual environment, and install dependencies:

git clone git@github.com:broadinstitute/scp-ingest-pipeline.git
cd scp-ingest-pipeline
python3 -m venv env --copies
source env/bin/activate
pip install -r requirements.txt

If you plan to use ingest/make_toy_data.py, also install SAMTools (shown here with Homebrew):

brew install samtools

Next, read secrets from Vault and set the environment variables needed to write to the database:

export BROAD_USER="<username in your email address>"

export DATABASE_NAME="single_cell_portal_development"

vault login -method=github token=`~/bin/git-vault-token`

# Get username and password
vault read secret/kdux/scp/development/$BROAD_USER/mongo/user

export MONGODB_USERNAME="<username from Vault>"
export MONGODB_PASSWORD="<password from Vault>"

# Get external IP address for host
vault read secret/kdux/scp/development/$BROAD_USER/mongo/hostname

export DATABASE_HOST="<ip from Vault (omit brackets)>"
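
Because the values above are pasted in by hand, it is easy to leave one unset or still holding its `<...>` placeholder. The sketch below shows one way to catch that before running the pipeline. It is an illustration, not project code: the helper missing_env_vars is hypothetical, and it assumes the four database-related variables exported above are the ones that matter.

```python
import os

# Variable names taken from the export commands above
REQUIRED_VARS = (
    "DATABASE_NAME",
    "MONGODB_USERNAME",
    "MONGODB_PASSWORD",
    "DATABASE_HOST",
)


def missing_env_vars(environ=os.environ):
    """Return names of required variables that are unset, empty, or still placeholders."""
    missing = []
    for name in REQUIRED_VARS:
        value = environ.get(name, "").strip()
        # A value still wrapped in <...> is an unreplaced placeholder
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_env_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it in the same shell where the variables were exported. An empty result only means the variables are set, not that the credentials are valid.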

Git hooks

After installing Ingest Pipeline, add Git hooks to help ensure code quality:

pre-commit install && pre-commit install -t pre-push

The hooks expect git-secrets to be set up. If you are a Broad Institute employee who has not done this yet, see broadinstitute/single_cell_portal_configs for specific guidance.

Bypass hooks

In rare cases, you might need to skip Git hooks, like so:

  • Skip commit hooks: git commit ... --no-verify
  • Skip pre-push hooks: git push ... --no-verify

Test

After installing, activate the virtual environment and run the test suite:

source env/bin/activate
cd tests; pytest

Use

Run this every time you start a new terminal to work on this project:

source env/bin/activate

See ingest_pipeline.py for usage examples.
