A small example package for transcribing DNA to RNA

Bioinformatics-102

A friendly introduction to Docker technologies. For more details about this technology, please visit the official website. In a previous repository we learned how to dockerize a project. Now we will create a Python HTTP API, package it into a container image, upload that image to Google Cloud, and deploy it to Cloud Run to publish the project.

Pre-requisites

To follow these guidelines, please install Docker Desktop in your local environment.

Abstract

This repository includes a dummy bioinformatics tool written in Python, called dna2rna, which transcribes an input DNA string into an mRNA string:

DNA sequence -> dna2rna -> mRNA sequence

Details

The source code is available inside main.py. The algorithm, in pseudocode:

BEGIN
  1. Create a string called <RNA_SEQ> with the same length as the input <DNA_SEQ>
  2. For each character <CURRENT> inside <DNA_SEQ>:
  3.   Assign to <OPPOSITE> the matching value for <CURRENT> // A <-> T, C <-> G, anything else -> ?
  4.   Put <OPPOSITE> inside <RNA_SEQ> at the same position as <CURRENT>
  5. Return <RNA_SEQ>
END

Examples:

  • Input A -> Output T
  • Input aA -> Output tT
  • Input ABCD -> Output T?G?
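
The pseudocode and examples above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of main.py: the names PAIRS and dna2rna are chosen here for the example. It follows the toy mapping stated above (A <-> T, C <-> G), keeps the case of each input character, and writes ? for any character it does not recognize.

```python
# Toy pairing table taken from the pseudocode: A <-> T, C <-> G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def dna2rna(dna_seq: str) -> str:
    """Return a string of the same length with every base swapped for its pair."""
    out = []
    for current in dna_seq:
        # Look the base up in upper case; unknown characters become '?'.
        opposite = PAIRS.get(current.upper(), "?")
        # Preserve the case of the input character.
        out.append(opposite.lower() if current.islower() else opposite)
    return "".join(out)


if __name__ == "__main__":
    for example in ("A", "aA", "ABCD"):
        print(example, "->", dna2rna(example))
```

Building the result as a list and joining it once avoids repeated string concatenation, which matters only for very long sequences.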

More information about the biological transcription process is available here.

Technical aspects

The goal of this example is to create an API to process the DNA sequences and return the RNA transcript sequence. To accomplish this goal, we are using the Flask library to create a simple endpoint at / to handle POST requests with a JSON input containing a key named "dnaSequence".

Example

The input must follow this schema:

{
  "dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac"
}

Assuming the API is running on port 8080:

$ curl --location --request POST 'http://localhost:8080' \
  --header 'Content-Type: application/json' \
  --data-raw '{
      "dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac"
  }'

The response will be:

{
  "dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac",
  "rnaSequence": "ctaggaggtatatgttgccatagaggtggagtccaaatctagagttgttgccttg"
}
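
The request and response shapes above can be captured as a small, framework-independent function. The sketch below is not the Flask code from main.py: handle_post is a hypothetical helper, and the 400 status and error messages for malformed input are assumptions, since this document only describes the successful case.

```python
import json
from typing import Tuple

# Toy pairing table from the pseudocode in this README: A <-> T, C <-> G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def dna2rna(dna_seq: str) -> str:
    """Swap each base for its pair, keep the case, use '?' for anything else."""
    out = []
    for base in dna_seq:
        opposite = PAIRS.get(base.upper(), "?")
        out.append(opposite.lower() if base.islower() else opposite)
    return "".join(out)


def handle_post(raw_body: bytes) -> Tuple[int, dict]:
    """Hypothetical helper: map a raw JSON body to (HTTP status, response dict)."""
    try:
        data = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(data, dict) or not isinstance(data.get("dnaSequence"), str):
        return 400, {"error": 'expected a JSON object with a string "dnaSequence"'}
    dna = data["dnaSequence"]
    # Echo the input next to the result, as in the example response above.
    return 200, {"dnaSequence": dna, "rnaSequence": dna2rna(dna)}


if __name__ == "__main__":
    status, body = handle_post(b'{"dnaSequence": "gatcc"}')
    print(status, json.dumps(body))
```

Keeping the validation separate from the web framework makes the contract easy to test without starting a server.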

Running it

The most straightforward option is to install the dependencies and run the server directly. However, this requires a local Python interpreter (or a conda environment):

$ pip install -r requirements.txt
$ gunicorn --bind :8080 --workers 1 --threads 8 --timeout 0 main:app

Docker way

There is a previous repository where we discussed the main concepts and the advantages of creating both Docker images and containers.

Image

In this case, instead of starting from a gcc image, we start from the official lightweight Python image:

  1. FROM -> Base our Docker image on python:3.9-slim
  2. ENV -> Set an environment variable (PYTHONUNBUFFERED=True). More details inside the Dockerfile
  3. ENV -> Set an environment variable (APP_HOME) pointing to the working directory path. More details inside the Dockerfile
  4. WORKDIR -> Set the working directory (it is created if it does not exist)
  5. COPY -> Copy the code from your computer into the image
  6. RUN -> Install the requirements inside the image by executing pip install ...
  7. CMD -> The command to execute when the container starts. In this case we start the server

Once you have the Dockerfile it is easy to build the image:

$ docker image build . --tag bioinformatics-102

Details:

  • docker image build: build an image
  • .: the build context, here the current directory, which contains the Dockerfile
  • --tag bioinformatics-102: a tag name that makes the image easier to refer to later

Container

Now, we are ready to create a container with the server running from inside it:

$ docker run -p 127.0.0.1:8080:8080 --env PORT=8080 bioinformatics-102

Details:

  • docker run: create and start a container
  • -p 127.0.0.1:8080:8080: bind port 8080 on your computer's localhost to the container's port 8080
  • --env PORT=8080: defines the environment variable with the running port (8080)
  • bioinformatics-102: the image name

Run the command to start the server inside the container, reachable through port 8080 on your machine.
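
The PORT variable passed with --env is how the container learns which port to listen on. The image's real start command lives in the Dockerfile, which is not reproduced here, so the following is only a sketch of how a Python entry point could honor that variable. The helper name resolve_port and the fallback to 8080 are assumptions made for this example.

```python
import os


def resolve_port(environ=os.environ, default=8080):
    """Hypothetical helper: read the listening port from PORT, else use a default."""
    raw = environ.get("PORT")
    if raw is None or not raw.strip():
        return default
    port = int(raw)  # raises ValueError for non-numeric values
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


if __name__ == "__main__":
    print(f"Server would bind to :{resolve_port()}")
```

Passing the environment as a parameter keeps the function easy to check with a plain dictionary instead of the real process environment.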

Deploying the instance on a cloud provider

It's quite easy to deploy a container on a cloud provider, which lets the community use your code effortlessly. For this example, the container is running on Google Cloud Services. Guidelines for Python projects are available in this official document from Google Cloud.

After following the process mentioned above, we have a public URL to access the API: https://dna2rna-de2u5yatga-ew.a.run.app.

Try it

With the following script you can make a POST request to the deployed container:

import requests

url = "https://dna2rna-de2u5yatga-ew.a.run.app"

payload = {
  "dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac"
}

# json= serializes the payload and sets the Content-Type: application/json header
response = requests.post(url, json=payload, timeout=30)
# Raise an exception on 4xx/5xx responses instead of silently printing an error body
response.raise_for_status()

print(response.text)

Final thoughts

In addition to what was said in the previous repository, this one aims to show the advantages of using Docker with cloud providers. In a few steps we have an HTTP API service running online with almost nothing to set up.

I recommend giving it a try: dockerize your project, create a simple HTTP API to open your code to the community, and deploy it to the cloud. In my case, I'm using the Google Cloud free period to test this code.

Any comment is welcome: open an issue here or send me an email at adrian.diaz@vub.be (or diaz.adrian.g@gmail.com)

Thanks for your time and happy coding!!!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

transcriptb2b-1.0.2.tar.gz (7.7 kB)

Uploaded Source

Built Distribution

transcriptb2b-1.0.2-py3-none-any.whl (5.7 kB)

Uploaded Python 3

File details

Details for the file transcriptb2b-1.0.2.tar.gz.

File metadata

  • Download URL: transcriptb2b-1.0.2.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5

File hashes

Hashes for transcriptb2b-1.0.2.tar.gz
Algorithm Hash digest
SHA256 11760d51bdcc68907e6d19bf6ab6792bd42f09426ac48f33a698eb5407bda22c
MD5 a410d0e148eabf5a436b11091c35bd97
BLAKE2b-256 4af8f832ce0ed8f3a615170527bfc5198dfc2643b22fcbe6d5d961d0c28782a4

See more details on using hashes here.

File details

Details for the file transcriptb2b-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: transcriptb2b-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5

File hashes

Hashes for transcriptb2b-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 11c868eb527d85940aa3d336a9dbb568e814f2a8060a891e7ee724b51b60480a
MD5 fc78601658f9dd61c252f4c4400dc5d4
BLAKE2b-256 8413ae6d28a363d506473670c8a9d21b61fa57dbf1d02d362650af07c1c48296

See more details on using hashes here.
