A small example package for transcribing DNA to RNA

Bioinformatics-102

A friendly introduction to Docker technologies. For more details about this technology, please visit the official website. In a previous repository we learned how to dockerize a project. Now we are going to create a Python HTTP API, package it into a container image, upload the image to Google Cloud, and then deploy it to Cloud Run to publish the project.

Pre-requisites

To follow these guidelines, please install Docker Desktop in your local environment.

Abstract

This repository includes a dummy bioinformatics tool written in Python, called dna2rna, which transcribes an input DNA string into an mRNA string:

DNA sequence -> dna2rna -> mRNA sequence

Details

The source code is available inside main.py. The algorithm it follows is:

BEGIN
  1. Create a string called <RNA_SEQ> with the same length as the input <DNA_SEQ>
  2. For each character <CURRENT> in <DNA_SEQ>:
  3.   Assign to <OPPOSITE> the character matching <CURRENT> // A <-> T, C <-> G, case preserved, any other character -> ?
  4.   Put <OPPOSITE> in <RNA_SEQ> at the same position as <CURRENT>
  5. Return <RNA_SEQ>
END

Examples:

  • Input A -> Output T
  • Input aA -> Output tT
  • Input ABCD -> Output T?G?
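
The pseudocode and examples above can be sketched in Python as follows. This is only an illustration: the function name dna2rna and the COMPLEMENT table are assumptions, and the real main.py may be written differently. Note that, as in the examples, this mapping produces the complementary string with T rather than U.

```python
# Hypothetical sketch of the algorithm; not taken from main.py.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def dna2rna(dna_seq: str) -> str:
    """Return the character-by-character complement of dna_seq."""
    result = []
    for current in dna_seq:
        # Unknown characters (for example B or D) map to "?".
        opposite = COMPLEMENT.get(current.upper(), "?")
        # Preserve the case of the input character.
        result.append(opposite if current.isupper() else opposite.lower())
    return "".join(result)


print(dna2rna("ABCD"))  # T?G?
```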

More information about the biological transcription process is available here.

Technical aspects

The goal of this example is to create an API to process the DNA sequences and return the RNA transcript sequence. To accomplish this goal, we are using the Flask library to create a simple endpoint at / to handle POST requests with a JSON input containing a key named "dnaSequence".

Example

The input must follow this schema:

{
  "dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac"
}

Assuming the API is running on port 8080:

$ curl --location --request POST 'http://localhost:8080' \
  --header 'Content-Type: application/json' \
  --data-raw '{
      "dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac"
  }'

The response will be:

{
  "dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac",
  "rnaSequence": "ctaggaggtatatgttgccatagaggtggagtccaaatctagagttgttgccttg"
}
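
A handler with the behaviour described above might look like the following minimal sketch. It assumes Flask is installed. The helper complement, the handler name transcribe, and the 400 response for malformed input are assumptions made for illustration; the source does not say how main.py handles invalid requests.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# A <-> T and C <-> G, in both cases; anything else becomes "?".
TABLE = dict(zip("ATCGatcg", "TAGCtagc"))


def complement(dna_seq: str) -> str:
    """Hypothetical helper: complement each character of dna_seq."""
    return "".join(TABLE.get(ch, "?") for ch in dna_seq)


@app.route("/", methods=["POST"])
def transcribe():
    payload = request.get_json(silent=True)
    # Assumed error handling: reject bodies without a string "dnaSequence".
    if not isinstance(payload, dict) or not isinstance(payload.get("dnaSequence"), str):
        return jsonify({"error": 'expected a JSON object with a string "dnaSequence"'}), 400
    dna_seq = payload["dnaSequence"]
    return jsonify({"dnaSequence": dna_seq, "rnaSequence": complement(dna_seq)})
```

If this were saved as main.py, the main:app target passed to gunicorn in the "Running it" section would refer to this app object.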

Running it

The most straightforward option is to install the dependencies and run the server directly. However, this requires a Python interpreter on your machine, or a conda environment.

$ pip install -r requirements.txt
$ gunicorn --bind :8080 --workers 1 --threads 8 --timeout 0 main:app

Docker way

There is a previous repository where we discussed the main concepts and the advantages of creating both Docker images and containers.

Image

In this case, instead of starting from a gcc image, we are starting from the official lightweight python image:

  1. FROM -> Base our Docker image on python:3.9-slim
  2. ENV -> Set an environment variable (PYTHONUNBUFFERED=True). More details inside the Dockerfile
  3. ENV -> Set an environment variable (APP_HOME) pointing to the working directory path. More details inside the Dockerfile
  4. WORKDIR -> Set the working folder, creating it if it does not exist
  5. COPY -> Copy the code from your computer to the image
  6. RUN -> Install the requirements inside the image by executing pip install ...
  7. CMD -> The command to execute when the container starts. In this case we start the server

Once you have the Dockerfile it is easy to build the image:

$ docker image build . --tag bioinformatics-102

Details:

  • docker image build: build an image
  • .: the build context, here the current directory, which contains the Dockerfile
  • --tag bioinformatics-102: a tag name that makes the image easier to reference later

Container

Now, we are ready to create a container with the server running from inside it:

$ docker run -p 127.0.0.1:8080:8080 --env PORT=8080 bioinformatics-102

Details:

  • docker run: create and start a container
  • -p 127.0.0.1:8080:8080: bind port 8080 on your computer's localhost to the container's port 8080
  • --env PORT=8080: defines the environment variable with the running port (8080)
  • bioinformatics-102: the image name

Executing the command starts the server inside the container, reachable through port 8080 on your machine.

Deploying the instance on a cloud provider

It's quite easy to deploy a container on a cloud provider, allowing the community to use your code effortlessly. For this example, the Docker instance is running on Google Cloud Services. Guidelines for Python projects are available in this official document from Google Cloud.

After following the process mentioned above, we have a public URL to access the API: https://dna2rna-de2u5yatga-ew.a.run.app.

Try it

With the following script you can easily make a POST request to the deployed container:

import requests
import json

url = "https://dna2rna-de2u5yatga-ew.a.run.app"

payload = json.dumps({
  "dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac"
})
headers = {
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Final thoughts

In addition to what was said in the previous repository, this one aims to show the advantages of using Docker with cloud providers. In a few steps we have an HTTP API service running online with almost nothing to set up.

I recommend giving it a chance: dockerize your project, create a simple HTTP API to open your code to the community, and deploy it to the cloud. In my case, I'm using the Google Cloud free period to test this code.

Comments are welcome: open an issue here or send me an email at adrian.diaz@vub.be (or diaz.adrian.g@gmail.com).

Thanks for your time and happy coding!!!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

transcriptb2b-1.0.2.tar.gz (7.7 kB view hashes)

Uploaded Source

Built Distribution

transcriptb2b-1.0.2-py3-none-any.whl (5.7 kB view hashes)

Uploaded Python 3
