A small example package for transcribing DNA to RNA
Project description
Bioinformatics-102
A friendly introduction to Docker. For more details about this technology, please visit the official website. In a previous repository we learnt how to dockerize a project. Here we create a Python HTTP API, package it into a container image, upload the image to Google Cloud Services, and then deploy it to Cloud Run to publish the project.
Pre-requisites
To follow these guidelines, please install Docker Desktop in your local environment.
Abstract
This repository includes a dummy bioinformatics tool written in Python, called dna2rna, which transcribes an input DNA string into an mRNA string:
DNA sequence -> dna2rna -> mRNA sequence
Details
The source code is available inside main.py:
BEGIN
1. Create a string called <RNA_SEQ> with the same length as the input <DNA_SEQ>
2. For each character <CURRENT> inside <DNA_SEQ>:
3.     Assign the matching value for <CURRENT> to <OPPOSITE> // A <-> T, C <-> G
4.     Put <OPPOSITE> inside <RNA_SEQ> at the same position as <CURRENT>
5. Return <RNA_SEQ>
END
Examples:
- Input
A
-> Output
T
- Input
aA
-> Output
tT
- Input
ABCD
-> Output
T?G?
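The pseudocode and examples above can be sketched in Python roughly as follows. This is a minimal sketch and not necessarily the code in main.py: the COMPLEMENT table, the case-preserving behaviour, and the "?" placeholder for unknown characters are assumptions inferred from the examples.

```python
# Assumed complement table, taken from the pseudocode comment: A <-> T, C <-> G.
UPPER = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Lower-case input keeps its case in the examples ("aA" -> "tT"), so mirror the table.
COMPLEMENT = {**UPPER, **{k.lower(): v.lower() for k, v in UPPER.items()}}


def dna2rna(dna_seq: str) -> str:
    """Return a string of the same length with every base replaced by its match."""
    # Characters outside the table become "?", as in the "ABCD" example.
    return "".join(COMPLEMENT.get(base, "?") for base in dna_seq)


if __name__ == "__main__":
    for sample in ("A", "aA", "ABCD"):
        print(sample, "->", dna2rna(sample))
```

Building the result with a single join avoids repeatedly modifying a string in place, which Python strings do not support.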
More information about the biological transcription process is available here.
Technical aspects
The goal of this example is to create an API that processes DNA sequences and returns the RNA transcript sequence. To accomplish this, we use the Flask library to create a simple endpoint at / that handles POST requests with a JSON input containing a key named "dnaSequence".
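To make the request contract concrete, here is a hedged sketch of the logic such an endpoint might run. It is written without Flask so it can be read and tested on its own. The names handle_post and dna2rna, the 400 status for bad input, and the error messages are assumptions for illustration; the actual main.py may differ.

```python
import json
from typing import Tuple

# Assumed complement rule (A <-> T, C <-> G), case preserved.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C",
              "a": "t", "t": "a", "c": "g", "g": "c"}


def dna2rna(dna_seq: str) -> str:
    """Replace each base by its match; unknown characters become "?" (assumption)."""
    return "".join(COMPLEMENT.get(base, "?") for base in dna_seq)


def handle_post(raw_body: bytes) -> Tuple[int, dict]:
    """Validate a JSON body and build the (status code, response dict) pair."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return 400, {"error": "body must be valid JSON"}
    dna = payload.get("dnaSequence") if isinstance(payload, dict) else None
    if not isinstance(dna, str):
        return 400, {"error": 'missing string key "dnaSequence"'}
    return 200, {"dnaSequence": dna, "rnaSequence": dna2rna(dna)}


if __name__ == "__main__":
    status, body = handle_post(b'{"dnaSequence": "gatc"}')
    print(status, json.dumps(body))
```

In a Flask view, the returned dictionary would typically be passed to jsonify and returned together with the status code.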
Example
The input must follow this schema:
{
"dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac"
}
Assuming the API is running on port 8080:
$ curl --location --request POST 'http://localhost:8080' \
--header 'Content-Type: application/json' \
--data-raw '{
"dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac"
}'
The return will be:
{
"dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac",
"rnaSequence": "ctaggaggtatatgttgccatagaggtggagtccaaatctagagttgttgccttg"
}
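For readers who prefer Python to curl, the same request can be sketched with the standard library alone. The function names build_request and transcribe are hypothetical, and the default URL assumes a server started locally on port 8080 over plain HTTP, since nothing in this setup configures TLS.

```python
import json
import urllib.request


def build_request(dna_sequence: str,
                  url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build (but do not send) the POST request equivalent to the curl call."""
    body = json.dumps({"dnaSequence": dna_sequence}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(dna_sequence: str, url: str = "http://localhost:8080") -> dict:
    """Send the request and decode the JSON reply; needs the server to be running."""
    with urllib.request.urlopen(build_request(dna_sequence, url)) as response:
        return json.load(response)


if __name__ == "__main__":
    request = build_request("gatc")
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

Splitting request construction from sending keeps the first function usable without a running server, for example to inspect the payload before it goes out.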
Running it
The most straightforward option is to install the dependencies and run the server directly. However, this requires a Python interpreter on your machine, or a conda environment.
$ pip install -r requirements.txt
$ gunicorn --bind :8080 --workers 1 --threads 8 --timeout 0 main:app
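The same flags can also be kept in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the command above. The file name gunicorn.conf.py and reading PORT from the environment (with 8080 as a fallback) are assumptions, not something this repository is known to ship.

```python
# gunicorn.conf.py (hypothetical file; mirrors the command-line flags above)
import os

# Listen on all interfaces at the port given by PORT, defaulting to 8080 (assumption).
bind = ":" + os.environ.get("PORT", "8080")
workers = 1   # one worker process
threads = 8   # eight threads inside that worker
timeout = 0   # 0 disables Gunicorn's worker timeout
```

With such a file in place, `gunicorn -c gunicorn.conf.py main:app` should start the server with the same settings.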
Docker way
There is a previous repository where we discussed the main concepts and the advantages of creating both Docker images and containers.
Image
In this case, instead of starting from a gcc image, we start from the official lightweight Python image:
- FROM -> Defines the base of our Docker image: python:3.9-slim
- ENV -> Sets an environment variable (PYTHONUNBUFFERED=True). More details inside the Dockerfile
- ENV -> Sets an environment variable (APP_HOME) pointing to the working directory path. More details inside the Dockerfile
- WORKDIR -> Creates the working folder
- COPY -> Copies the code from your computer to the image
- RUN -> Installs the requirements inside the image by executing pip install ...
- CMD -> The command to execute when the container is initialized. In this case we start the server
Once you have the Dockerfile it is easy to build the image:
$ docker image build . --tag bioinformatics-102
Details:
- docker image build: Build an image.
- . : the path to the Dockerfile
- --tag bioinformatics-102: a tag name that makes the image easier to use later
Container
Now, we are ready to create a container with the server running from inside it:
$ docker run -p 127.0.0.1:8080:8080 --env PORT=8080 bioinformatics-102
Details:
- docker run: Create a container
- -p 127.0.0.1:8080:8080: bind your computer's localhost port 8080 to the container's port 8080
- --env PORT=8080: defines the environment variable with the running port (8080)
- bioinformatics-102: the image name
Execute the command to start the server inside the container, reachable through your machine's port 8080.
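If you start the container from a script, the pieces of the command map onto arguments as sketched below. The helper name docker_run_args is hypothetical. It only builds the argument list, which could then be passed to subprocess.run on a machine with Docker installed.

```python
from typing import List


def docker_run_args(image: str, host_port: int, container_port: int,
                    host_ip: str = "127.0.0.1") -> List[str]:
    """Build the argument list for `docker run` with one published port."""
    return [
        "docker", "run",
        # -p takes host_ip:host_port:container_port
        "-p", f"{host_ip}:{host_port}:{container_port}",
        # tell the server inside the container which port to listen on
        "--env", f"PORT={container_port}",
        image,
    ]


if __name__ == "__main__":
    print(" ".join(docker_run_args("bioinformatics-102", 8080, 8080)))
```

Keeping the host port and the container port as separate parameters makes it explicit that they do not have to be the same number.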
Deploying the instance on a cloud provider
It's quite easy to deploy a container on a cloud provider, allowing the community to use your code effortlessly. For this example, the Docker container is running on Google Cloud Services. Guidelines for Python projects are available in this official document from Google Cloud.
After following the process mentioned above, we have a public URL to access the API: https://dna2rna-de2u5yatga-ew.a.run.app.
Try it
With the following script you can easily make a POST request to the deployed container:
import requests
import json

url = "https://dna2rna-de2u5yatga-ew.a.run.app"
payload = json.dumps({
    "dnaSequence": "gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaac"
})
headers = {
    'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Final thoughts
In addition to what was said in the previous repository, this one aims to show the advantages of using Docker with cloud providers. In a few steps we have an HTTP API service running online with almost nothing to set up.
I recommend giving it a chance: dockerize your project, create a simple HTTP API to open your code to the community, and deploy it to the cloud. In my case, I'm using the Google Cloud free period to test this code.
Any comments are welcome: open an issue here or send me an email at adrian.diaz@vub.be (or diaz.adrian.g@gmail.com).
Thanks for your time and happy coding!!!
File details
Details for the file transcriptb2b-1.0.2.tar.gz.
File metadata
- Download URL: transcriptb2b-1.0.2.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 11760d51bdcc68907e6d19bf6ab6792bd42f09426ac48f33a698eb5407bda22c
MD5 | a410d0e148eabf5a436b11091c35bd97
BLAKE2b-256 | 4af8f832ce0ed8f3a615170527bfc5198dfc2643b22fcbe6d5d961d0c28782a4
File details
Details for the file transcriptb2b-1.0.2-py3-none-any.whl.
File metadata
- Download URL: transcriptb2b-1.0.2-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 11c868eb527d85940aa3d336a9dbb568e814f2a8060a891e7ee724b51b60480a
MD5 | fc78601658f9dd61c252f4c4400dc5d4
BLAKE2b-256 | 8413ae6d28a363d506473670c8a9d21b61fa57dbf1d02d362650af07c1c48296