Tool for joining all the bioinfo-hcpa's variant information retrieval APIs.

BIOVARS

Introduction

BIOVARS is a Python API that joins the other human variant retrieval APIs built by the Bioinformatics Core of Hospital de Clínicas de Porto Alegre. With BIOVARS, it is possible to search for variants in both the gnomAD and ABraOM databases, each of which has its own API (Pynoma and PyABraOM, respectively). In the future, more databases will be added to BIOVARS to provide greater data heterogeneity in a single centralized solution that is easy to use and suitable for automating complex bioinformatics pipelines. If you have scientific interests or want to use our package in formal reports, we kindly ask you to cite us in your publication: Carneiro, P., Colombelli, F., Recamonde-Mendoza, M., and Matte, U. (2022). Pynoma, PyABraOM and BIOVARS: Towards genetic variant data acquisition and integration. bioRxiv.

Installation

BIOVARS can be installed without the plotting functionalities, which require an rpy2/R setup; this is the default installation mode. A PyPI package is available, but to install the latest version of the code, clone the GitHub repositories and install them locally:

$ git clone https://github.com/bioinfo-hcpa/pynoma.git
$ git clone https://github.com/bioinfo-hcpa/pyABraOM.git
$ git clone https://github.com/bioinfo-hcpa/biovars.git
$ pip install ./pynoma
$ pip install ./pyABraOM
$ pip install ./biovars

If instead you want to use BIOVARS plotting functionalities, you need to install it with the [plots] extras and then install the required R packages:

$ pip install biovars[plots]
$ biovars --install-r-packages
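
After either installation mode, one quick way to see what is available is to probe for the modules without importing them. The sketch below is not part of BIOVARS: `missing_modules` is a hypothetical helper, and the only names it assumes are the `biovars` import name used in the examples of this document and the `rpy2` package needed by the plotting extras.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# "biovars" is always needed; "rpy2" only matters for the plotting extras.
missing = missing_modules(["biovars", "rpy2"])
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("biovars and rpy2 are both importable.")
```

If only `rpy2` is reported, searches should still work and only the plotting methods are affected.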

R installation troubleshooting

  • 'GLIBCXX_3.4.30' not found: see this.
  • .onLoad failed in loadNamespace() for 'tcltk': see this.

Docker

Since rpy2 can be troublesome to set up properly in some environments, we also provide a Docker container through which users can run BIOVARS code. Be aware, though, that we do not provide support for Docker installation and usage: Docker is a tool in its own right, so users should seek guidance on the appropriate technology forums and avoid posting Docker-related issues here.

Prerequisites

Ensure Docker is properly installed on your machine. Installation guides are available on the Docker official website.

Pulling the Docker Image

Pull the BIOVARS Docker image using the command:

docker pull fecolombelli/biovars:latest

Running Your Python Scripts

When using the Docker container to run Python scripts that generate and save files, such as the plotting functionalities in BIOVARS, it's crucial to manage file paths correctly. Many BIOVARS methods require specifying an output path for saving plots or other data files. Here’s how to ensure that these files are correctly saved to your host system and not just inside the container:

  • Volume Mounting: The command provided mounts your local directory to the container. Use:

    docker run --rm -v /path/to/your/python/code:/workspace fecolombelli/biovars:latest python3 /workspace/your_script.py
    

    This setup uses the -v option to map /path/to/your/python/code on your host to /workspace inside the container. This mapping allows files created by your script inside the container to be saved directly to your host machine.

  • Specifying Output Paths in Scripts: When your script saves a file, ensure the path in the script points to a location within /workspace, the directory inside the container. For example, if you intend to use the world plot, your script might include something like:

    plt.plot_world("/workspace/output/world_plot_idua.png", 0.01)
    

    Make sure that the corresponding output directory (/path/to/your/python/code/output) exists on your host, or adjust your script to create it if it doesn't:

    import os
    output_dir = '/workspace/output'
    os.makedirs(output_dir, exist_ok=True)
    plt.plot_world("/workspace/output/world_plot_idua.png", 0.01)
    
  • Accessing Output Files: After running your script, any files written to /workspace/output in the container are accessible in /path/to/your/python/code/output on your host machine.

This approach ensures that you can seamlessly run BIOVARS scripts that require file outputs, manage these files from your host system, and maintain the integrity and accessibility of your data. Remember, the paths you use in your scripts for reading and writing need to be consistent with the mounted directories specified in the Docker command.
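
The path handling described above can be wrapped in a small helper so that every output file lands inside the mounted directory. This is a minimal sketch, not part of BIOVARS: `container_output_path` is a hypothetical function, and the `BIOVARS_OUTPUT_DIR` environment variable is an assumption introduced here so the same script can also run outside the container.

```python
import os
from pathlib import Path


def container_output_path(filename, base_dir=None):
    """Create the output directory if needed and return the full path for `filename`."""
    if base_dir is None:
        # Hypothetical override; falls back to the directory used in this section.
        base_dir = os.environ.get("BIOVARS_OUTPUT_DIR", "/workspace/output")
    out_dir = Path(base_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return str(out_dir / filename)


# Inside the container, with `plt` being an initialized Plotter object:
# plt.plot_world(container_output_path("world_plot_idua.png"), 0.01)
```

Because the directory is created on demand, the script does not depend on the output folder already existing on the host.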

Searching for variants

The BIOVARS package can perform searches by genes, genome regions or transcripts. However, not all database sources accept all three types of search, so a Sources object needs to be created for this validation to occur. Currently there are only two databases, but more will be added in the future.

The Sources class expects as parameters:

  • ref_genome_version (str): the reference genome version (either "GRCh37/hg19" or "GRCh38/hg38")
  • gnomad (bool): whether to search the gnomAD database
  • abraom (bool): whether to search the ABraOM database
  • verbose (bool): whether to log validation messages

The Search class expects as parameters:

  • sources (biovars.Sources): the initialized Sources object
  • verbose (bool): whether to log searching status messages

from biovars import Sources, Search
src = Sources(ref_genome_version="hg38", gnomad=True, abraom=True)
sch = Search(src, verbose=True)

Search by genes

The gene_search method expects as parameter a list of genes (list[str]): the list of gene symbols of interest.

genes = ["IDUA", "ACE2", "BRCA1"]
df = sch.gene_search(genes)
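
Gene lists assembled from spreadsheets or other pipeline steps often carry stray whitespace, empty entries or duplicates. The sketch below shows one way to tidy such a list before the search; `clean_gene_list` is a hypothetical helper and not part of BIOVARS. It deliberately leaves letter case untouched, since some gene symbols contain lowercase characters.

```python
def clean_gene_list(genes):
    """Strip whitespace, drop empty entries and remove duplicates, keeping order."""
    seen = set()
    cleaned = []
    for gene in genes:
        symbol = gene.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            cleaned.append(symbol)
    return cleaned


genes = clean_gene_list(["IDUA", " ACE2", "BRCA1", "IDUA", ""])
print(genes)  # ['IDUA', 'ACE2', 'BRCA1']
# df = sch.gene_search(genes)  # `sch` is the Search object created earlier
```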

Search by regions

The region_search method expects as parameter a list of genome regions (list[str]): each item composed of "chromosome-start_region-end_region".

regions = ["4-987010-1001021", "X-15561033-15602100"]
df = sch.region_search(regions)
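
When coordinates come from another tool as separate values, the region strings can be built programmatically. This is a minimal sketch with a hypothetical helper, `format_region`; stripping a leading "chr" is an assumption based only on the examples here, which use bare chromosome names such as "4" and "X".

```python
def format_region(chromosome, start, end):
    """Build a "chromosome-start_region-end_region" string from its three parts."""
    chrom = str(chromosome).strip()
    # Assumption: bare chromosome names are expected, as in the examples above.
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    start, end = int(start), int(end)
    if start < 0 or end < start:
        raise ValueError(f"invalid region bounds: {start}-{end}")
    return f"{chrom}-{start}-{end}"


regions = [
    format_region("4", 987010, 1001021),
    format_region("chrX", 15561033, 15602100),
]
print(regions)  # ['4-987010-1001021', 'X-15561033-15602100']
# df = sch.region_search(regions)  # `sch` is the Search object created earlier
```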

Search by transcripts

The transcript_search method expects as parameter a list of transcripts (list[str]): the list of Ensembl transcript IDs of interest.

transcripts = ["ENST00000252519", "ENST00000369985"]
df = sch.transcript_search(transcripts)
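
Human Ensembl transcript IDs consist of the prefix ENST followed by 11 digits, optionally with a version suffix such as ".8". The sketch below validates IDs before searching; `normalize_transcript_id` is a hypothetical helper, and dropping the version suffix is an assumption made because the examples here use unversioned IDs (this document does not say whether versioned IDs are accepted).

```python
import re

# ENST + 11 digits, with an optional ".<version>" suffix.
_ENST_PATTERN = re.compile(r"^(ENST\d{11})(\.\d+)?$")


def normalize_transcript_id(transcript_id):
    """Return the unversioned Ensembl transcript ID, or raise ValueError if malformed."""
    match = _ENST_PATTERN.match(transcript_id.strip())
    if match is None:
        raise ValueError(f"not an Ensembl transcript ID: {transcript_id!r}")
    return match.group(1)


transcripts = [normalize_transcript_id(t) for t in ["ENST00000252519.8", "ENST00000369985"]]
print(transcripts)  # ['ENST00000252519', 'ENST00000369985']
# df = sch.transcript_search(transcripts)  # `sch` is the Search object created earlier
```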

Plotting the results

BIOVARS offers plotting methods coded in R (interfaced through rpy2) for summarizing the results of searches made with the package. To use any of the plotting methods, a Plotter object must be initialized with a BIOVARS result dataframe and the genome version used in the searches that generated the data.

Plotter(dataframe: pd.DataFrame, genome_version: str = "hg38")

  • dataframe: the pandas dataframe containing the resulting BIOVARS search.
  • genome_version: either "hg38" or "hg37".

from biovars import Plotter
plt = Plotter(df, "hg38")

Plotter.plot_world(saving_path: str, frequency: float = 0.01)
Plots the world map with the population variants count in terms of private, common and total.

  • saving_path: the path where the file is to be saved
  • frequency: the allele frequency threshold above which a variant is counted as "present" in a population

plt.plot_world("/home/user/path/", 0.01)

Plotter.plot_variants_grid(saving_path: str, frequency: float = 0.01)
Plots only a grid with the population variants count in terms of private, common and total. It is the same as plot_world, but with only the bar plots.

  • saving_path: the path where the file is to be saved
  • frequency: the allele frequency threshold above which a variant is counted as "present" in a population

plt.plot_variants_grid("/home/user/path/", 0.01)

Plotter.plot_genomic_region(saving_path: str, starting_region: int, ending_region: int, mut: bool = False, transcript_region: bool = True)
Plots the genomic region within the specified start and end range (max. of 54bp), showing the transcripts and where each one falls, as well as the frequency of each type of variant found in the dataframe along the specified region. This region must be contained in the Plotter dataframe.

  • saving_path: the path where the file is to be saved
  • starting_region: where the region of interest starts (must be present in the Plotter input dataframe)
  • ending_region: where the region of interest ends (must be present in the Plotter input dataframe)
  • mut: whether the plots indicate allele frequency (mut=False) or variant annotation (mut=True)
  • transcript_region: whether to show where variants fall inside transcripts using the whole transcript length or only the region where they fall

plt.plot_genomic_region("/home/user/path/filename.extension", 987027, 987068, False, True)
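
Because plot_genomic_region is limited to 54bp, a longer region of interest has to be plotted in pieces. The sketch below splits a range into consecutive windows; `split_into_windows` is a hypothetical helper and not part of BIOVARS. It counts both endpoints as part of the window (end - start + 1), which is a conservative assumption, since this document does not state how the 54bp limit is measured.

```python
def split_into_windows(start, end, max_bp=54):
    """Split the inclusive range [start, end] into windows of at most `max_bp` positions."""
    if max_bp < 1:
        raise ValueError("max_bp must be at least 1")
    if end < start:
        raise ValueError(f"invalid range: {start}-{end}")
    windows = []
    window_start = start
    while window_start <= end:
        window_end = min(window_start + max_bp - 1, end)
        windows.append((window_start, window_end))
        window_start = window_end + 1
    return windows


print(split_into_windows(987027, 987068))  # [(987027, 987068)]
print(split_into_windows(1, 120))          # [(1, 54), (55, 108), (109, 120)]
```

Each resulting pair can then be passed as starting_region and ending_region, with a distinct saving_path per window.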

Plotter.plot_summary(saving_directory: str, gene: str, starting_region: int, ending_region: int, frequency: float = 0.01)
Generates an HTML file containing the above plots and a table with the Search resulting dataframe to more easily visualize this information along with the plots.

  • saving_directory: the directory where the file is to be saved
  • gene: among the genes inside the Plotter input dataframe, which one is to be used
  • starting_region: where the region of interest starts (it must be present in the Plotter input dataframe)
  • ending_region: where the region of interest ends (it must be present in the Plotter input dataframe)
  • frequency: the allele frequency threshold above which a variant is counted as "present" in a population

plt.plot_summary("/home/user/path/", "idua", 987027, 987068, 0.01)

Plotting data from Pynoma and PyABraOM

To use the Plotter functionalities on data generated by the Pynoma or PyABraOM APIs, the data must first be converted to the BIOVARS format. To convert the dataframes, directly modify the Search.resulting_dataframes attribute, which is a dictionary containing two keys ("gnomad" and "abraom"), and then call the Search.integrate_data() method. The following example illustrates this, assuming that the pynoma_df variable holds the dataframe returned by a Pynoma search.

from biovars import Search, Sources
src = Sources(ref_genome_version="hg38", gnomad=True, abraom=False)
sch = Search(src, verbose=True)

sch.resulting_dataframes["gnomad"] = pynoma_df
biovars_df = sch.integrate_data()
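
The two steps above can be wrapped in one function. This is a minimal sketch, not part of BIOVARS: `integrate_external_results` is a hypothetical helper that relies only on the resulting_dataframes attribute and the integrate_data() method described above. Supplying both dataframes at once, and how integrate_data() treats a key that was left untouched, are assumptions not confirmed by this document.

```python
def integrate_external_results(search, gnomad_df=None, abraom_df=None):
    """Store Pynoma/PyABraOM dataframes in `search` and return the integrated result."""
    if gnomad_df is None and abraom_df is None:
        raise ValueError("provide at least one dataframe")
    if gnomad_df is not None:
        search.resulting_dataframes["gnomad"] = gnomad_df
    if abraom_df is not None:
        search.resulting_dataframes["abraom"] = abraom_df
    return search.integrate_data()


# With `sch` and `pynoma_df` as in the example above:
# biovars_df = integrate_external_results(sch, gnomad_df=pynoma_df)
```

As in the example above, the Sources object should presumably enable only the databases whose data is supplied.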

BibTeX entry

@article {Carneiro2022.06.07.495190,
	author = {Carneiro, Paola and Colombelli, Felipe and Recamonde-Mendoza, Mariana and Matte, Ursula},
	title = {Pynoma, PyABraOM and BIOVARS: Towards genetic variant data acquisition and integration},
	elocation-id = {2022.06.07.495190},
	year = {2022},
	doi = {10.1101/2022.06.07.495190},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2022/06/09/2022.06.07.495190},
	eprint = {https://www.biorxiv.org/content/early/2022/06/09/2022.06.07.495190.full.pdf},
	journal = {bioRxiv}
}

Acknowledgement

This research was supported by the National Council for Scientific and Technological Development (CNPq) and the Research Incentive Fund (FIPE) from Hospital de Clínicas de Porto Alegre.
