An SQL-based solution for large-scale genomic analysis

Project description

pysequila

pysequila is a Python entrypoint to SeQuiLa, an ANSI-SQL compliant solution for efficient processing of sequencing reads and querying of genomic intervals, built on top of Apache Spark. Range joins, depth of coverage and pileup computations are the bread and butter of NGS analysis, but the high volume of data makes them run very slowly or even fail to complete.
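To make the term "range join" concrete, here is a minimal pure-Python sketch that pairs reads with target regions whenever their genomic intervals overlap. It is an illustration only and not how SeQuiLa implements the join: the `Interval` layout, the function names and the sample coordinates are all made up for this example. The nested loop compares every read with every target, which is the kind of cost that becomes prohibitive on NGS-sized inputs.

```python
from typing import List, Tuple

# Hypothetical layout for this sketch: (contig, start, end), 1-based, inclusive.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Two intervals overlap if they share a contig and at least one position."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def naive_range_join(
    reads: List[Interval], targets: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """Compare every read with every target: O(len(reads) * len(targets))."""
    return [(r, t) for r in reads for t in targets if overlaps(r, t)]


# Made-up coordinates, for illustration only.
reads = [("chr1", 100, 150), ("chr1", 400, 450), ("chrM", 10, 60)]
targets = [("chr1", 140, 200), ("chrM", 1, 20)]
pairs = naive_range_join(reads, targets)
for read, target in pairs:
    print(read, "overlaps", target)
```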

Requirements

  • Python 3.7, 3.8, 3.9

Features

  • custom data sources for bioinformatics file formats (BAM, CRAM, VCF)

  • depth of coverage calculations

  • pileup calculations

  • reads filtering

  • efficient range joins

  • other utility functions

  • support for both SQL and DataFrame/Dataset API
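As a rough picture of what a depth of coverage calculation produces, the sketch below counts how many reads cover each position and reports the result as blocks of constant depth. It is a small single-machine illustration under stated assumptions (reads given as hypothetical `(contig, start, end)` tuples with 1-based inclusive coordinates), not SeQuiLa's distributed implementation, and the function name is invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, int, int]    # (contig, start, end), 1-based, inclusive
Block = Tuple[int, int, int]   # (start, end, depth)


def depth_of_coverage(reads: Iterable[Read]) -> Dict[str, List[Block]]:
    """Return, per contig, the blocks of positions covered by at least one read."""
    # Record +1 where a read starts and -1 just past where it ends.
    events: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for contig, start, end in reads:
        events[contig][start] += 1
        events[contig][end + 1] -= 1

    result: Dict[str, List[Block]] = {}
    for contig, deltas in events.items():
        blocks: List[Block] = []
        depth = 0
        prev = None
        # Sweep the positions in order, emitting one block between events.
        for pos in sorted(deltas):
            if prev is not None and depth > 0:
                blocks.append((prev, pos - 1, depth))
            depth += deltas[pos]
            prev = pos
        result[contig] = blocks
    return result


# Two overlapping made-up reads on chr1.
coverage = depth_of_coverage([("chr1", 1, 5), ("chr1", 3, 8)])
print(coverage)
```

Adjacent blocks with equal depth are not merged here, which keeps the sweep short at the cost of a slightly longer output.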

Setup

$ python -m pip install --user pysequila
or
(venv)$ python -m pip install pysequila

Usage

$ python
>>> from pysequila import SequilaSession
>>> # Start a session with the SeQuiLa package pulled in by Spark
>>> ss = SequilaSession \
  .builder \
  .config("spark.jars.packages", "org.biodatageeks:sequila_2.12:1.1.0") \
  .config("spark.driver.memory", "2g") \
  .getOrCreate()
>>> # Register the BAM file as a table
>>> ss.sql(
      """
      CREATE TABLE IF NOT EXISTS reads
      USING org.biodatageeks.sequila.datasources.BAM.BAMDataSource
      OPTIONS(path "/features/data/NA12878.multichrom.md.bam")
      """
    )
>>> # Compute depth of coverage with SQL
>>> ss.sql("SELECT * FROM coverage('reads', 'NA12878', '/features/data/Homo_sapiens_assembly18_chr1_chrM.small.fasta')")
>>> # or using the DataFrame/Dataset API
>>> ss.coverage("/features/data/NA12878.multichrom.md.bam", "/features/data/Homo_sapiens_assembly18_chr1_chrM.small.fasta")
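When the same two statements are needed for several BAM files, the SQL text can be generated instead of typed by hand. The helpers below are a hypothetical convenience, not part of pysequila: `bam_table_ddl` and `coverage_query` are names invented for this sketch, and they only reproduce the statement shapes shown in the usage example. They do no escaping, so they should be used with trusted table names and paths only.

```python
def bam_table_ddl(table: str, bam_path: str) -> str:
    """Build the CREATE TABLE statement that registers a BAM file as a table."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} "
        "USING org.biodatageeks.sequila.datasources.BAM.BAMDataSource "
        f'OPTIONS(path "{bam_path}")'
    )


def coverage_query(table: str, sample: str, fasta_path: str) -> str:
    """Build the coverage query in the form used in the usage example."""
    return f"SELECT * FROM coverage('{table}', '{sample}', '{fasta_path}')"


ddl = bam_table_ddl("reads", "/features/data/NA12878.multichrom.md.bam")
query = coverage_query(
    "reads",
    "NA12878",
    "/features/data/Homo_sapiens_assembly18_chr1_chrM.small.fasta",
)
print(ddl)
print(query)

# With a live session `ss` created as in the usage example, one would then run:
#   ss.sql(ddl)
#   ss.sql(query).show()
```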

ChangeLog

0.1.0 (2020-09-16)

  • Initial release.
