An SQL-based solution for large-scale genomic analysis

Project description

pysequila

pysequila is the Python entry point to SeQuiLa, an ANSI-SQL-compliant solution for efficient processing of sequencing reads and querying of genomic intervals, built on top of Apache Spark. Range joins, depth-of-coverage and pileup computations are the bread and butter of NGS analysis, but the high volume of data makes them run very slowly or even fail to complete.
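
To make the term "range join" concrete, here is a minimal pure-Python sketch of the operation: pairing up intervals from two sets that lie on the same contig and overlap. It is only an illustration of the semantics, not SeQuiLa's implementation; the function name `range_join`, the tuple layout and the sample coordinates are all assumptions made for this example. The nested loop compares every pair, which is the cost that becomes prohibitive at NGS scale.

```python
from typing import List, Tuple

# Hypothetical layout for this sketch: (contig, start, end), both ends inclusive.
Interval = Tuple[str, int, int]


def range_join(left: List[Interval], right: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (l, r) pair on the same contig whose intervals overlap."""
    pairs = []
    for l_contig, l_start, l_end in left:
        for r_contig, r_start, r_end in right:
            # Two closed intervals overlap when each starts before the other ends.
            if l_contig == r_contig and l_start <= r_end and r_start <= l_end:
                pairs.append(((l_contig, l_start, l_end), (r_contig, r_start, r_end)))
    return pairs


reads = [("chr1", 100, 150), ("chr1", 400, 450), ("chr2", 100, 150)]
targets = [("chr1", 140, 200), ("chr2", 500, 600)]
print(range_join(reads, targets))
```

Only the first read overlaps a target here; the other two either miss the target range or sit on a contig whose target is elsewhere.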

Requirements

  • Python 3.7, 3.8

Features

  • custom data sources for bioinformatics file formats (BAM, CRAM, VCF)

  • depth of coverage calculations

  • pileup calculations

  • reads filtering

  • efficient range joins

  • other utility functions

  • support for both SQL and Dataframe/Dataset API
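
As an illustration of what a depth-of-coverage calculation produces, the following pure-Python sketch turns a list of aligned read intervals into blocks of constant depth using a simple sweep over start and end events. It is a conceptual sketch under stated assumptions, not the algorithm SeQuiLa uses; the function name `depth_of_coverage`, the `(contig, start, end)` layout with inclusive ends, and the sample reads are invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout for this sketch: (contig, start, end), both ends inclusive.
Read = Tuple[str, int, int]
Block = Tuple[int, int, int]  # (start, end, depth)


def depth_of_coverage(reads: Iterable[Read]) -> Dict[str, List[Block]]:
    """Return, per contig, the covered blocks and their read depth."""
    # +1 where a read starts, -1 just past where it ends.
    events = defaultdict(lambda: defaultdict(int))
    for contig, start, end in reads:
        events[contig][start] += 1
        events[contig][end + 1] -= 1

    blocks = {}
    for contig, deltas in events.items():
        depth = 0
        prev = None
        out = []
        for pos in sorted(deltas):
            # Emit the stretch since the previous event if it was covered.
            if prev is not None and depth > 0:
                out.append((prev, pos - 1, depth))
            depth += deltas[pos]
            prev = pos
        blocks[contig] = out
    return blocks


print(depth_of_coverage([("chr1", 1, 10), ("chr1", 5, 15)]))
```

Adjacent blocks that happen to share the same depth are left unmerged in this sketch to keep the sweep easy to follow.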

Setup

$ python -m pip install --user pysequila
or
(venv)$ python -m pip install pysequila

Usage

$ python
>>> from pysequila import SequilaSession
>>> ss = SequilaSession \
  .builder \
  .config("spark.driver.memory", "2g") \
  .getOrCreate()
>>> ss.sql("SELECT * FROM coverage('reads', 'NA12878')")
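
Hand-written SQL strings with nested quotes and parentheses are easy to get wrong, so one option is to build the query text with a small helper. The sketch below is an assumption-laden illustration: `coverage_query` is a hypothetical helper that is not part of pysequila, and it assumes the two-argument `coverage(table, sample)` form shown in the usage example. It only produces a string, which could then be passed to `ss.sql(...)`.

```python
import re

# Allow only simple names so the values cannot break out of the quoted literals.
_SIMPLE_NAME = re.compile(r"[A-Za-z0-9_]+")


def coverage_query(table: str, sample: str) -> str:
    """Build the coverage query text for a table name and a sample id."""
    for value in (table, sample):
        if not _SIMPLE_NAME.fullmatch(value):
            raise ValueError(f"unsupported characters in {value!r}")
    return f"SELECT * FROM coverage('{table}', '{sample}')"


print(coverage_query("reads", "NA12878"))
```

Restricting the arguments to letters, digits and underscores is a deliberately conservative choice for this sketch; real table or sample names may need a looser rule.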

ChangeLog

0.1.0 (2020-09-16)

  • Initial release.

Download files

Source Distribution

pysequila-0.3.1.tar.gz (18.4 kB)

Uploaded Source

Built Distribution

pysequila-0.3.1-py2.py3-none-any.whl (8.4 kB)

Uploaded Python 2, Python 3

File details

Details for the file pysequila-0.3.1.tar.gz.

File metadata

  • Download URL: pysequila-0.3.1.tar.gz
  • Upload date:
  • Size: 18.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

Hashes for pysequila-0.3.1.tar.gz
Algorithm    Hash digest
SHA256       5d0641d065b262680b2a0f9d63de2b4fe416d7947ebd52450605c0ca01ad1e74
MD5          2b5212f299694f6205129ff911489e31
BLAKE2b-256  ac4259f65aad5b1b94ac6107e0e0c520f9173ad8bde7512bb8fbf80bab341393

File details

Details for the file pysequila-0.3.1-py2.py3-none-any.whl.

File metadata

  • Download URL: pysequila-0.3.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 8.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

Hashes for pysequila-0.3.1-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       e89b6494255067d5b67c12148774dd4b0ff7d6f2540fc9e2a8aff10049b459b6
MD5          b97d78d01ac0e7cece498cdcdeb7f15f
BLAKE2b-256  6bed27071aedea93beade4b07f6347d2be39cebbc429750d3ab4b9fa81a7468c
