LAGG

Looking at Genomes Graphically (LAGG) is a CLI tool for creating images from DNA sequences.

LAGG can generate an image from just an SRA (or ENA) accession number and a k-mer size. The CLI also offers further options and a config-based workflow for more complex processes.

Images are generated using an algorithm based on Chaos Game Representation[^1] (CGR). This process creates images by counting the k-mers in a genome or DNA sequence. Genomes are acquired from the European Nucleotide Archive (ENA). Options are available to use Cutadapt[^3] to preprocess the genomes before counting.
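To make the idea concrete, here is a minimal sketch of how k-mer counts can be laid out as a CGR-style grid. It is not LAGG's implementation: the function names are hypothetical, the corner assignment (A, C, G, T) is an assumed convention that differs between tools, and counting is done in pure Python rather than with Jellyfish.

```python
from collections import Counter

# Assumed corner layout as (x, y) bits; other tools may place bases differently.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer that contains only A, C, G, T."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set(CORNER):
            counts[kmer] += 1
    return counts


def kmer_to_pixel(kmer: str) -> tuple[int, int]:
    """Map a k-mer to (row, col) in a 2**k x 2**k grid.

    In the chaos game each new base halves the distance to its corner,
    so the LAST base selects the coarsest quadrant. Reading the k-mer
    in reverse makes the last base the most significant bit.
    """
    row = col = 0
    for base in reversed(kmer):
        x_bit, y_bit = CORNER[base]
        col = (col << 1) | x_bit
        row = (row << 1) | y_bit
    return row, col


def cgr_grid(sequence: str, k: int) -> list[list[int]]:
    """Return a 2**k x 2**k grid where each cell holds one k-mer's count."""
    side = 2 ** k
    grid = [[0] * side for _ in range(side)]
    for kmer, n in count_kmers(sequence, k).items():
        row, col = kmer_to_pixel(kmer)
        grid[row][col] = n
    return grid
```

Each cell of the grid can then be mapped to a pixel intensity (for example, by scaling counts to a grayscale range) to produce an image.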

Installation

LAGG uses Jellyfish[^2] as a dependency for k-mer counting. Installation instructions can be found on the Jellyfish GitHub page. Jellyfish is commonly available on major Linux distributions and on Homebrew for macOS.

After installing dependencies, install LAGG using pip with the following command:

pip install pylagg
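Because LAGG depends on an external Jellyfish binary, it can help to confirm that the binary is reachable before the first run. The sketch below assumes the executable is named `jellyfish` on your `PATH` (the usual name, though packaging can differ); `find_executable` is a hypothetical helper and not part of LAGG.

```python
import shutil
from typing import Optional


def find_executable(name: str = "jellyfish") -> Optional[str]:
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("Jellyfish was not found on PATH; install it before running lagg.")
    else:
        print(f"Jellyfish found at {path}")
```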

Usage

Using LAGG is as simple as executing the lagg command.

For example, to generate an image from an SRA or ENA accession number:

lagg cgr -a <accession> -k <kmer_size>

Replace <accession> with any accession number (try ERR4770013 for a small COVID-19 genome).

The <kmer_size> is an integer giving the length of the k-mers to count; it also determines the size of the image. For larger genomes, consider a size of 9-10. For smaller ones, consider 5-8.
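As a rough guide to how the k-mer size relates to image size, the sketch below assumes the common CGR layout in which every possible k-mer gets exactly one pixel, giving a square image with 2^k pixels per side. LAGG's actual output dimensions may differ or be configurable; `cgr_dimensions` is a hypothetical helper.

```python
def cgr_dimensions(k: int) -> tuple[int, int]:
    """Return (number of possible k-mers, pixels per side) for a given k.

    Assumes one pixel per k-mer over the 4-letter DNA alphabet.
    """
    return 4 ** k, 2 ** k


if __name__ == "__main__":
    for k in range(5, 11):
        kmers, side = cgr_dimensions(k)
        print(f"k={k}: {kmers} possible k-mers, {side} x {side} pixels")
```

Under this assumption, each increment of k doubles the side length and quadruples the pixel count, which is why small genomes tend to look sparse at large k.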

For more options or help, run lagg --help or visit the documentation site.
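If you want to call LAGG from a script, for instance to process several accessions, one option is to invoke the CLI through `subprocess`. The sketch below uses only the `cgr`, `-a`, and `-k` arguments shown above; `build_lagg_command` and `run_lagg` are hypothetical helper names, and running the command requires LAGG, Jellyfish, and network access.

```python
import shlex
import subprocess


def build_lagg_command(accession: str, kmer_size: int) -> list[str]:
    """Build the argument list for `lagg cgr -a <accession> -k <kmer_size>`."""
    if not accession:
        raise ValueError("accession must not be empty")
    if kmer_size < 1:
        raise ValueError("kmer_size must be a positive integer")
    return ["lagg", "cgr", "-a", accession, "-k", str(kmer_size)]


def run_lagg(accession: str, kmer_size: int) -> subprocess.CompletedProcess:
    """Run LAGG and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_lagg_command(accession, kmer_size), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try offline.
    print(shlex.join(build_lagg_command("ERR4770013", 6)))
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` surfaces a failed run as an exception rather than a silently ignored exit code.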

For Contributors

This project uses Poetry to manage dependencies and build the project. Installation instructions can be found in the Poetry documentation.

Install Dependencies

As with the CLI, Jellyfish is required for k-mer counting in LAGG. Please make sure it is installed before continuing. Instructions can be found in the "Installation" section above.

For project dependencies, use poetry install to automatically create a new virtual environment with all required packages.

If you'd like the virtual environment (and its dependencies) to be created inside the project directory, run the following command before poetry install:

poetry config virtualenvs.in-project true

Running Tests

To run tests, first activate the virtual environment using poetry shell.

Use pytest to run all tests.

[^1]: H. J. Jeffrey, “Chaos game representation of gene structure,” Nucleic Acids Research, vol. 18, no. 8, pp. 2163–2170, 1990, doi: https://doi.org/10.1093/nar/18.8.2163.

[^2]: G. Marçais and C. Kingsford, “A fast, lock-free approach for efficient parallel counting of occurrences of k-mers,” Bioinformatics, vol. 27, no. 6, pp. 764–770, Jan. 2011, doi: https://doi.org/10.1093/bioinformatics/btr011.

[^3]: M. Martin, “Cutadapt removes adapter sequences from high-throughput sequencing reads,” EMBnet.journal, vol. 17, no. 1, p. 10, May 2011, doi: https://doi.org/10.14806/ej.17.1.200.
