Command-line tool for parsing NGS reads by multiple fuzzy regex operations

Project description

itermae 0.4.2

Command-line utility that applies a series of fuzzy regular expression operations to sequences read from a variety of formats, filters the matches on the position, length, sequence, and/or quality statistics of the captured groups, then builds output records in a variety of formats from those groups. It reads and writes FASTQ, FASTA, text-file, and SAM (tab-delimited) formats. It is designed to take sequences piped in from tools like GNU parallel, which permits light-weight parallelization. Matching is done on strings with regex, and Biopython is used to represent and slice sequences and to read and write the formats.
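To make the match, filter, and rebuild idea concrete, here is a minimal sketch in plain Python. It is not itermae's code or interface: the read layout, the group names, the thresholds, and the `parse_read` function are all made up for illustration, and it uses exact matching from the standard-library `re` module rather than fuzzy matching, purely to stay dependency-free.

```python
import re

# Hypothetical read layout (an assumption for this sketch): a fixed anchor,
# a 6-base barcode, a fixed linker, then a payload of variable length.
PATTERN = re.compile(
    r"(?P<anchor>GGTAC)(?P<barcode>[ACGTN]{6})(?P<linker>TTGCA)(?P<payload>[ACGTN]+)"
)


def parse_read(read_id, seq, qual, min_payload=8, min_mean_quality=20):
    """Return a FASTA-style record built from captured groups, or None.

    Filters on the payload group's length and its mean Phred+33 quality.
    """
    match = PATTERN.search(seq)
    if match is None:
        return None  # the read does not have the expected layout
    start, end = match.span("payload")
    if end - start < min_payload:
        return None  # payload too short
    # Decode the quality characters that line up with the payload group.
    qualities = [ord(char) - 33 for char in qual[start:end]]
    if sum(qualities) / len(qualities) < min_mean_quality:
        return None  # payload quality too low
    # Rebuild an output record from the captured groups.
    return f">{read_id}_{match.group('barcode')}\n{match.group('payload')}"


if __name__ == "__main__":
    sequence = "AAGGTACACGTACTTGCAGATTACAGATTACA"
    print(parse_read("read1", sequence, "I" * len(sequence)))
```

The real tool tolerates mismatches in the match step and chains several such operations, but the overall shape (capture groups, filter on their properties, assemble output from them) is the same idea.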

Availability, installation, 'installation'

Options:

  1. Use pip to install itermae, so

    python3 -m pip install itermae

  2. Clone this repo and install it locally. Dependencies are listed in requirements.txt, so python3 -m pip install -r requirements.txt will install them. But if you're not using pip anyways, then you... do you.

  3. You can use Singularity to pull and run a Singularity image of itermae.py, where everything is already installed. This is the recommended usage. This image is built with a few other tools, like gawk, perl, and parallel, to make command line munging easier.

Usage

itermae is meant to be used in a pipeline where you have just gotten your FASTQ reads back and want to parse them. Use zcat to feed small chunks into the tool while you develop operations that match, filter, and extract the right groups to assemble the output you want. Then wrap it up behind parallel and feed the whole FASTQ file in on standard input via zcat. This parallelizes with a small memory footprint (tune the chunk size), and you can write the results to disk or stream them into another tool.
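The chunking that keeps memory use small can be sketched as follows. This is only an illustration of splitting a FASTQ stream on record boundaries, which is the job parallel does in the pipeline above; the `fastq_chunks` function and its `records_per_chunk` parameter are hypothetical names, and the sketch assumes plain four-line FASTQ records.

```python
import io
from itertools import islice


def fastq_chunks(handle, records_per_chunk=2):
    """Yield lists of FASTQ records, each record a four-line string.

    Only one chunk is held in memory at a time, so the memory footprint is
    set by records_per_chunk rather than by the size of the input.
    """
    lines_per_chunk = 4 * records_per_chunk
    while True:
        lines = list(islice(handle, lines_per_chunk))
        if not lines:
            return  # end of the stream
        if len(lines) % 4 != 0:
            raise ValueError("input ended in the middle of a FASTQ record")
        yield ["".join(lines[i:i + 4]) for i in range(0, len(lines), 4)]


if __name__ == "__main__":
    # A tiny in-memory stream stands in for reads arriving on standard input.
    demo = io.StringIO(
        "@r1\nACGT\n+\nIIII\n"
        "@r2\nTTGA\n+\nIIII\n"
        "@r3\nGGCC\n+\nIIII\n"
    )
    for chunk in fastq_chunks(demo, records_per_chunk=2):
        print(len(chunk), "record(s) in this chunk")
```

Splitting on multiples of four lines matters because a chunk that cuts a record in half would hand a worker an unparseable read.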

Tutorial / demo: there's a Jupyter notebook in the root directory of the repo (demos_and_tutorial_itermae.ipynb), along with its rendered HTML output. It has some examples and ideas for how to use the tool. There are also some longer runs, launched by a bash script in profiling_tests, for profiling with cProfile and snakeviz.

Designed for use in command-line shells on a *nix machine.
