strandex

Strand-anchored regex for expansion or contraction of FASTQ files

# strandex
**strand**-anchored reg**ex** for uniform sampling from FASTQ files (think **spandex**)

[![Build Status](https://travis-ci.org/mdshw5/strandex.svg?branch=master)](https://travis-ci.org/mdshw5/strandex)
[![PyPI](https://img.shields.io/pypi/v/strandex.svg?branch=master)](https://pypi.python.org/pypi/strandex)
[![Landscape](https://landscape.io/github/mdshw5/strandex/master/landscape.svg)](https://landscape.io/github/mdshw5/strandex/master)
[![Coveralls](https://coveralls.io/repos/mdshw5/strandex/badge.svg?branch=master)](https://coveralls.io/r/mdshw5/strandex?branch=master)

## Why use this?
- You want only a few reads from a large FASTQ file (**downsampling**)
- You are constrained by I/O so that reading through the entire file is very slow
- You want to avoid sampling only the beginning or end of the file
- You want to expand a small FASTQ file to a specific number of reads (**upsampling**)
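
The points above rest on one idea: instead of scanning the file, jump to random byte offsets and recover a complete record near each one. The sketch below illustrates that idea only. It is not strandex's implementation, and `sample_records`, `RECORD`, `chunk`, and `max_tries` are hypothetical names. It assumes an uncompressed FASTQ file with four lines per record and records shorter than `chunk` bytes. The sampling is only approximately uniform, since a record that follows a long record is slightly more likely to be picked.

```python
import os
import random
import re

# One four-line FASTQ record starting at a line boundary. Requiring '+' at
# the start of the third line rejects quality lines that begin with '@'.
RECORD = re.compile(rb"^@[^\n]*\n[^\n]+\n\+[^\n]*\n[^\n]+\n", re.MULTILINE)


def sample_records(path, nreads, seed=None, chunk=4096, max_tries=None):
    """Return nreads records found near random byte offsets (hypothetical helper)."""
    rng = random.Random(seed)
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("empty input file")
    if max_tries is None:
        max_tries = 100 * nreads + 100
    records = []
    with open(path, "rb") as handle:
        for _ in range(max_tries):
            if len(records) == nreads:
                break
            offset = rng.randrange(size)
            handle.seek(offset)
            buf = handle.read(chunk)
            if offset > 0:
                # We most likely landed mid-line: skip to the next line start.
                newline = buf.find(b"\n")
                if newline == -1:
                    continue
                buf = buf[newline + 1:]
            match = RECORD.search(buf)
            if match:  # offsets near the end of the file find nothing: retry
                records.append(match.group(0).decode("ascii"))
    if len(records) < nreads:
        raise RuntimeError("could not find enough records")
    return records
```

Each offset is drawn independently, so the cost depends on the number of reads requested rather than on the size of the file.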

# Install

`pip install strandex`

# Examples

```
from strandex import FastqSampler

sampler = FastqSampler('read1.fastq', fastq2='read2.fastq', nreads=100000, seed=42)
for read1, read2 in sampler:
    # read1 and read2 are 4-line strings sampled from paired input
    pass

sampler = FastqSampler('read1.fastq', nreads=100000, seed=42)
for read1, read2 in sampler:
    # read1 is a 4-line string sampled from input
    # read2 is None
    pass
```
Note that you may request more reads *than are available in your input file*. In
that case, strandex samples the file with replacement, so the output will contain
some duplicate reads.
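
Sampling with replacement can be illustrated with Python's standard `random` module. This is a toy model on an in-memory list, not how strandex reads files. The `records` list is a made-up stand-in for a three-read input.

```python
import random
from collections import Counter

records = ["read_a", "read_b", "read_c"]  # stand-in for a 3-read input file
rng = random.Random(42)

# Draw 5 reads from 3: each draw is independent, so repeats are unavoidable.
upsampled = rng.choices(records, k=5)

counts = Counter(upsampled)
duplicates = {name: n for name, n in counts.items() if n > 1}
print(len(upsampled))  # 5
print(duplicates)      # reads that were drawn more than once
```

With five draws from three distinct reads, at least one read must appear twice, whatever the seed.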

# CLI script

```
usage: strandex [-h] [-fq2 FASTQ2] [-o2 OUT2] [-n NREADS] [-s SEED] fastq1 out

sample uniformly without reading an entire fastq file

positional arguments:
  fastq1                input fastq file
  out                   output fastq file

optional arguments:
  -h, --help            show this help message and exit
  -fq2 FASTQ2, --fastq2 FASTQ2
                        input fastq file read pairs
  -o2 OUT2, --out2 OUT2
                        output fastq file read pairs
  -n NREADS, --nreads NREADS
                        number of reads to sample from input (default: 1)
  -s SEED, --seed SEED  seed for random number generator (default: None)
```
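
When scripting around the CLI from Python, the command line can be assembled from the options listed in the help text. The helper name `strandex_command` and the output file names are hypothetical. Only the flags come from the help text, and the sketch assumes the `strandex` executable is on `PATH` when the command is eventually run.

```python
def strandex_command(fastq1, out, fastq2=None, out2=None, nreads=1, seed=None):
    """Build the argument list for a strandex CLI call (hypothetical helper)."""
    cmd = ["strandex"]
    if fastq2 is not None:
        cmd += ["--fastq2", fastq2]
    if out2 is not None:
        cmd += ["--out2", out2]
    cmd += ["--nreads", str(nreads)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    # Positional arguments come last: input file, then output file.
    cmd += [fastq1, out]
    return cmd


cmd = strandex_command("read1.fastq", "sub1.fastq", fastq2="read2.fastq",
                       out2="sub2.fastq", nreads=100000, seed=42)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to run the sampler.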

# Release history

- 0.4.1 (May 17, 2017)
- 0.4.0 (May 17, 2017)
- 0.3.2 (Jun 23, 2016)
- 0.3.1 (Jun 23, 2016), the version described here
- 0.3 (Jun 22, 2016)