Python API for parsing FastQC output
# Welcome to fastqcparser
Python API for parsing the output of [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
# Installation
1. The recommended way to install is with ``pip``:
```
pip install fastqcparser
```
2. Alternatively, you can install with ``easy_install``:
```
easy_install fastqcparser
```
3. You can also install from the source code on Bitbucket:
```
cd
git clone http://bitbucket.org/bubioinformaticshub/fastqcparser.git
cd fastqcparser
python setup.py install
```
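After any of the three routes above, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library (Python 3.8 or later); `installed_version` is a hypothetical helper name, not part of fastqcparser.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Look up the distribution by the name used in the pip command above.
version = installed_version('fastqcparser')
print(version if version is not None else 'fastqcparser is not installed')
```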
# Usage/lazy documentation
```python
# import fastqcparser
from pprint import pprint
from fastqcparser import FastQCParser
# load file
f = FastQCParser('/path/to/fastqc_output_file.txt')
# or
f = FastQCParser('/path/to/fastqc.zip')
# or
with open('/path/to/fastqc_data.txt') as fp:
    f = FastQCParser(fp)
# or
with FastQCParser('/path/to/fastqc_output_file.txt') as f:
    print(f)
# some convenience fields are available from the Basic Statistics module
# str() guards against fields that were cast to int or float
print('\n'.join(str(value) for value in [
    f.filename,
    f.file_type,
    f.encoding,
    f.total_sequences,
    f.filtered_sequences,
    f.sequence_length,
    f.percent_gc
]))
# the available modules are in f.modules
pprint(list(f.modules.keys()))
#['Basic Statistics',
# 'Per base sequence quality',
# 'Per sequence quality scores',
# 'Per base sequence content',
# 'Per base GC content',
# 'Per sequence GC content',
# 'Per base N content',
# 'Sequence Length Distribution',
# 'Sequence Duplication Levels',
# 'Overrepresented sequences',
# 'Kmer Content']
# you can access an individual module either as a key of f.modules or using
# f itself:
pprint(f.modules['Basic Statistics'])
pprint(f['Basic Statistics'])
# each module contains a dictionary
pprint(f['Basic Statistics'])
#{'addnl': {},
# 'data': [['Filename', 'sample1.fastq'],
# ['File type', 'Conventional base calls'],
# ['Encoding', 'Sanger / Illumina 1.9'],
# ['Total Sequences', 1571332],
# ['Filtered Sequences', 0],
# ['Sequence length', 29],
# ['%GC', 53]],
# 'fieldnames': ['Measure', 'Value'],
# 'name': 'Basic Statistics',
# 'status': 'pass'}
# 'data' contains the tabular data from the module as a list of lists, with
# numerical values cast to ints and floats as appropriate
# 'fieldnames' contains the names of each column in 'data'
# 'name' is the name of the module, same as the key
# 'status' is pass/warn/fail as reported by fastqc
# 'addnl' contains extra fields for some modules
```
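The module dictionaries described in the comments above are plain Python structures, so they can be reshaped without further help from the library. The following is a minimal sketch, not part of fastqcparser: `module_rows` and `flagged_modules` are hypothetical helper names, and the sample dictionaries are hand-written so that the block runs without a FastQC file. The first entry reuses values from the `Basic Statistics` output shown above; the second entry is made up purely for illustration.

```python
# Hand-written sample shaped like the module dictionaries documented above.
# With a real file, this would come from FastQCParser(...).modules instead.
modules = {
    'Basic Statistics': {
        'name': 'Basic Statistics',
        'status': 'pass',
        'fieldnames': ['Measure', 'Value'],
        'data': [
            ['Filename', 'sample1.fastq'],
            ['Total Sequences', 1571332],
            ['%GC', 53],
        ],
        'addnl': {},
    },
    # Illustrative entry only: the status, column names and values are made up.
    'Per base N content': {
        'name': 'Per base N content',
        'status': 'warn',
        'fieldnames': ['Base', 'N-Count'],
        'data': [[1, 0.0], [2, 0.5]],
        'addnl': {},
    },
}


def module_rows(module):
    """Pair each row in 'data' with 'fieldnames', giving one dict per row."""
    return [dict(zip(module['fieldnames'], row)) for row in module['data']]


def flagged_modules(modules):
    """Return the names of modules whose status is not 'pass'."""
    return [name for name, module in modules.items()
            if module['status'] != 'pass']


rows = module_rows(modules['Basic Statistics'])
print(rows[0])                   # {'Measure': 'Filename', 'Value': 'sample1.fastq'}
print(flagged_modules(modules))  # ['Per base N content']
```

With a parsed file, the same helpers would presumably be called as `module_rows(f['Basic Statistics'])` and `flagged_modules(f.modules)`. This assumes that `f.modules` supports `.items()` like an ordinary dictionary, which the README implies through `f.modules.keys()` but does not state outright.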