Bioinformatics tool outputs converter to JSON or YAML
crimson
crimson converts non-standard bioinformatics tool outputs to JSON or YAML.
Currently it can convert outputs of the following tools:
- FastQC (fastqc)
- FusionCatcher (fusioncatcher)
- samtools flagstat (flagstat)
- Picard metrics tools (picard)
- STAR log file (star)
- STAR-Fusion hits table (star-fusion)
- Variant Effect Predictor plain text output (vep)
The conversion can be done using the command line interface or by calling the tool-specific parser functions in your Python script.
Installation
crimson is available on the Python Package Index and you can install it via pip:
$ pip install crimson
It is also available on BioConda, both through the conda package manager and as a Docker container.
Usage
As a command line tool
The general command is crimson {program_name} and by default the output is written to stdout. For example, to use the picard parser, you would execute:
$ crimson picard /path/to/a/picard.metrics
You can also specify a file name directly to write to a file. The following command will write the output to a file named converted.json:
$ crimson picard /path/to/a/picard.metrics converted.json
Some parsers also accept additional input formats. The FastQC parser, for example, also works if you specify a path to a FastQC output directory:
$ crimson fastqc /path/to/a/fastqc/dir
or path to a zipped result:
$ crimson fastqc /path/to/a/fastqc_result.zip
When in doubt, use the --help flag:
$ crimson --help # for the general help
$ crimson fastqc --help # for parser-specific (FastQC) help
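If you want to drive the command line interface from a script, the invocation pattern above can be wrapped as follows. This is only a minimal sketch: build_command, convert, and PROGRAMS are hypothetical names introduced here for illustration, and the sketch assumes the crimson executable is on your PATH. It relies only on the crimson {program_name} INPUT [OUTPUT] form shown in this section.

```python
import subprocess

# Subcommand names taken from the list of supported tools in this README.
PROGRAMS = {
    "fastqc", "fusioncatcher", "flagstat", "picard",
    "star", "star-fusion", "vep",
}


def build_command(program, input_path, output_path=None):
    """Return the argument list for `crimson {program} INPUT [OUTPUT]`.

    Hypothetical helper, not part of crimson itself.
    """
    if program not in PROGRAMS:
        raise ValueError(f"unsupported program: {program!r}")
    cmd = ["crimson", program, str(input_path)]
    if output_path is not None:
        # With an output file name, crimson writes there instead of stdout.
        cmd.append(str(output_path))
    return cmd


def convert(program, input_path, output_path=None):
    """Run crimson (assumed to be on PATH) and return its captured stdout."""
    result = subprocess.run(
        build_command(program, input_path, output_path),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, convert("picard", "/path/to/a/picard.metrics") would run the same command as the first example in this section and hand back whatever was written to stdout.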
As a Python library function
Generally, the function to import is located at crimson.{program_name}.parse. For example, to use the picard parser in your script, you can do:
from crimson import picard
# You can specify the input file name as a string ...
parsed = picard.parse("/path/to/a/picard.metrics")
# ... or a file handle
with open("/path/to/a/picard.metrics") as src:
    parsed = picard.parse(src)
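Once you have a parsed result, you may want to write it out yourself rather than go through the command line. The sketch below assumes that the value returned by picard.parse is made of plain JSON-serializable types (dicts, lists, strings, numbers), which the tool's purpose suggests but this README does not state explicitly. write_json and convert_picard are hypothetical helper names, not part of crimson.

```python
import json
from pathlib import Path


def write_json(parsed, dest):
    """Serialize an already-parsed result to a JSON file.

    Assumes `parsed` contains only JSON-serializable values.
    """
    text = json.dumps(parsed, indent=2, sort_keys=True)
    Path(dest).write_text(text, encoding="utf-8")


def convert_picard(src, dest):
    """Parse a Picard metrics file with crimson and write it as JSON."""
    # Imported lazily so write_json stays usable without crimson installed.
    from crimson import picard

    write_json(picard.parse(src), dest)
```

Keeping the serialization step separate from the parsing step means the same write_json helper can be reused for any of the other parsers.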
Why?
- Not enough tools use standard output formats.
- Writing and re-writing the same parsers across different scripts is not a productive way to spend the day.
Local Development
Setting up a local development environment requires that you set up all of the supported Python versions. We recommend using pyenv for this.
The following steps can be your guide for your local development setup:
# Clone the repository and cd into it.
$ git clone https://git.sr.ht/~bow/crimson
$ cd crimson
# Create your virtualenv.
# If you already have pyenv installed, you may use the Makefile rule below.
$ make dev-pyenv
# Install the package along with its development dependencies.
$ make dev
# Run the test and linter suite to verify the setup.
$ make lint test
Contributing
If you are interested, crimson accepts the following types of contributions:
- Documentation additions (if anything seems unclear, feel free to open an issue)
- Bug reports
- Support for tools' outputs which can be converted to JSON or YAML.
For any of these, feel free to open an issue in the issue tracker or submit a pull request.
License
crimson is BSD-licensed. Refer to the LICENSE file for the full license.