Compress fastq files with spring and check the integrity

License: MIT

Crunchy

A Python wrapper around spring and samtools that compresses fastq files to spring and bam files to cram. When compressing fastq files to spring, an integrity check can be performed by adding the --check-integrity flag:

crunchy compress spring --spring-path <springfile> --first <read_1.fastq> --second <read_2.fastq> --check-integrity
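When compression is driven from another Python script, the same command line can be assembled programmatically. The following is a minimal sketch and not part of crunchy itself: the helper name `build_spring_command` is hypothetical, and it uses only the subcommand and flags shown above.

```python
import os
import shlex


def build_spring_command(spring_path, first, second, check_integrity=True):
    """Return the argument list for `crunchy compress spring`.

    Hypothetical helper: it only arranges the flags documented above.
    """
    command = [
        "crunchy", "compress", "spring",
        "--spring-path", os.fspath(spring_path),
        "--first", os.fspath(first),
        "--second", os.fspath(second),
    ]
    if check_integrity:
        # Ask crunchy to verify the spring archive against the original fastq files
        command.append("--check-integrity")
    return command


if __name__ == "__main__":
    cmd = build_spring_command("sample.spring", "sample_1.fastq", "sample_2.fastq")
    # Print a shell-safe rendering of the command instead of executing it
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, assuming `crunchy` is available on the PATH.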

Install

Pip

pip install crunchy

Docker

The Docker image contains crunchy as well as samtools and spring.

docker pull clinicalgenomics/crunchy:0.5

Run crunchy using:

docker run clinicalgenomics/crunchy:0.5 crunchy

Developers

git clone https://github.com/Clinical-Genomics/crunchy
cd crunchy
pip install -e .
crunchy --help
Usage: crunchy [OPTIONS] COMMAND [ARGS]...

  Base command for crunchy

                .---. .---.
               :     : o   :    me want cookie!
           _..-:   o :     :-.._    /
       .-''  '  `---' `---' "   ``-.
     .'   "   '  "  .    "  . '  "  `.
    :   '.---.,,.,...,.,.,.,..---.  ' ;
    `. " `.                     .' " .'
     `.  '`.                   .' ' .'
      `.    `-._           _.-' "  .'  .----.
        `. "    '"--...--"'  . ' .'  .'  o   `.
        .'`-._'    " .     " _.-'`. :       o  :
      .'      ```--.....--'''    ' `:_ o       :
    .'    "     '         "     "   ; `.;";";";'
   ;         '       "       '     . ; .' ; ; ;
  ;     '         '       '   "    .'      .-'
  '  "     "   '      "           "    _.-'

Options:
  --spring-binary TEXT            Path to spring binary  [default: spring]
  --samtools-binary TEXT          Path to samtools binary  [default: samtools]
  -t, --threads INTEGER           Number of threads to use for spring
                                  compression  [default: 8]
  -r, --reference TEXT            Path to reference genome
  --log-level [DEBUG|INFO|WARNING]
                                  Choose what log messages to show
  --tmp-dir TEXT                  If specific temp dir should be used
  --help                          Show this message and exit.

Commands:
  auto        Run whole pipeline by compressing, comparing and deleting...
  compare     Compare two files by generating checksums.
  compress    Compress genomic files
  decompress  Decompress genomic files

Workflow

Each command can be run separately. To compress all fastq pairs below a directory, run crunchy auto spring <path_to_dir>. This performs the following steps:

  1. Recursively find all fastq pairs.

  2. Compress each pair with spring: file_1.fastq + file_2.fastq (spring)-> file.spring

  3. Decompress with spring: file.spring (spring)-> file_1.spring.fastq + file_2.spring.fastq

  4. Compare the checksum of each decompressed file with that of its original: file_1.spring.fastq + file_1.fastq (hashlib)-> compare

  5. Delete the original fastq files if the compression was lossless: file_1.fastq + file_2.fastq (rm)->
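
Steps 1 and 4 can be sketched in plain Python. This is an illustration under stated assumptions, not crunchy's actual implementation: the function names are hypothetical, pairs are assumed to follow the file_1.fastq / file_2.fastq naming used above, and SHA-256 is an arbitrary choice, since the workflow only states that hashlib is used.

```python
import hashlib
from pathlib import Path


def find_fastq_pairs(directory):
    """Recursively yield (read_1, read_2) paths found below directory.

    Assumption: pairs are named <name>_1.fastq and <name>_2.fastq.
    """
    for first in sorted(Path(directory).rglob("*_1.fastq")):
        stem = first.name[: -len("_1.fastq")]
        second = first.with_name(stem + "_2.fastq")
        # Skip read 1 files that have no matching read 2 file
        if second.is_file():
            yield first, second


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_lossless(original, decompressed):
    """Return True if both files have the same checksum."""
    return file_checksum(original) == file_checksum(decompressed)
```

In this sketch the original fastq files would be deleted only if `is_lossless` returns True for both reads of a pair, which mirrors the condition in step 5.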
