
A tool for processing SRA accessions


SRAHunter

Description

SRAHunter is a tool designed to facilitate downloading and processing data and metadata from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). The package includes three modules: a module for the automated download of FASTQ files from SRA (srahunter download), a module for retrieving the main metadata associated with SRA runs (srahunter metadata), and a module for retrieving the full metadata associated with an accession number (srahunter fullmetadata).

Installation

SRAHunter is available as a conda package and can be installed with the following command.

Using mamba is recommended to speed up the installation process:

mamba install -c bioconda enriconeko::srahunter

Alternatively, using conda:

conda install -c bioconda enriconeko::srahunter

SRAHunter can also be installed with pip:

pip install srahunter

Scripts

srahunter download:

Taking as input an accession list that the user has downloaded from SRA, the tool downloads the corresponding SRA files and converts them to single-end or paired-end FASTQ files.

The script has been tested with data from the main sequencing platforms and can be used to download data produced on Illumina, PacBio and ONT instruments.

Usage Example: srahunter download --list <accession_list.txt> <other options>
Main functionalities:
  • Automatic removal of the .sra files after successful conversion to FASTQ, so the user does not need to delete them manually
  • Disk-space check before each sample download (at least 20G of free space is required); if the disk is almost full, the script stops with an error message
  • Tracking of successfully processed samples, so that after an interruption the script resumes without reprocessing them
  • Recording of failed downloads in a file (failed_list.csv)
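The resume, disk-check and failure-logging behaviour listed above can be illustrated with a minimal Python sketch. It is not SRAHunter's code: the `process_run` callback (standing in for the download and FASTQ conversion of one run) and the `done_list.txt` checkpoint file are hypothetical, and the 20G threshold is interpreted here as 20 GiB.

```python
import csv
import shutil
from pathlib import Path
from typing import Callable, Iterable

# Free-space threshold from the feature list ("at least 20G"), read here as 20 GiB.
MIN_FREE_BYTES = 20 * 1024**3


def has_enough_space(path: str, min_free: int = MIN_FREE_BYTES) -> bool:
    """True if the filesystem holding `path` has at least `min_free` bytes available."""
    return shutil.disk_usage(path).free >= min_free


def run_batch(
    accessions: Iterable[str],
    process_run: Callable[[str], None],  # hypothetical: download + FASTQ conversion of one run
    workdir: str = ".",
    done_file: str = "done_list.txt",  # hypothetical checkpoint file
    failed_file: str = "failed_list.csv",
    min_free: int = MIN_FREE_BYTES,
) -> list[str]:
    """Process runs in order, skipping those completed in a previous session.

    Returns the accessions that failed; they are also written to `failed_file`.
    """
    work = Path(workdir)
    done_path = work / done_file
    done = set(done_path.read_text().split()) if done_path.exists() else set()
    failures: list[tuple[str, str]] = []

    for acc in accessions:
        if acc in done:
            continue  # already finished before an interruption: resume by skipping it
        if not has_enough_space(str(work), min_free):
            # Stop early when the disk is almost full, before starting a new sample.
            raise SystemExit(f"Not enough free disk space; stopping before {acc}")
        try:
            process_run(acc)
        except Exception as exc:  # record the failure and continue with the next run
            failures.append((acc, str(exc)))
            continue
        with done_path.open("a") as fh:  # checkpoint only after a successful run
            fh.write(acc + "\n")
        done.add(acc)

    with (work / failed_file).open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["accession", "error"])
        writer.writerows(failures)
    return [acc for acc, _ in failures]
```

Because a run is checkpointed only after `process_run` returns, a sample interrupted halfway is left unmarked and is retried on the next invocation.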
Options:
-h Show help message and exit
--list, -i Accession list from SRA (relative or full file path)
-t Number of threads (default: 6)
--path, -sra-path, -p Directory where the .sra files are downloaded (default: tmp_srahunter in the current directory)
--maxsize, -ms Maximum size of each .sra file (default: 50G)
--outdir, -o Directory where the .fastq files are written (default: current directory)
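To show how these options could map onto the underlying tools, the sketch below builds (but does not run) one command per step. It assumes, without confirmation from this page, that the download module wraps `prefetch` and `fasterq-dump` from the SRA Toolkit; `build_commands` is a hypothetical helper and SRAHunter's actual invocation may differ.

```python
from pathlib import Path


def build_commands(
    accession: str,
    sra_path: str = "tmp_srahunter",  # mirrors the --path default
    outdir: str = ".",                # mirrors the --outdir default
    threads: int = 6,                 # mirrors the -t default
    maxsize: str = "50G",             # mirrors the --maxsize default
) -> list[list[str]]:
    """Return [prefetch command, fasterq-dump command] for one Run accession.

    Hypothetical helper: assumes SRA Toolkit tools are used underneath.
    """
    # Recent prefetch versions store a run as <sra_path>/<accession>/<accession>.sra
    sra_file = Path(sra_path) / accession / f"{accession}.sra"
    prefetch = [
        "prefetch", accession,
        "--max-size", maxsize,
        "--output-directory", sra_path,
    ]
    fasterq_dump = [
        "fasterq-dump", str(sra_file),
        "--split-3",  # paired reads go to _1/_2 files, single-end reads to one file
        "--threads", str(threads),
        "--outdir", outdir,
    ]
    return [prefetch, fasterq_dump]
```

Each list could then be passed to `subprocess.run(cmd, check=True)`; `--split-3` is chosen here because it yields single or paired FASTQ files depending on the layout of the run.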

Attention: currently only Run accession numbers (e.g. SRR8487013) are supported, and they must be provided in an accession list file.
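Before starting a long download, it can help to separate Run accessions from other SRA identifiers (experiments, samples, studies) in the list. This sketch assumes one accession per line and accepts the SRR, ERR and DRR prefixes; whether SRAHunter accepts ERR and DRR runs is not stated here, so the pattern can be narrowed to `SRR` if needed. `read_run_accessions` is a hypothetical helper.

```python
import re
from pathlib import Path

# Run accessions: SRR (NCBI), ERR (ENA) or DRR (DDBJ) followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def read_run_accessions(path: str) -> tuple[list[str], list[str]]:
    """Split an accession list (one ID per line) into Run IDs and rejected entries."""
    runs: list[str] = []
    rejected: list[str] = []
    for line in Path(path).read_text().splitlines():
        acc = line.strip()
        if not acc:
            continue  # ignore blank lines
        (runs if RUN_ACCESSION.match(acc) else rejected).append(acc)
    return runs, rejected
```

Any entry returned in `rejected` (for example an `SRX` experiment ID) would need to be replaced by its Run accessions before the list is passed to the download module.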

srahunter metadata:

This module retrieves metadata from the NCBI SRA database, splits large input files into manageable chunks, and collects the fetched data into a final table, 'SRA_info.csv'. It also produces an interactive table in the folder SRA_html. Only the most commonly used metadata fields associated with each Run accession number are downloaded.
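The chunking step can be sketched as follows. The batch size, the helper names and the use of `efetch -db sra -format runinfo` are assumptions for illustration; the page only states that retrieval relies on Entrez Direct.

```python
from typing import Iterator, Optional


def chunks(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` accessions."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_runinfo_cmd(batch: list[str]) -> list[str]:
    """Entrez Direct command fetching run-info CSV for a batch of accessions."""
    return ["efetch", "-db", "sra", "-id", ",".join(batch), "-format", "runinfo"]


def merge_csv_texts(texts: list[str]) -> str:
    """Concatenate CSV outputs, keeping the first header and dropping repeats of it."""
    header: Optional[str] = None
    body: list[str] = []
    for text in texts:
        for line in text.splitlines():
            if not line.strip():
                continue  # skip empty lines between batches
            if header is None:
                header = line
            elif line != header:
                body.append(line)
    return "" if header is None else "\n".join([header, *body]) + "\n"
```

Each batch produces its own CSV header, which is why `merge_csv_texts` keeps only the first one when the per-batch results are combined into a single table.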

Usage Example: srahunter metadata -i <accession_list.txt>
Main functionalities:
  • Fast data retrieval with Entrez Direct
  • Metadata collection in a clean CSV format
  • Interactive HTML table with links to SRA, a chart summarising the data, and filtering options
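As an illustration of the linked table, the sketch below renders CSV text into a plain HTML table and turns each Run accession into a link to its NCBI SRA page. The `Run` column name and the URL pattern are assumptions; the chart and filtering features of the real SRA_html output are not reproduced.

```python
import csv
import html
import io

SRA_URL = "https://www.ncbi.nlm.nih.gov/sra/{}"  # assumed link pattern for an accession


def csv_to_html_table(csv_text: str, link_column: str = "Run") -> str:
    """Render CSV text as an HTML table, linking values of `link_column` to SRA."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    head = "".join(f"<th>{html.escape(f)}</th>" for f in fields)
    rows = []
    for rec in reader:
        cells = []
        for f in fields:
            value = html.escape(rec.get(f) or "")
            if f == link_column and value:
                value = f'<a href="{SRA_URL.format(value)}">{value}</a>'
            cells.append(f"<td>{value}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return f"<table><thead><tr>{head}</tr></thead><tbody>{''.join(rows)}</tbody></table>"
```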
Options:
-h Show help message and exit
-i Accession list from SRA (relative or full file path)

srahunter fullmetadata:

This module retrieves metadata from the NCBI SRA database, splits large input files into manageable chunks, and collects the fetched data into a final full table, 'Full_SRA_info.csv'. Unlike srahunter metadata, it downloads all the metadata associated with each Run accession number.
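Full metadata records can differ between samples (one may report `tissue`, another `strain`), so a single table needs the union of all attribute names. A minimal sketch of that merging step, using a hypothetical `records_to_csv` helper and example attribute names chosen only for illustration:

```python
import csv
import io


def records_to_csv(records: list[dict[str, str]]) -> str:
    """Write records with differing keys as one CSV; missing fields become empty cells."""
    fieldnames: list[str] = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

For example, a record with only `Run` and `tissue` and another with only `Run` and `strain` produce a three-column table in which each row leaves the attribute it lacks empty.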

Usage Example: srahunter fullmetadata -i <accession_list.txt>
Main functionalities:
  • Fast data retrieval with Entrez Direct
  • Metadata collection in a clean CSV format
Options:
-h Show help message and exit
-i Accession list from SRA (relative or full file path)

Error Handling and Troubleshooting

If you encounter any issues or errors while using SRAHunter, please check the following common problems:
  • Ensure that your Conda or Mamba environment is correctly set up.
  • Verify that your SRA accession list is correctly formatted.
  • Check the available disk space if you encounter download issues.
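For the first point, a small Python check can confirm that the required command-line tools are visible in the active environment. The tool names below are an assumption (SRA Toolkit and Entrez Direct executables), not a list taken from SRAHunter's requirements.

```python
import shutil

# Assumed tool names; adjust to whatever the installed environment actually provides.
REQUIRED_TOOLS: tuple[str, ...] = ("prefetch", "fasterq-dump", "efetch")


def missing_tools(tools: tuple[str, ...] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH in the active environment."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH (is the conda environment activated?):", ", ".join(missing))
    else:
        print("All required tools found.")
```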

For more help, please open an issue on the GitHub repository.

Contributing

Contributions to SRAHunter are welcome! Please read our contributing guidelines on the GitHub repository for instructions on how to contribute.

License

SRAHunter is released under the MIT License.

Acknowledgments

Special thanks to ....



Download files


Source Distribution

srahunter-0.0.2.tar.gz (11.6 kB)

Uploaded Source

Built Distribution

srahunter-0.0.2-py3-none-any.whl (12.0 kB)

Uploaded Python 3

File details

Details for the file srahunter-0.0.2.tar.gz.

File metadata

  • Download URL: srahunter-0.0.2.tar.gz
  • Upload date:
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for srahunter-0.0.2.tar.gz
Algorithm Hash digest
SHA256 0e5118d7e9cc75472f02035805ae831e26c0d0ab1f517ba2c50cc74d635a8dda
MD5 7e633bb0cf0cf3c0593532d9f28ed18a
BLAKE2b-256 aa79584c2a9cfa580f5be59447cd02643b67320d29fa3f8404fb028508cc49a6


File details

Details for the file srahunter-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: srahunter-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 12.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for srahunter-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4e45f54e686945496d326e48452cc88c5aa3703fa124972ba1e4b9c1e324f6e4
MD5 812781b343111d86daefbdf19c16a7b9
BLAKE2b-256 9ad1323169196391307fb660de597d1b90cc9bd9db7df6499079ed3c7c01539e

