Project description

DO NOT USE IT TODAY, FIXING BROKEN CODE

Fastq Downloader (WIP)

This Python package lets you download FASTQ files from ENA. It can automatically merge and rename FASTQ files based on the input file you provide. If you have trouble downloading this repo's release, please go to fastgit.

How to use

Automatic merging of multiple files for paired-end reads has not been tested yet, but should be usable.

conda create --name fastq-downloader -c conda-forge -c hcc -c bioconda aspera-cli snakemake httpx lxml click beautifulsoup4 python=3.9
## use whatever tool you like to download the gist mentioned above to thisname.smk
## download the whl file from this project's GitHub release to thisname.whl
conda activate fastq-downloader
pip install fastq-downloader==0.3.1
## make sure to create an info tsv first; you can just copy the table from the GEO website,
## then open vim, type :set paste to enter paste mode, paste the table into vim,
## save the file under whatever name you want, then exit vim
## white space will be automatically converted to underscores
## set refresh_acc to False if you don't want to query the accession numbers again;
## otherwise the link file is recreated and all files are downloaded again (the default is False)
fastq-downloader smk --info thisname.tsv --out thisname --refresh_acc False

It automatically tries to download each file, checks the MD5, retries if the integrity check fails, merges the files if there are more than 2, and finally renames the files to the description you provided.
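The check-and-retry step described above can be sketched roughly as follows. This is only an illustration, not the package's actual code: `fetch` is a hypothetical callable standing in for whatever performs the transfer, and `max_attempts` is an assumed limit.

```python
import hashlib
from pathlib import Path
from typing import Callable


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_with_retry(
    fetch: Callable[[Path], None],  # hypothetical: writes the remote file to the given path
    dest: Path,
    expected_md5: str,
    max_attempts: int = 3,  # assumed limit, not taken from fastq-downloader
) -> int:
    """Call fetch until dest matches expected_md5; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        fetch(dest)
        if md5_of(dest) == expected_md5.lower():
            return attempt
        dest.unlink(missing_ok=True)  # discard the corrupt file before retrying
    raise RuntimeError(f"MD5 mismatch for {dest} after {max_attempts} attempts")
```

Passing the transfer step in as a callable keeps the integrity check independent of how the file is actually fetched.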

Prepare the info.tsv like this. Note that the file must be tab-delimited (a TSV file); you can achieve this simply by pasting the table from Excel or the GEO website, or from a CSV file downloaded from SRA Run Selector.

GSM12345  h3k9me3_rep1
GSM12345  h3k9me3_rep2
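For illustration, a two-column file like the one above could be read as sketched below. The function name `read_info_tsv` is hypothetical, and collapsing each run of white space into a single underscore is an assumption about how the conversion works, not the package's confirmed behaviour.

```python
import csv
import io
import re
from typing import List, TextIO, Tuple


def read_info_tsv(handle: TextIO) -> List[Tuple[str, str]]:
    """Return (accession, description) pairs from a tab-delimited file."""
    samples = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete lines
        accession = row[0].strip()
        # Assumption: each run of white space becomes one underscore.
        description = re.sub(r"\s+", "_", row[1].strip())
        samples.append((accession, description))
    return samples


example = "GSM12345\th3k9me3 rep1\nGSM12345\th3k9me3 rep2\n"
print(read_info_tsv(io.StringIO(example)))
# prints [('GSM12345', 'h3k9me3_rep1'), ('GSM12345', 'h3k9me3_rep2')]
```

A list of pairs is used rather than a dictionary so that the same accession can appear on more than one line, as in the example.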

todo

  • test merging of runs for paired-end reads
  • publish to bioconda
  • retry on failure
  • use a DAG to run the pipeline (sort of; implemented using snakemake)
  • option to resume the download when the md5 does not match
  • option to continue from the previous download
  • implement second-level parallelization

update content

  • 0.3.2:
    • add a filter for library layout (some SRA entries have content that does not match their library layout)

Download files

Source Distribution

fastq-downloader-0.4.1.tar.gz (11.9 kB)

Uploaded Source

Built Distribution

fastq_downloader-0.4.1-py3-none-any.whl (14.0 kB)

Uploaded Python 3

File details

Details for the file fastq-downloader-0.4.1.tar.gz.

File metadata

  • Download URL: fastq-downloader-0.4.1.tar.gz
  • Upload date:
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.10.1 Linux/5.10.60.1-microsoft-standard-WSL2

File hashes

Hashes for fastq-downloader-0.4.1.tar.gz
Algorithm Hash digest
SHA256 ee49a20c642d72342a5d433d3be96f89a638a658a7158b2c60083c81e0697024
MD5 4419a061858f3c8f26dea739902ac13e
BLAKE2b-256 a68cca17cb5ef65081cf47c29c13a883f63d4fa600a1192279bb5ace8267795a

File details

Details for the file fastq_downloader-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: fastq_downloader-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 14.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.10.1 Linux/5.10.60.1-microsoft-standard-WSL2

File hashes

Hashes for fastq_downloader-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 bf2beb2ab77ec583ea1d4e373ff7287d667986d9980c8814e847fe72769357e5
MD5 7f8090fcf2c2a9571b2cf5c4c64734af
BLAKE2b-256 1eb4da4961e5074ebf9e7ef1a19a108b86a89a7f536d1aa2f01dd3133815bd74
