A tool to upload Fastq files to the INSaFLU database and perform metagenomics pathogen detection
findONTime
A tool to upload fastq files (fastq or fastq.gz format) to the INSaFLU-TELEVIR platform and launch the metagenomics pathogen detection analysis using the TELEVIR module
Motivation
Reducing the time needed for pathogen detection and the sequencing cost per sample is crucial when metagenomics sequencing is used for diagnostics. In hypothesis-free viral diagnosis by sequencing of complex biological samples, the proportion of the virus in a sample is unknown, so the amount of sequencing data, and consequently the run length, needed to accurately detect the virus cannot be predicted a priori. findONTime runs concurrently with MinION sequencing: it monitors the FASTQ files generated in real time for each sample, merges them at user-defined time intervals, uploads them to the INSaFLU-TELEVIR platform and launches the metagenomics virus detection analysis using the TELEVIR module. This allows users to detect a virus in a sample as early as possible during the sequencing run, reducing the time between obtaining the sample and the diagnosis, and also reducing sequencing costs (ONT runs can be stopped at any time, and the flow cells can be cleaned and reused). findONTime can be used as a “start-to-end” solution or for particular tasks (e.g., merging ONT output files, metadata preparation and upload to INSaFLU-TELEVIR).
Introduction
The insaflu-upload tool uploads fastq files to the INSaFLU-TELEVIR platform (docker installation or local server) and launches the metagenomics pathogen detection analysis using the TELEVIR module. The tool relies on fastq-handler, a package that monitors and processes the outputs of ONT runs, uploads the reads, launches TELEVIR projects and generates a report with the results.
Details
The user can upload all files collected throughout the ONT run (sampling occurs at a user-defined interval) or only the last file (i.e., the file compiling all reads generated up to the latest sampling point). For upload, a metadata file is also generated for each sequence file, following the INSaFLU-TELEVIR input template file. Metadata files are stored in the metadata sub-directory of the output directory specified by the user.
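The two upload strategies can be illustrated with a minimal sketch. The function name `select_for_upload` and the file names are hypothetical and are not part of the tool; the sketch only assumes that merged files are kept in the order they were produced.

```python
from pathlib import Path
from typing import List


def select_for_upload(merged_files: List[Path], strategy: str = "all") -> List[Path]:
    """Pick which merged files to upload (hypothetical helper, oldest file first)."""
    if strategy not in ("all", "last"):
        raise ValueError(f"unknown upload strategy: {strategy!r}")
    if strategy == "all" or not merged_files:
        # "all": every file sampled during the run is uploaded
        return list(merged_files)
    # "last": only the most recent file, which compiles all reads so far
    return [merged_files[-1]]


# Illustrative file names only
snapshots = [Path("run_0.fastq.gz"), Path("run_1.fastq.gz"), Path("run_2.fastq.gz")]
print(select_for_upload(snapshots, "all"))
print(select_for_upload(snapshots, "last"))
```

With `last`, a single cumulative file is transferred per sampling point, which keeps uploads small when intermediate snapshots are not needed.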
Upload
insaflu-upload can interact with the INSaFLU-TELEVIR platform in two ways:
- Docker. The user needs to have docker installed and running. The tool will then upload the files to the docker image. The user needs to provide the name of the docker image and the path for uploads.
- SSH. The user needs to have access to the database server. The tool will then upload the files to the database using SSH. The user needs to provide the path for uploads and the credentials for the database server.
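For orientation, the two transfer routes correspond roughly to the standard `docker cp` and `scp` commands. The sketch below only builds the argument lists; it is an illustration and not how the tool itself performs the transfer (paramiko is listed among the requirements, so the SSH route is presumably handled in Python). All function names, container names, hosts and paths are hypothetical.

```python
from typing import List


def docker_copy_command(local_file: str, container: str, upload_dir: str) -> List[str]:
    """Build a `docker cp` command that copies a file into a running container."""
    return ["docker", "cp", local_file, f"{container}:{upload_dir}"]


def scp_command(local_file: str, user: str, host: str, key_file: str, upload_dir: str) -> List[str]:
    """Build an `scp` command that copies a file to a remote server using a key file."""
    return ["scp", "-i", key_file, local_file, f"{user}@{host}:{upload_dir}"]


# Hypothetical values, for illustration only
print(docker_copy_command("merged.fastq.gz", "insaflu-server", "/uploads/"))
print(scp_command("merged.fastq.gz", "demo", "192.0.2.10", "~/.ssh/id_rsa", "/uploads/"))
```

Either list could be passed to `subprocess.run(..., check=True)` to perform the copy.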
INSaFLU-TELEVIR
The tool creates one INSaFLU-TELEVIR project for each directory containing fastq files. The project name is the name of the directory. Files generated within the same directory are uploaded to the same project.
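The directory-to-project mapping can be sketched as follows. This is a minimal illustration, assuming files are recognised by the `.fastq` and `.fastq.gz` suffixes; the function name `group_fastq_by_directory` is hypothetical and not part of the tool.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def group_fastq_by_directory(run_dir: Path) -> Dict[str, List[Path]]:
    """Map each directory name (used as the project name) to the fastq files it contains."""
    projects: Dict[str, List[Path]] = defaultdict(list)
    # rglob walks sub-folders too, since sequence files may be nested
    for path in sorted(run_dir.rglob("*")):
        if path.is_file() and path.name.endswith(FASTQ_SUFFIXES):
            projects[path.parent.name].append(path)
    return dict(projects)
```

In this sketch, two directories with the same name in different parts of the tree would end up in the same project; how the tool handles that case is not stated here.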
Input Files
- fastq.gz - Output directory for the ONT run, containing sequence files. The files can be in subfolders. The files can be gzipped or not.
- config.ini - A configuration file containing the parameters for the tool. The file is generated by the tool when it is run for the first time. The user can edit the file to change the parameters.
Config must contain:
- section [INSAFLU] containing the INSaFLU username and app directory path;
- (optional) section [SSH] containing SSH credentials: username, ip_address and RSA key;
- (optional) section [DOCKER] containing the docker image name.
See example config.ini.
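A config with this layout can be read and checked with Python's standard `configparser`. The sketch below is illustrative only: the key names (`username`, `app_dir`, `image_name`) and their values are assumptions, so refer to the generated config.ini for the real ones.

```python
import configparser
from typing import List

# Hypothetical key names and values; the generated config.ini is authoritative
EXAMPLE_CONFIG = """
[INSAFLU]
username = demo_user
app_dir = /path/to/app

[DOCKER]
image_name = insaflu-server
"""


def load_config(text: str) -> configparser.ConfigParser:
    """Parse config text and check that the required [INSAFLU] section exists."""
    config = configparser.ConfigParser()
    config.read_string(text)
    if "INSAFLU" not in config:
        raise ValueError("config is missing the required [INSAFLU] section")
    return config


def connection_modes(config: configparser.ConfigParser) -> List[str]:
    """List the optional connection sections present in the config."""
    return [name.lower() for name in ("DOCKER", "SSH") if name in config]


config = load_config(EXAMPLE_CONFIG)
print(config["INSAFLU"]["username"], connection_modes(config))
```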
API
usage: findontime [-h] -i IN_DIR -o OUT_DIR [-s SLEEP] [-n TAG] [--config CONFIG]
                  [--max_size MAX_SIZE] [--merge] [--downsize] [--upload {last,all}]
                  [--connect {docker,ssh}] [--keep_names] [--monitor] [--televir]

Process fastq files.

optional arguments:
  -h, --help              show this help message and exit
  -i IN_DIR, --in_dir IN_DIR
                          Input directory
  -o OUT_DIR, --out_dir OUT_DIR
                          Output directory
  -s SLEEP, --sleep SLEEP
                          Sleep time between checks in monitor mode
  -n TAG, --tag TAG       name tag, if given, will be added to the output file names
  --config CONFIG         config file
  --max_size MAX_SIZE     max size of the output file, in kilobytes
  --merge                 merge files
  --downsize              downsize fastq files
  --upload {last,all}     file upload strategy (default: all)
  --connect {docker,ssh}  connection method (default: docker)
  --keep_names            keep original file names
  --monitor               monitor directory until killed
  --televir               deploy televir pathogen identification on each sample
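The interplay of --monitor and --sleep can be pictured as a simple polling loop. The sketch below is an assumption about the general pattern, not the tool's actual implementation; the function `monitor`, its `on_new_file` callback and the `max_checks` limit (added so the loop can terminate in a demo) are all hypothetical.

```python
import time
from pathlib import Path
from typing import Callable, Optional, Set

FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def monitor(
    in_dir: Path,
    on_new_file: Callable[[Path], None],
    sleep: float = 5.0,
    max_checks: Optional[int] = None,
) -> None:
    """Poll in_dir and report each fastq file once; runs until killed if max_checks is None."""
    seen: Set[Path] = set()
    checks = 0
    while max_checks is None or checks < max_checks:
        for path in sorted(in_dir.rglob("*")):
            if path.is_file() and path.name.endswith(FASTQ_SUFFIXES) and path not in seen:
                seen.add(path)
                on_new_file(path)
        checks += 1
        # Wait before the next check, unless this was the final one
        if max_checks is None or checks < max_checks:
            time.sleep(sleep)
```

A real monitor would also need to handle files that are still being written by the sequencer when a check happens; the sketch ignores that case.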
REQUIREMENTS
Modules
- python 3.6 or higher
- dataclasses==0.6
- natsort==8.3.1
- pandas==1.5.3
- paramiko==3.1.0
- pip==21.2.3
- setuptools==57.4.0
- xopen==1.7.0
INSTALLATION
python -m venv .venv
source .venv/bin/activate
python -m pip install findontime
USAGE
findontime -i test_run/ -o test_new --tag another -s 5 --merge --televir
TESTING
Running pytest in the root directory will run all tests that do not interact with INSaFLU-TELEVIR. To test the upload and metagenomics functionalities, the user needs to provide a valid config file for a local docker installation and pass the --docker flag to pytest:
pytest --docker --config-file config.ini
OUTPUT
Note: The output directory structure is maintained.
- fastq.gz: files containing all reads from the previous files.
- log.txt: file recording the concatenation process.
- metadata: directory containing an individual metadata file for each fastq file uploaded.
- results.tsv: file containing the results of the pathogen detection. One file per project.
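How a merged fastq.gz can be produced from a mix of plain and gzipped inputs is sketched below using only the standard library. This illustrates the idea and is not the tool's own merging code; the function names are hypothetical, and each input is assumed to end with a newline.

```python
import gzip
import shutil
from pathlib import Path
from typing import Iterable


def _open_fastq(path: Path):
    """Open a fastq file for binary reading, decompressing it if it is gzipped."""
    return gzip.open(path, "rb") if path.name.endswith(".gz") else open(path, "rb")


def merge_fastq(inputs: Iterable[Path], output: Path) -> Path:
    """Concatenate the reads of all inputs, in order, into one gzipped output file."""
    with gzip.open(output, "wb") as merged:
        for path in inputs:
            with _open_fastq(path) as handle:
                # Stream in chunks so large files are never loaded whole into memory
                shutil.copyfileobj(handle, merged)
    return output
```

Calling `merge_fastq` at each sampling point on all files seen so far would give the cumulative behaviour described above, where each output contains all reads from the previous files.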