ReadStore Command Line Interface (CLI) Is A Python Package For Accessing Data from the ReadStore API
ReadStore CLI
This README describes the ReadStore Command Line Interface (CLI). Also available as GitHub Page.
The ReadStore CLI is used to upload FASTQ files and Processed Data to the ReadStore database and access Projects, Datasets, metadata and attachment files. The ReadStore CLI enables you to automate your bioinformatics pipelines by providing simple and standardized access to datasets.
Check the ReadStore GitHub repository for more information on how to get started.
More information is available on the ReadStore website: https://evo-byte.com/readstore/
Tutorials and intro videos on how to get started: https://www.youtube.com/@evobytedigitalbio
Blog posts and How-Tos: https://evo-byte.com/blog/
For general questions reach out to info@evo-byte.com
Happy analysis :)
Table of Contents
- Description
- Security and Permissions
- Installation
- Usage
- Contributing
- License
- Credits and Acknowledgments
The Lean Solution for Managing NGS and Omics Data
ReadStore is a platform for storing, managing, and integrating genomic data. It accelerates analysis and offers an easy way to manage and share FASTQ files, NGS datasets and processed datasets. With built-in project and metadata management, ReadStore structures your workflows, and its collaborative user interface enhances teamwork, so you can focus on generating insights.
The integrated Webservice (API) enables you to directly retrieve data from ReadStore via the terminal Command-Line Interface (CLI) or the Python / R SDKs.
The ReadStore Basic version provides a local web server with simple user management. For organization-wide deployment, advanced user and group management, or cloud integration, please check out the ReadStore Advanced versions and contact us at info@evo-byte.com.
Description
The ReadStore Command-Line Interface (CLI) is a powerful tool for uploading and managing your omics data. With the ReadStore CLI, you can upload FASTQ files and Processed Data directly to the ReadStore database, as well as access and manage Projects, Datasets, metadata, and attachment files with ease.
The CLI can be run from your shell or terminal and is designed for seamless integration into data pipelines and scripts, enabling efficient automation of data management tasks. This flexibility allows you to integrate the ReadStore CLI within any bioinformatics application or pipeline, streamlining data uploads, access, and organization within ReadStore.
By embedding the ReadStore CLI in your bioinformatics workflows, you can improve efficiency, reduce manual tasks, and ensure your data is readily accessible for analysis and collaboration.
Security and Permissions
PLEASE READ AND FOLLOW THESE INSTRUCTIONS CAREFULLY!
User Accounts and Token
Using the CLI with a ReadStore server requires an active User Account and a Token. You should never enter your user account password when working with the CLI.
To retrieve your token:
- Login to the ReadStore web app via your browser
- Navigate to the Settings page and click on Token
- If needed, you can regenerate your token (Reset). This will invalidate the previous token
For uploading FASTQ files or Processed Data your User Account needs to have Staging Permission. You can check this in the Settings page of your account. If you do not have Staging Permission, ask the ReadStore server Admin to grant you permission.
CLI Configuration
After running readstore configure for the first time, a configuration file is created in your home directory (~/.readstore/config) to store your credentials and CLI configuration.
The config file is created with user-exclusive read/write permissions (chmod 600). Please make sure to keep the file permissions restricted.
You can find more information on the configuration file below.
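The permission requirement above can be checked from a script. The following is a minimal sketch for POSIX systems, not part of the ReadStore CLI itself; the function names are hypothetical and the path is the default location named in this README.

```python
import os
import stat
from pathlib import Path

# Default location named in this README; adjust if your setup differs.
CONFIG_PATH = Path.home() / ".readstore" / "config"


def is_owner_only(path):
    """Return True if the file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def restrict_to_owner(path):
    """Set read/write for the owner only, the equivalent of chmod 600."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

Calling is_owner_only(CONFIG_PATH) after readstore configure should return True; if it does not, restrict_to_owner(CONFIG_PATH) restores the restricted mode.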
Installation
pip3 install readstore-cli
You can perform the install in a conda or venv virtual environment to simplify package management.
A local install is also possible
pip3 install --user readstore-cli
Make sure that ~/.local/bin is on your $PATH in case you encounter problems when running the readstore command.
Validate the install by running
readstore -v
This should print the ReadStore CLI version
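If the readstore command is not found, the usual cause is a PATH problem. The sketch below shows one way to check this from Python before a pipeline starts. The helper names are hypothetical; only the readstore -v call comes from this README, and whether the version is printed to standard output or standard error is not stated here, so both are read.

```python
import shutil
import subprocess


def find_executable(name="readstore"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


def cli_version(name="readstore"):
    """Run `<name> -v` and return its output, or None if the command is missing."""
    if find_executable(name) is None:
        return None
    result = subprocess.run([name, "-v"], capture_output=True, text=True, check=True)
    # Some tools print the version to stderr, so fall back to it if stdout is empty.
    return (result.stdout or result.stderr).strip()
```

A pipeline could call cli_version() once at startup and stop early with a clear message when it returns None.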
Usage
Detailed tutorials, videos and explanations are found on YouTube or on the EVOBYTE blog.
Quickstart
Let's upload some FASTQ files.
1. Configure CLI
Make sure you have the ReadStore CLI installed and a running ReadStore server with your user registered.
- Run readstore configure
- Enter your username and token
- Select the default output of your CLI requests. You can choose between text outputs, comma-separated csv or json.
- Run readstore configure list and check if your credentials are correct.
2. Upload Files
For uploading FASTQ files your User Account needs to have Staging Permission. You can check this in the Settings page of your account. If you do not have Staging Permission, ask the ReadStore Server Admin to grant you permission.
Move to a folder that contains some FASTQ files
readstore upload myfile_r1.fastq
This will upload the file and run the QC check. You can select multiple files at once using the * wildcard.
The fastq files need to have the default file endings .fastq, .fastq.gz, .fq, .fq.gz.
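Before calling readstore upload from a script, it can help to sort out files whose endings would not be accepted. This is a small sketch of such a pre-check. The helper names are hypothetical, and the comparison is case-sensitive, which is an assumption since this README does not say how the CLI treats upper-case endings.

```python
from pathlib import Path

# Default endings listed in this README.
FASTQ_EXTENSIONS = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def has_fastq_extension(filename, extensions=FASTQ_EXTENSIONS):
    """Return True if the file name ends with one of the accepted endings."""
    return Path(filename).name.endswith(tuple(extensions))


def split_by_extension(filenames, extensions=FASTQ_EXTENSIONS):
    """Split file names into (accepted, rejected) lists, preserving order."""
    accepted, rejected = [], []
    for name in filenames:
        target = accepted if has_fastq_extension(name, extensions) else rejected
        target.append(name)
    return accepted, rejected
```

Only the accepted list would then be passed on to readstore upload, and the rejected list can be logged for review.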
You can also upload multiple FASTQ files from a template .csv file using the import fastq function. More information below.
3. Stage Files
Login to the web app on your browser and move to the Staging page. Here you find a list of all FASTQ files that you just uploaded. For larger files, the QC step can take a while to complete.
FASTQ files are grouped into Datasets which you can Check In. Checked In Datasets appear in the Datasets page and can be accessed by the CLI.
Use the Batch Check In button to import several Datasets at once.
4. Access Datasets via the CLI
The ReadStore CLI enables programmatic access to Datasets and FASTQ files. Some examples are:
readstore list
List all FASTQ files
readstore get --id 25
Get detailed view on Dataset 25
readstore get --id 25 --read1-path
Get path for Read1 FASTQ file
readstore get --id 25 --meta
Get metadata for Dataset 25
readstore project get --name cohort1 --attachment
Get attachment files for Project "cohort1"
You can find a full list of CLI commands below.
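The commands above can also be called from a Python pipeline step. The following is a minimal sketch using the standard subprocess module. The two helper functions are hypothetical wrappers, not part of the ReadStore CLI, and the shape of the returned text depends on the output format set in your configuration.

```python
import subprocess


def build_get_command(dataset_id, *flags):
    """Build the argument list for `readstore get --id <id> [flags]`."""
    return ["readstore", "get", "--id", str(dataset_id), *flags]


def run_cli(command):
    """Run a CLI command and return its standard output as stripped text.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, run_cli(build_get_command(25, "--read1-path")) runs readstore get --id 25 --read1-path and returns whatever the CLI prints, which can then be handed to the next tool in the pipeline.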
5. Managing Processed Data
Processed Data refer to files generated through processing of raw sequencing data. Depending on the omics technology and assay used, this could be for instance transcript count files, variant files or gene count matrices.
readstore pro-data upload -d test_dataset_1 -n test_dataset_count_matrix -t count_matrix test_count_matrix.h5
Upload count matrix test_count_matrix.h5 with name "test_dataset_count_matrix" for dataset with name "test_dataset_1"
readstore pro-data list
List Processed Data for all Datasets and Projects
readstore pro-data get -d test_dataset_1 -n test_dataset_count_matrix
Get ProData details for Dataset "test_dataset_1" with the name "test_dataset_count_matrix"
readstore pro-data delete -d test_dataset_1 -n test_dataset_count_matrix
Delete ProData for dataset "test_dataset_1" with the name "test_dataset_count_matrix"
The delete operation does not remove the file from the file system, only from the database. A user needs Staging Permission to create or remove datasets.
CLI Configuration
readstore configure manages the CLI configuration. To setup the configuration:
- Run readstore configure
- Enter your username and token
- Select the default output of your CLI requests. You can choose between text outputs, comma-separated csv or json.
- Run readstore configure list and check if your credentials are correct.
If you already have a configuration in place, the CLI will ask whether you want to overwrite the existing credentials. Select y if yes.
After running readstore configure for the first time, a configuration file is created in your home directory (~/.readstore/config).
The config file is created with user-exclusive read/write permissions (chmod 600). Please make sure to keep the file permissions restricted.
[general]
endpoint_url = http://localhost:8000
fastq_extensions = ['.fastq', '.fastq.gz', '.fq', '.fq.gz']
output = csv
[credentials]
username = myusername
token = myrandomtoken
You can further edit the configuration of the CLI client from this configuration file. In case your ReadStore Django server is not run at the default port 8000, you need to update the endpoint_url. If you need to process FASTQ files with file endings other than those listed in fastq_extensions, you can modify the list.
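The configuration file shown above follows an INI-style layout, so it can likely be read with Python's standard configparser module. The sketch below assumes that this holds for your file and that fastq_extensions is written as a Python-style list, as in the example. The function name is hypothetical, and the token is deliberately left out of the returned values.

```python
import ast
import configparser

# Sample content copied from the example in this README.
SAMPLE_CONFIG = """\
[general]
endpoint_url = http://localhost:8000
fastq_extensions = ['.fastq', '.fastq.gz', '.fq', '.fq.gz']
output = csv

[credentials]
username = myusername
token = myrandomtoken
"""


def parse_config(text):
    """Parse config text and return the non-secret settings as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    general = parser["general"]
    return {
        "endpoint_url": general["endpoint_url"],
        # The list is stored as text; literal_eval turns it into a Python list.
        "fastq_extensions": ast.literal_eval(general["fastq_extensions"]),
        "output": general["output"],
        "username": parser["credentials"]["username"],
    }
```

To inspect your own settings, pass the text of ~/.readstore/config to parse_config instead of SAMPLE_CONFIG.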
Upload FASTQ Files
For uploading FASTQ files your User Account needs to have Staging Permission. You can check this in the Settings page of your account. If you do not have Staging Permission, ask the ReadStore Server Admin to grant you permission.
readstore upload myfile_r1.fastq myfile_r2.fastq ...
This will upload the files and run the QC check. You can select several files at once using the * wildcard. It can take some time before FASTQ files are available in your Staging page, depending on how large the files are and how long the QC step takes.
usage: readstore upload [options]
Upload FASTQ Files
positional arguments:
fastq_files FASTQ Files to Upload
Import FASTQ files from .csv Template
Import FASTQ files from template .csv file.
A template .csv file can be downloaded from the ReadStore App in the Staging Page, or is available in this repository under assets/readstore_template.csv
The template .csv file must contain the columns FASTQFileName, ReadType and UploadPath.
- FASTQFileName: Name for the FASTQ File in ReadStore DB
- ReadType: FASTQ Read Type, one of R1 (Read 1), R2 (Read 2), I1 (Index 1) or I2 (Index 2)
- UploadPath: File path to FASTQ file. Must be accessible from ReadStore server
usage: readstore import fastq [options]
Import FASTQ Files
positional arguments:
fastq_template FASTQ Template .csv File
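A template file with the three required columns can also be generated from a script. The following is a sketch under assumptions: the function name is hypothetical, and the column order simply follows the order in which the columns are named in this README. When in doubt, start from the template downloaded from the Staging Page.

```python
import csv
import io

# Required columns and allowed read types as described in this README.
TEMPLATE_COLUMNS = ["FASTQFileName", "ReadType", "UploadPath"]
READ_TYPES = {"R1", "R2", "I1", "I2"}


def write_template(rows):
    """Return template .csv text for rows of (name, read_type, path)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(TEMPLATE_COLUMNS)
    for name, read_type, path in rows:
        # Reject read types that the template does not define.
        if read_type not in READ_TYPES:
            raise ValueError(f"Unknown ReadType: {read_type!r}")
        writer.writerow([name, read_type, path])
    return buffer.getvalue()
```

The returned text can be written to a file and passed to readstore import fastq as the fastq_template argument.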
Access Projects
There are 3 commands for accessing projects, readstore project list, readstore project get and readstore project download.
- list provides an overview of projects, metadata and attachments
- get provides detailed information on an individual project and its metadata and attachments
- download lets you download attachment files of a project from the ReadStore database
readstore project list
usage: readstore project ls [options]
List Projects
options:
-h, --help show this help message and exit
-m, --meta Get Metadata
-a, --attachment Get Attachment
--output {json,text,csv}
Format of command output (see config for default)
Show project id and name.
The -m/--meta option includes metadata for projects as a json string in the output.
The -a/--attachment option includes attachment names as a list in the output.
Adapt the output format of the command using the --output options.
readstore project get
usage: readstore project get [options]
Get Project
options:
-h, --help show this help message and exit
-id , --id Get Project by ID
-n , --name Get Project by name
-m, --meta Get only Metadata
-a, --attachment Get only Attachment
--output {json,text,csv}
Format of command output (see config for default)
Show project details for a project selected either by --id or the --name argument.
The project details include description, date of creation, attachments and metadata.
The -m/--meta shows only the metadata with keys in header.
The -a/--attachment shows only the attachments.
Adapt the output format of the command using the --output options.
Example: readstore project get --id 2
readstore project download
usage: readstore project download [options]
Download Project Attachments
options:
-h, --help show this help message and exit
-id , --id Select Project by ID
-n , --name Select Project by name
-a , --attachment Set Attachment Name to download
-o , --outpath Download path or directory (default . )
Download attachment files for a project. Select the project either by the --id or the --name argument.
With the --attachment argument you specify the name of the attachment file to download.
Use the --outpath to set a directory to download files to.
Example: readstore project download --id 2 -a ProjectQC.pptx -o ~/downloads
Access Datasets and FASTQ Files
There are 3 commands for accessing datasets, readstore list, readstore get and readstore download.
- list provides an overview of datasets, metadata and attachments
- get provides detailed information on an individual dataset, its metadata and attachments, and its individual FASTQ read files and statistics
- download lets you download attachment files of a dataset
readstore list
usage: readstore ls [options]
List FASTQ Datasets
options:
-h, --help show this help message and exit
-p , --project-name Subset by Project Name
-pid , --project-id Subset by Project ID
-m, --meta Get Metadata
-a, --attachment Get Attachment
--output {json,text,csv}
Format of command output (see config for default)
Show dataset id, name, description, qc_passed, paired_end, index_read, project_ids and project_names
-p/--project-name subset dataset from a specified project
-pid/--project-id subset dataset from a specified project
-m/--meta include metadata for datasets
-a/--attachment include attachment names as list for datasets
Adapt the output format of the command using the --output options.
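When the csv output format is selected, the listing can be post-processed in a script. The sketch below assumes that the first line of the csv output is a header row with the field names listed above, which is not confirmed in this README, and the way values such as paired_end are written (for example True or False) is likewise an assumption. The function names are hypothetical.

```python
import csv
import io


def parse_csv_output(text):
    """Parse comma-separated CLI output into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def select_rows(rows, column, value):
    """Keep rows whose column equals the given string value."""
    return [row for row in rows if row.get(column) == value]
```

For example, the text returned by readstore list --output csv could be passed to parse_csv_output, and select_rows(rows, "paired_end", "True") would then keep the paired-end datasets, provided the values are written that way.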
readstore get
usage: readstore get [options]
Get FASTQ Datasets and Files
options:
-h, --help show this help message and exit
-id , --id Get Dataset by ID
-n , --name Get Dataset by name
-m, --meta Get only Metadata
-a, --attachment Get only Attachments
-r1, --read1 Get Read 1 Data
-r2, --read2 Get Read 2 Data
-r1p, --read1-path Get Read 1 FASTQ Path
-r2p, --read2-path Get Read 2 FASTQ Path
-i1, --index1 Get Index 1 Data
-i2, --index2 Get Index 2 Data
-i1p, --index1-path Get Index 1 FASTQ Path
-i2p, --index2-path Get Index 2 FASTQ Path
--output {json,text,csv}
Format of command output (see config for default)
Show details for a dataset selected either by --id or the --name argument.
-m/--meta shows only the metadata with keys in header.
-a/--attachment shows only the attachments.
-r1/--read1 shows details for dataset Read 1 data (same for --read2, --index1, --index2)
-r1p/--read1-path returns path for dataset Read 1 (same for --read2-path, --index1-path, --index2-path)
Adapt the output format of the command using the --output options.
Example: readstore get --id 2
Example: readstore get --id 2 --read1-path
readstore download
usage: readstore download [options]
Download Dataset attachments
options:
-h, --help show this help message and exit
-id , --id Select Dataset by ID
-n , --name Select Dataset by name
-a , --attachment Set Attachment Name to download
-o , --outpath Download path or directory (default . )
Download attachment files for a dataset. Select dataset either by --id or the --name argument.
With the --attachment argument you specify the name of the attachment file to download.
Use the --outpath to set a directory to download files to.
Example: readstore download --id 2 -a myAttachment.csv -o ~/downloads
Access Processed Data
readstore pro-data upload
usage: readstore pro-data upload [options]
Upload Processed Data
positional arguments:
pro_data_file Path to Processed Data File to Upload
options:
-h, --help show this help message and exit
-did , --dataset-id Set associated Dataset by ID
-d , --dataset-name Set associated Dataset by Name
-n , --name Set Processed Data Name (required)
-t , --type Set Type of Processed Data (e.g. gene_counts) (required)
--description Set Description
-m META, --meta META Set metadata as JSON string (e.g '{"key": "value"}')
Upload Processed Data to ReadStore database and connect with an existing dataset.
Processed Data can be any file type and typically represent datasets for downstream omics analysis, for instance gene count matrices or variant files.
Your ReadStore user account is required to have Staging Permissions to upload or delete Processed Data.
You need to specify a --dataset-id or --dataset-name to select the dataset to attach files to.
-n/--name defines the name to set for the processed data in the ReadStore DB
-t/--type defines the data type of the processed dataset. The type is free to choose, for instance gene_counts or count_matrix
-m/--meta lets you set metadata for the processed data (optional). This attribute must be a json-formatted string, e.g. '{"key": "value"}'
--description sets a description for the processed data (optional).
Example: readstore pro-data upload -d test_dataset_1 -n test_dataset_count_matrix -t count_matrix -m '{"key":"value"}' test_count_matrix.h5
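When the upload is triggered from Python, the metadata string can be produced with json.dumps instead of being typed by hand, which avoids quoting mistakes. This is a sketch with a hypothetical helper function; it only uses the options documented above and returns an argument list suitable for subprocess.run.

```python
import json


def build_pro_data_upload(path, dataset_name, name, data_type, meta=None, description=None):
    """Build the argument list for `readstore pro-data upload`."""
    command = [
        "readstore", "pro-data", "upload",
        "-d", dataset_name,
        "-n", name,
        "-t", data_type,
    ]
    if meta is not None:
        # json.dumps guarantees a valid JSON string for the -m option.
        command += ["-m", json.dumps(meta)]
    if description is not None:
        command += ["--description", description]
    # The file to upload is the positional argument and goes last.
    command.append(path)
    return command
```

Passing the list to subprocess.run without a shell means the JSON string needs no extra shell quoting.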
readstore pro-data list
usage: readstore pro-data list [options]
List Processed Data
options:
-h, --help show this help message and exit
-pid , --project-id Subset by Project ID
-p , --project-name Subset by Project Name
-did , --dataset-id Subset by Dataset ID
-d , --dataset-name Subset by Dataset Name
-n , --name Subset by ProData Name
-t , --type Subset by Data Type
-m, --meta Get Metadata
-a, --archived Include Archived ProData
--output {json,text,csv}
Format of command output (see config for default)
List Processed Data stored in the ReadStore database.
You can subset the list by Projects (-pid/-p), Datasets (-did/-d) and/or by the specific Name (-n) of the Processed Data stored.
-m/--meta Also show metadata
-a/--archived Show archived Processed Data.
Processed Data are archived when a new file with the same name attribute is uploaded. This invalidates the previous version of the Processed Data.
Example: readstore pro-data list -p TestProject
readstore pro-data get
usage: readstore pro-data get [options]
Get Processed Data
options:
-h, --help show this help message and exit
-id , --id Get ProData by ID
-did , --dataset-id Get ProData by Dataset ID
-d , --dataset-name Get ProData by Dataset Name
-n , --name Get ProData by Name
-m, --meta Get only Metadata
-p, --upload-path Get only Upload Path
-v , --version Get ProData Version (default: latest)
--output {json,text,csv}
Format of command output (see config for default)
Get a single Processed Data entry by its --id, or by the associated --dataset-id/--dataset-name plus the --name argument.
-m/--meta Return only metadata
-p/--upload-path Return only upload path
-v/--version Select ProData by specific version (Optional). Default: latest version.
Example: readstore pro-data get -d test_dataset_1 -n test_dataset_count_matrix
readstore pro-data delete
usage: readstore pro-data delete [options]
Delete Processed Data
options:
-h, --help show this help message and exit
-id , --id Delete ProData by ID
-did , --dataset-id Delete ProData by Dataset ID
-d , --dataset-name Delete ProData by Dataset Name
-n , --name Delete ProData by Name
-v , --version Delete ProData Version (default: latest)
Delete a Processed Data entry by its --id, or by the associated --dataset-id/--dataset-name plus the --name argument.
-v/--version Delete ProData by specific version (Optional). Default: latest version.
Example: readstore pro-data delete -d test_dataset_1 -n test_dataset_count_matrix
Contributing
Contributions make this project better! Whether you want to report a bug, improve documentation, or add new features, any help is welcomed!
How You Can Help
- Report Bugs
- Suggest Features
- Improve Documentation
- Code Contributions
Contribution Workflow
- Fork the repository and create a new branch for each contribution.
- Write clear, concise commit messages.
- Submit a pull request and wait for review.
Thank you for helping make this project better!
License
The ReadStore CLI is licensed under an Apache 2.0 Open Source License. See the LICENSE file for more information.
Credits and Acknowledgments
ReadStore CLI is built upon the following open-source Python packages, and we would like to thank all contributing authors, developers and partners.
- Python (https://www.python.org/)
- Requests (https://requests.readthedocs.io/en/latest/)