PyReadStore SDK
This README describes PyReadStore, the Python client (SDK) for the ReadStore API.
The full ReadStore Basic documentation is available in the ReadStore Basic Docs.
PyReadStore can be used to access Projects, Datasets, ProData as well as metadata and attachment files in the ReadStore Database from Python code. The package enables you to automate your bioinformatics pipelines, Python scripts and notebooks.
Check the ReadStore Github repository for more information on how to get started with ReadStore and setting up your server.
More information is available on the ReadStore website.
Tutorials and Intro Videos: https://www.youtube.com/@evobytedigitalbio
Blog posts and How-Tos: https://evo-byte.com/blog/
For general questions reach out to info@evo-byte.com or in case of technical problems to support@evo-byte.com
Happy analysis :)
The Lean Solution for Managing NGS and Omics Data
ReadStore is a platform for storing, managing, and integrating omics data. It speeds up analysis and offers a simple way of managing and sharing NGS omics datasets, metadata and processed data (ProData). Built-in project and metadata management structures your workflows, and a collaborative user interface enhances teamwork, so you can focus on generating insights.
The integrated webservice (API) enables you to directly retrieve data from ReadStore via the terminal Command-Line Interface (CLI) or the Python / R SDKs.
The ReadStore Basic version provides a local webserver with a simple user management. If you need an organization-wide deployment, advanced user and group management or cloud integration please check the ReadStore Advanced versions and reach out to info@evo-byte.com.
Description
PyReadStore is a Python client (SDK) that lets you easily connect to your ReadStore server and interact with the ReadStore API. By importing the pyreadstore package in Python, you can quickly retrieve data from a ReadStore server.
This tool provides streamlined and standardized access to NGS datasets and metadata, helping you run analyses more efficiently and with fewer errors. You can easily scale your pipelines, and if you need to migrate or move NGS data, updating the ReadStore database ensures all your workflows stay up-to-date.
Security and Permissions
PLEASE READ AND FOLLOW THESE INSTRUCTIONS CAREFULLY!
User Accounts and Token
Using PyReadStore requires an active user account and a token (and a running ReadStore server).
You should never enter your user account password when working with PyReadStore.
To retrieve your token:
- Log in to the ReadStore app via your browser
- Navigate to the Settings page and click on Token
- You can regenerate your token anytime (Reset). This will invalidate the previous token.
For uploading FASTQ files your user account needs to have Staging Permission.
You can check this in the Settings page of your account.
If you do not have Staging Permission, ask your ReadStore server admin to grant you permission.
Setting Your Credentials
You need to provide the PyReadStore client with valid ReadStore credentials.
There are different options:
- Load credentials from the ReadStore config file. The file is generated by the ReadStore CLI, by default in your home directory (~/.readstore/). Make sure to keep read permissions to the file restrictive.
- Directly enter your username and token when instantiating a PyReadStore client within your Python code.
- Set username and token via environment variables (READSTORE_USERNAME, READSTORE_TOKEN). This is useful in container or cloud environments.
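For illustration, here is a minimal sketch of all three options; the username, token and paths are placeholders, not real credentials:

import pyreadstore

# Option 1: read credentials from the ReadStore config file
# (created by `readstore configure`, by default in ~/.readstore/)
rs_client = pyreadstore.Client(config_dir = '~/.readstore')

# Option 2: pass username and token directly (placeholder values)
rs_client = pyreadstore.Client(username = 'testuser', token = '0dM9qSU0Q5PLVgDrZRftzw')

# Option 3: rely on the READSTORE_USERNAME and READSTORE_TOKEN
# environment variables, e.g. set in a container or cloud environment
rs_client = pyreadstore.Client()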
Installation
pip3 install pyreadstore
You can perform the install in a conda or venv virtual environment to simplify package management.
A user-level install is also possible
pip3 install --user pyreadstore
To verify the installation, import the package in Python:
import pyreadstore
ReadStore API
The ReadStore Basic server provides a RESTful API for accessing resources via HTTP requests.
This API extends the functionalities of the ReadStore CLI as well as the Python and R SDKs.
API Endpoint
By default, the API is accessible at:
http://127.0.0.1:8000/api_x_v1/
Authentication
Users must authenticate using their username and token via the Basic Authentication scheme.
Example Usage
Below is an example demonstrating how to query the API with curl, retrieving an overview of Projects by sending an HTTP GET request to the project/ endpoint.
In this example, the username is testuser, and the token is 0dM9qSU0Q5PLVgDrZRftzw. You can find your token in the ReadStore settings.
curl -X GET -u testuser:0dM9qSU0Q5PLVgDrZRftzw http://localhost:8000/api_x_v1/project/
Example Response
A successful HTTP response returns a JSON-formatted string describing the project(s) in the ReadStore database. Example response:
[{
"id": 4,
"name": "TestProject99",
"metadata": {
"key1": "value1",
"key2": "value2"
},
"attachments": []
}]
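The same request can also be sent from Python using the requests package (one of PyReadStore's dependencies); this sketch reuses the placeholder credentials from the curl example above:

import requests

# HTTP GET against the project/ endpoint with Basic Authentication
response = requests.get('http://localhost:8000/api_x_v1/project/',
                        auth = ('testuser', '0dM9qSU0Q5PLVgDrZRftzw'))
response.raise_for_status()   # Raise an error for non-2xx responses
projects = response.json()    # List of project dicts as shown above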
Documentation
Comprehensive API documentation is available in the ReadStore Basic Docs.
Usage
Detailed tutorials, videos and explanations are found on YouTube or on the EVOBYTE blog.
Quickstart
Let's access some dataset and project data from the ReadStore database!
Make sure a ReadStore server is running and reachable (by default under 127.0.0.1:8000).
You can open http://127.0.0.1:8000/api_x_v1/ in your browser and should get a response from the API.
We assume you ran readstore configure before to create a config file for your user.
If not, consult the ReadStore CLI README on how to set this up.
We will create a client instance and perform some operations to retrieve data from the ReadStore database. More information on all available methods can be found below.
import pyreadstore
rs_client = pyreadstore.Client() # Create an instance of the ReadStore client
# Manage Datasets
datasets = rs_client.list() # List all datasets and return pandas dataframe
datasets_project_1 = rs_client.list(project_id = 1) # List all datasets for project 1
datasets_id_25 = rs_client.get(dataset_id = 25) # Get detailed data for dataset 25
# Manage Projects
projects = rs_client.list_projects()                         # List all projects
project = rs_client.get_project(project_name = 'MyProject')  # Get details for MyProject

# Access FASTQ files and attachments
fastq_data_id_25 = rs_client.get_fastq(dataset_id = 25)      # Get FASTQ file data for dataset 25
rs_client.download_attachment(dataset_id = 25,               # Download a file attached to dataset 25
                              attachment_name = 'gene_table.tsv')
# Manage Processed Data
rs_client.upload_pro_data(name = 'sample_1_count_matrix', # Set name of count matrix
pro_data_file = 'path/to/sample_1_counts.h5', # Set file path
data_type = 'count_matrix', # Set type to 'count_matrix'
dataset_id = 25) # Set dataset id for upload
pro_data_project_1 = rs_client.list_pro_data(project_id = 1) # Get all ProData entries for Project 1
pro_data = rs_client.get_pro_data(name = 'sample_1_count_matrix',        # Get ProData entry by name
                                  dataset_id = 25)                       # ... for dataset 25
pro_data_id = rs_client.delete_pro_data(name = 'sample_1_count_matrix',  # Delete ProData entry by name
                                        dataset_id = 25)                 # ... for dataset 25
# Ingest FASTQ files
rs_client.upload_fastq(fastq = ['path/to_fastq_r1.fq', 'path/to_fastq_r2.fq'], # Upload paired FASTQ files
fastq_name = ['sample_rep_1_r1', 'sample_rep_1_r2'], # Set FASTQ names
read_type = ['R1', 'R2']) # Set individual FASTQ read types
Configure the Python Client
The Client is the central object and provides authentication against the ReadStore API.
By default, the client will try to read the ~/.readstore/config credentials file.
You can change the directory if your config file is located in another folder.
If you set the username and token arguments, the client will use these credentials instead.
If your ReadStore server is not running under localhost (127.0.0.1) port 8000, you can adapt the default settings.
pyreadstore.Client(config_dir: str = '~/.readstore',  # Directory containing ReadStore credentials
                   username: str | None = None,       # Username
                   token: str | None = None,          # Token
                   host: str = 'http://localhost',    # Hostname / IP of ReadStore server
                   return_type: str = 'pandas',       # Default return type, 'pandas' or 'json'
                   port: int = 8000,                  # Server port number
                   fastq_extensions: List[str] = ['.fastq', '.fastq.gz', '.fq', '.fq.gz'])
                                                      # Accepted FASTQ file extensions for upload validation
It is possible to set the username, token, server endpoint and FASTQ extensions using the environment variables listed below. Environment variables take precedence over other client configurations.
- READSTORE_USERNAME (username)
- READSTORE_TOKEN (token)
- READSTORE_ENDPOINT_URL (http://host:port, e.g. http://localhost:8000)
- READSTORE_FASTQ_EXTENSIONS (fastq_extensions, e.g. '.fastq,.fastq.gz,.fq,.fq.gz')
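In containers or CI pipelines these variables are typically exported before Python starts; for a quick local test you can also set them from Python itself. The values below are placeholders:

import os
import pyreadstore

# Placeholder credentials; environment variables take precedence
# over the config file and constructor arguments
os.environ['READSTORE_USERNAME'] = 'testuser'
os.environ['READSTORE_TOKEN'] = '0dM9qSU0Q5PLVgDrZRftzw'
os.environ['READSTORE_ENDPOINT_URL'] = 'http://localhost:8000'

rs_client = pyreadstore.Client()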
Possible errors
- Connection Error: no ReadStore server was found at the provided endpoint
- Authentication Error: the provided username or token was not found
- No Permission to Upload/Delete FASTQ/ProData: the user has no Staging Permission
Access Datasets
# List ReadStore Datasets
rs_client.list(project_id: int | None = None, # Filter datasets for project with id `project_id`
project_name: str | None = None, # Filter datasets for project with name `project_name`
return_type: str | None = None # Return pd.DataFrame or JSON type
) -> pd.DataFrame | List[dict]
# Get ReadStore Dataset Details
# Provide dataset_id OR dataset_name
rs_client.get(dataset_id: int | None = None,    # Get dataset with id `dataset_id`
              dataset_name: str | None = None,  # Get dataset with name `dataset_name`
              return_type: str | None = None    # Return pd.Series or json(dict)
              ) -> pd.Series | dict
# Get FASTQ file data for a dataset
# Provide dataset_id OR dataset_name
rs_client.get_fastq(dataset_id: int | None = None,    # Get FASTQ data for dataset with id `dataset_id`
                    dataset_name: str | None = None,  # Get FASTQ data for dataset `dataset_name`
                    return_type: str | None = None    # Return pd.DataFrame or JSON type
                    ) -> pd.DataFrame | List[dict]
# Return metadata for datasets in a dedicated pandas dataframe
# Metadata keys are pivoted as columns, with values as rows
rs_client.list_metadata(project_id: int | None = None, # Subset by project_id
project_name: str | None = None # Subset by project_name
) -> pd.DataFrame:
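A short usage sketch: since list_metadata returns a regular pandas DataFrame with one column per metadata key, standard pandas filtering applies. The key 'treatment' is a hypothetical example defined by your own metadata:

# One row per dataset, one column per metadata key
meta_df = rs_client.list_metadata(project_id = 1)

# Filter datasets by a metadata column ('treatment' is a hypothetical key)
treated = meta_df[meta_df['treatment'] == 'drug_a']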
Edit Datasets
NOTE Editing methods such as create or delete require Staging Permission authorization.
When creating datasets, the name argument and metadata dictionary are checked for consistency: names and metadata keys must not be empty and may contain only alphanumeric characters (plus _-.@). Metadata keys must not use the reserved keywords listed below.
# Create an empty Dataset, without FASTQ files attached
# Name must be unique in Database
# Optionally define Project IDs and/or Project names to attach Dataset to.
rs_client.create(dataset_name: str, # Set name
description: str = '', # Set description. Defaults to ''.
project_ids: List[int] = [], # Set project_ids. Defaults to [].
project_names: List[str] = [], # Set project_names. Defaults to [].
metadata: dict = {}) # Set metadata. Defaults to {}.
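A minimal sketch of creating a dataset; the name, project and metadata values are hypothetical:

rs_client.create(dataset_name = 'sample_1',              # Must be unique in the database
                 description = 'Replicate 1 of pilot run',
                 project_names = ['MyProject'],          # Attach to an existing project
                 metadata = {'tissue': 'liver'})         # Keys must not be reserved keywords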
# Update a Dataset
# dataset_id must be provided to define the dataset to update.
# Only arguments for which a new value is specified will be updated.
# Arguments with None values remain unaltered.
rs_client.update(dataset_id: int,                         # ID of dataset to update
                 dataset_name: str | None = None,         # Updated name (optional)
                 description: str | None = None,          # Updated description (optional)
                 project_ids: List[int] | None = None,    # Updated project_ids (optional)
                 project_names: List[str] | None = None,  # Updated project_names (optional)
                 metadata: dict | None = None)            # Updated metadata (optional)
# Provide an empty project_ids or project_names list [] to unset all associated projects
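For example, to change only the metadata of dataset 25 while leaving all other fields unaltered (values are hypothetical):

rs_client.update(dataset_id = 25,
                 metadata = {'tissue': 'liver',        # Only metadata is updated;
                             'treatment': 'drug_a'})   # all other arguments stay None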
# Delete Dataset (and attached FASTQ files)
# Either dataset_id or dataset_name argument must be provided
rs_client.delete(dataset_id: int | None = None, # Delete by ID. Defaults to None.
dataset_name: str | None = None) # Delete by Name. Defaults to None.
Access Projects
# List ReadStore Projects
rs_client.list_projects(return_type: str | None = None # Return pd.DataFrame or JSON type
) -> pd.DataFrame | List[dict]
# Get ReadStore Project Details
# Provide project_id OR project_name
rs_client.get_project(project_id: int | None = None,    # Get project with id `project_id`
                      project_name: str | None = None,  # Get project with name `project_name`
                      return_type: str | None = None    # Return pd.Series or json(dict)
                      ) -> pd.Series | dict
# Return metadata for projects in a dedicated pandas dataframe
# Metadata keys are pivoted as columns, with values as rows
rs_client.list_projects_metadata() -> pd.DataFrame:
Edit Projects
NOTE Editing methods such as create or delete require Staging Permission authorization.
When creating projects, the name argument and metadata dictionary are checked for consistency: names and metadata keys must not be empty and may contain only alphanumeric characters (plus _-.@). Metadata keys must not use the reserved keywords listed below.
# Create ReadStore Project
# name must be unique in Database
# dataset_metadata_keys can be provided and will be set as default metadata keys for attached datasets
rs_client.create_project(project_name: str, # Set Project name
description: str = '', # Set Project description. Defaults to ''.
metadata: dict = {}, # Set Project metadata. Defaults to {}.
dataset_metadata_keys: List[str] = []) # Set dataset metadata keys. Defaults to [].
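A minimal sketch of creating a project; all names and keys are hypothetical:

rs_client.create_project(project_name = 'MyProject',    # Must be unique in the database
                         description = 'Pilot RNA-seq study',
                         metadata = {'species': 'human'},
                         dataset_metadata_keys = ['tissue', 'treatment'])
                         # Default metadata keys for datasets attached to this project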
# Update a Project
# project_id must be provided to define the project to update.
# Only arguments for which a new value is specified will be updated.
# Arguments with None values remain unaltered.
rs_client.update_project(project_id: int, # Set project id to update
project_name: str | None = None, # Updated name (optional)
description: str | None = None, # Updated description (optional)
metadata: dict | None = None, # Updated metadata (optional)
dataset_metadata_keys: List[str] | None = None) # Updated metadata keys (optional)
# Delete ReadStore Project
# Either project_id or project_name argument must be provided
rs_client.delete_project(project_id: int | None = None, # Delete by ID. Defaults to None.
project_name: str | None = None) # Delete by Name. Defaults to None.
Access Processed Data
# Upload Processed Data
rs_client.upload_pro_data(name: str, # Name of ProData
pro_data_file: str, # Set ProData file path
data_type: str, # Set ProData data type
description: str = '', # Description for ProData
metadata: dict = {}, # MetaData
dataset_id: int | None = None, # Dataset ID to assign ProData to
dataset_name: str | None = None)# Dataset Name to assign ProData to
# Must provide dataset_id or dataset_name
# List and filter Processed Data
rs_client.list_pro_data(project_id: int | None = None, # Filter by Project ID
project_name: str | None = None, # Filter by Project Name
dataset_id: int | None = None, # Filter by Dataset ID
dataset_name: str | None = None, # Filter by Dataset Name
name: str | None = None, # Filter by ProData name
data_type: str | None = None, # Filter by ProData data type
include_archived: bool = False, # Include archived
return_type: str | None = None) -> pd.DataFrame | List[dict]
# Get individual ProData entry
rs_client.get_pro_data(pro_data_id: int | None = None,   # Get ProData by ID
                       dataset_id: int | None = None,    # Get ProData by Dataset ID
                       dataset_name: str | None = None,  # Get ProData by Dataset Name
                       name: str | None = None,          # Get ProData by name
                       version: int | None = None,       # Get specific version; None returns latest valid version
                       return_type: str | None = None) -> pd.Series | dict
# Provide ID or Name + Dataset ID/Name
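ProData entries are versioned; omitting version returns the latest valid version, while an integer selects a specific one. Names and IDs below are hypothetical:

# Latest valid version (version = None)
latest = rs_client.get_pro_data(name = 'sample_1_count_matrix', dataset_id = 25)

# A specific earlier version
v1 = rs_client.get_pro_data(name = 'sample_1_count_matrix', dataset_id = 25, version = 1)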
# Get metadata from ProData entries
rs_client.list_pro_data_metadata(project_id: int | None = None, # Subset by project ID
project_name: str | None = None, # Subset by project name
dataset_id: int | None = None, # Subset by Dataset ID
dataset_name: str | None = None, # Subset by Dataset Name
name: str | None = None, # Subset by ProData Name
data_type: str | None = None, # Subset by ProData Type
include_archived: bool = False # Include Archived entries
) -> pd.DataFrame
# Delete ProData entry
rs_client.delete_pro_data(pro_data_id: int | None = None, # Delete by ProData ID
dataset_id: int | None = None, # Delete by Dataset ID
dataset_name: str | None = None, # Delete by Dataset Name
name: str | None = None, # Delete by name
version: int | None = None)     # Delete specific version
# Provide ID or Name + Dataset ID/Name for delete
Download Attachments
# Download project attachment file from ReadStore Database
rs_client.download_project_attachment(attachment_name: str, # name of attachment file
project_id: int | None = None, # project id with attachment
project_name: str | None = None, # project name with attachment
outpath: str | None = None) # Path to download file to
# Download dataset attachment file from ReadStore Database
rs_client.download_attachment(attachment_name: str,             # name of attachment file
                              dataset_id: int | None = None,    # dataset id with attachment
                              dataset_name: str | None = None,  # dataset name with attachment
                              outpath: str | None = None)       # Path to download file to
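A short usage sketch; attachment, project and path names are hypothetical:

# Download a project attachment into the current directory
rs_client.download_project_attachment(attachment_name = 'study_design.pdf',
                                      project_name = 'MyProject')

# Download a dataset attachment to an explicit path
rs_client.download_attachment(attachment_name = 'gene_table.tsv',
                              dataset_id = 25,
                              outpath = 'downloads/gene_table.tsv')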
Upload FASTQ files
Upload FASTQ files to the ReadStore server. The method checks that the FASTQ files exist and have a valid FASTQ extension.
# Upload FASTQ files to ReadStore
# The optional fastq_name and read_type arguments are shown in the Quickstart above
rs_client.upload_fastq(fastq: List[str] | str)  # Path(s) of FASTQ files to upload
Reserved keywords
The following keywords must not be used as metadata keys:
'id', 'name', 'project', 'project_ids', 'project_names', 'owner_group_name', 'qc_passed', 'paired_end',
'index_read', 'created', 'description', 'owner_username', 'fq_file_r1', 'fq_file_r2', 'fq_file_i1',
'fq_file_i2', 'id_project', 'name_project', 'name_og', 'archived', 'collaborators', 'dataset_metadata_keys',
'data_type', 'version', 'valid_to', 'upload_path', 'fq_dataset', 'id_fq_dataset', 'name_fq_dataset'
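When metadata dictionaries are built programmatically, it can help to check for reserved keys before calling create or update. The helper below is a hypothetical convenience, not part of the PyReadStore API, and RESERVED_KEYS is abbreviated from the list above:

# Hypothetical helper; extend RESERVED_KEYS with the full list above
RESERVED_KEYS = {'id', 'name', 'project', 'project_ids', 'project_names',
                 'data_type', 'version', 'description', 'created'}

def check_metadata_keys(metadata: dict) -> dict:
    """Raise a ValueError if a metadata key is a ReadStore reserved keyword."""
    clashes = RESERVED_KEYS & set(metadata)
    if clashes:
        raise ValueError(f'Reserved metadata keys used: {sorted(clashes)}')
    return metadata

rs_client.create(dataset_name = 'sample_2',
                 metadata = check_metadata_keys({'tissue': 'liver'}))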
Contributing
Contributions make this project better! Whether you want to report a bug, improve documentation, or add new features, any help is welcome!
How You Can Help
- Report Bugs
- Suggest Features
- Improve Documentation
- Code Contributions
Contribution Workflow
- Fork the repository and create a new branch for each contribution.
- Write clear, concise commit messages.
- Submit a pull request and wait for review.
Thank you for helping make this project better!
License
PyReadStore is licensed under the Apache 2.0 open-source license. See the LICENSE file for more information.
Credits and Acknowledgments
PyReadStore is built upon the following open-source Python packages, and we thank all contributing authors, developers and partners.
- Python (https://www.python.org/)
- requests (https://requests.readthedocs.io/en/latest/)
- pydantic (https://docs.pydantic.dev/latest/)
- pandas (https://pandas.pydata.org/)