

PyReadStore SDK

This README describes PyReadStore, the Python client (SDK) for the ReadStore API.

The full ReadStore Basic documentation is available here

PyReadStore can be used to access Projects, Datasets, and ProData, as well as metadata and attachment files, in the ReadStore Database from Python code. The package enables you to automate your bioinformatics pipelines, Python scripts and notebooks.

Check the ReadStore Github repository for more information on how to get started with ReadStore and setting up your server.

More info on the ReadStore website

Tutorials and Intro Videos: https://www.youtube.com/@evobytedigitalbio

Blog posts and How-Tos: https://evo-byte.com/blog/

For general questions reach out to info@evo-byte.com or in case of technical problems to support@evo-byte.com

Happy analysis :)


The Lean Solution for Managing NGS and Omics Data

ReadStore is a platform for storing, managing, and integrating omics data. It speeds up analysis and offers a simple way of managing and sharing NGS omics datasets, metadata and processed data (Processed Data). Built-in project and metadata management structures your workflows and a collaborative user interface enhances teamwork — so you can focus on generating insights.

The integrated Webservice (API) enables you to directly retrieve data from ReadStore via the terminal Command-Line-Interface (CLI) or Python / R SDKs.

The ReadStore Basic version provides a local webserver with a simple user management. If you need an organization-wide deployment, advanced user and group management or cloud integration please check the ReadStore Advanced versions and reach out to info@evo-byte.com.

Description

PyReadStore is a Python client (SDK) that lets you easily connect to your ReadStore server and interact with the ReadStore API. By importing the pyreadstore package in Python, you can quickly retrieve data from a ReadStore server.

This tool provides streamlined and standardized access to NGS datasets and metadata, helping you run analyses more efficiently and with fewer errors. You can easily scale your pipelines, and if you need to migrate or move NGS data, updating the ReadStore database ensures all your workflows stay up-to-date.

Security and Permissions

PLEASE READ AND FOLLOW THESE INSTRUCTIONS CAREFULLY!

User Accounts and Token

Using PyReadStore requires an active user account and a token (and a running ReadStore server).

You should never enter your user account password when working with PyReadStore.

To retrieve your token:

  1. Login to the ReadStore app via your browser
  2. Navigate to Settings page and click on Token
  3. You can regenerate your token anytime (Reset). This will invalidate the previous token

For uploading FASTQ files your user account needs to have Staging Permission. You can check this in the Settings page of your account. If you do not have Staging Permission, ask your ReadStore server admin to grant you permission.

Setting Your Credentials

You need to provide the PyReadStore client with valid ReadStore credentials.

There are several options:

  1. Load credentials from the ReadStore config file. The file is generated by the ReadStore CLI, by default in your home directory (~/.readstore/). Make sure to keep the file's read permissions restrictive

  2. Directly enter your username and token when instantiating a PyReadStore client within your Python code

  3. Set username and token via environment variables (READSTORE_USERNAME, READSTORE_TOKEN). This is useful in container or cloud environments.
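As an illustration of how these options combine, the sketch below resolves credentials in the precedence order described here (environment variables first, then explicit arguments, then the config file). The function name and the config-parsing stub are assumptions for illustration, not part of the pyreadstore API:

```python
import os
from pathlib import Path

def resolve_credentials(username=None, token=None,
                        config_dir="~/.readstore"):
    """Illustrative credential resolution: environment variables win,
    then explicit arguments, then the config file on disk."""
    env_user = os.environ.get("READSTORE_USERNAME")
    env_token = os.environ.get("READSTORE_TOKEN")
    if env_user and env_token:
        return env_user, env_token
    if username and token:
        return username, token
    config = Path(config_dir).expanduser() / "config"
    if config.exists():
        # Actual parsing depends on the config file format
        # written by `readstore configure`.
        raise NotImplementedError("parse config file here")
    raise ValueError("No ReadStore credentials found")
```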

Installation

pip3 install pyreadstore

You can perform the install in a conda or venv virtual environment to simplify package management.

A user-local install is also possible:

pip3 install --user pyreadstore

Verify the installation by importing the package in Python:

import pyreadstore

ReadStore API

The ReadStore Basic server provides a RESTful API for accessing resources via HTTP requests.
This API extends the functionalities of the ReadStore CLI as well as the Python and R SDKs.

API Endpoint

By default, the API is accessible at:
http://127.0.0.1:8000/api_x_v1/

Authentication

Users must authenticate using their username and token via the Basic Authentication scheme.
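Basic Authentication simply base64-encodes `username:token` into an `Authorization` header. The helper below builds that header with the Python standard library; the helper name is illustrative and the token is a placeholder:

```python
import base64

def basic_auth_header(username: str, token: str) -> dict:
    """Build an HTTP Basic Authentication header from username and token."""
    credentials = f"{username}:{token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}

# Placeholder credentials, as used in the curl example below
headers = basic_auth_header("testuser", "0dM9qSU0Q5PLVgDrZRftzw")
```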

Example Usage

Below is an example demonstrating how to retrieve an overview of Projects by sending an HTTP GET request to the project/ endpoint with curl.
In this example, the username is testuser, and the token is 0dM9qSU0Q5PLVgDrZRftzw. You can find your token in the ReadStore settings.

curl -X GET -u testuser:0dM9qSU0Q5PLVgDrZRftzw http://localhost:8000/api_x_v1/project/

Example Response

A successful HTTP response returns a JSON-formatted string describing the project(s) in the ReadStore database. Example response:

[{
  "id": 4,
  "name": "TestProject99",
  "metadata": {
    "key1": "value1",
    "key2": "value2"
  },
  "attachments": []
}]
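Such a response can be parsed with Python's standard json module. The snippet below extracts project names and metadata from a response shaped like the example above:

```python
import json

# Response body shaped like the example above
response_text = '''[{
  "id": 4,
  "name": "TestProject99",
  "metadata": {"key1": "value1", "key2": "value2"},
  "attachments": []
}]'''

projects = json.loads(response_text)
names = [p["name"] for p in projects]                    # project names
metadata = {p["name"]: p["metadata"] for p in projects}  # metadata by name
```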

Documentation

Comprehensive API documentation is available in the ReadStore Basic Docs.

Usage

Detailed tutorials, videos and explanations are found on YouTube or on the EVOBYTE blog.

Quickstart

Let's access some dataset and project data from the ReadStore database!

Make sure a ReadStore server is running and reachable (by default under 127.0.0.1:8000). You can enter (http://127.0.0.1:8000/api_v1/) in your browser and should get a response from the API.

We assume you ran readstore configure before to create a config file for your user. If not, consult the ReadStore CLI README on how to set this up.

We will create a client instance and perform some operations to retrieve data from the ReadStore database. More information on all available methods can be found below.

import pyreadstore

rs_client = pyreadstore.Client() # Create an instance of the ReadStore client

# Manage Datasets

datasets = rs_client.list()      # List all datasets and return pandas dataframe

datasets_project_1 = rs_client.list(project_id = 1) # List all datasets for project 1

datasets_id_25 = rs_client.get(dataset_id = 25)     # Get detailed data for dataset 25

# Manage Projects

projects = rs_client.list_projects()                # List all projects

projects = rs_client.get_project(project_name = 'MyProject') # Get details for MyProject

fastq_data_id_25 = rs_client.get_fastq(dataset_id = 25)     # Get fastq file data for dataset 25

rs_client.download_attachment(dataset_id = 25,              # Download files attached to dataset 25
                              attachment_name = 'gene_table.tsv') 

# Manage Processed Data

rs_client.upload_pro_data(name = 'sample_1_count_matrix',      # Set name of count matrix
                            pro_data_file = 'path/to/sample_1_counts.h5',   # Set file path
                            data_type = 'count_matrix',                     # Set type to 'count_matrix'
                            dataset_id = 25)                                # Set dataset id for upload

pro_data_project_1 = rs_client.list_pro_data(project_id = 1) # Get all ProData entries for Project 1

pro_data = rs_client.get_pro_data(name = 'sample_1_count_matrix',   # Set name to sample_1_count_matrix
                                dataset_id = 25)                    # dataset_id

pro_data_id = rs_client.delete_pro_data(name = 'sample_1_count_matrix',
                                        dataset_id = 25)

# Ingest FASTQ files

rs_client.upload_fastq(fastq = ['path/to_fastq_r1.fq', 'path/to_fastq_r2.fq'], # Upload FASTQ files
                        fastq_name = ['sample_rep_1_r1', 'sample_rep_1_r2'],    # Set FASTQ names
                        read_type = ['R1', 'R2'])                               # Set individual FASTQ read types

Configure the Python Client

The Client is the central object and provides authentication against the ReadStore API. By default, the client will try to read the ~/.readstore/config credentials file. You can change the directory if your config file is located in another folder.

If you set the username and token arguments, the client will use these credentials instead.

If your ReadStore server is not running under localhost (127.0.0.1) port 8000, you can adapt the default settings.

pyreadstore.Client(config_dir: str = '~/.readstore',  # Directory containing ReadStore credentials
                  username: str | None = None,        # Username
                  token : str | None = None,          # Token
                  host: str = 'http://localhost',     # Hostname / IP of ReadStore server
                  return_type: str = 'pandas',        # Default return types, can be pandas or json
                  port: int = 8000,                   # Server Port Number
                  fastq_extensions: List[str] = ['.fastq','.fastq.gz','.fq','.fq.gz']) 
                  # Accepted FASTQ file extensions for upload validation 

It is possible to set username, token, server endpoint and FASTQ extensions using the environment variables listed below. Environment variables take precedence over other client configurations.

  • READSTORE_USERNAME (username)
  • READSTORE_TOKEN (token)
  • READSTORE_ENDPOINT_URL (http://host:post, e.g. http://localhost:8000)
  • READSTORE_FASTQ_EXTENSIONS (fastq_extensions, e.g. '.fastq,.fastq.gz,.fq,.fq.gz')
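In a container or CI environment you would typically export these variables before your script runs. The Python equivalent, with placeholder values, looks like this:

```python
import os

# Placeholder credentials and endpoint; replace with your own values.
os.environ["READSTORE_USERNAME"] = "testuser"
os.environ["READSTORE_TOKEN"] = "0dM9qSU0Q5PLVgDrZRftzw"
os.environ["READSTORE_ENDPOINT_URL"] = "http://localhost:8000"
os.environ["READSTORE_FASTQ_EXTENSIONS"] = ".fastq,.fastq.gz,.fq,.fq.gz"

# import pyreadstore
# rs_client = pyreadstore.Client()  # picks up the variables above
```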

Possible errors

- Connection Error:     No ReadStore server was found at the provided endpoint
- Authentication Error: The provided username or token is not valid
- No Permission to Upload/Delete FASTQ/ProData: The user has no Staging Permission

Access Datasets

# List ReadStore Datasets

rs_client.list(project_id: int | None = None,   # Filter datasets for project with id `project_id`
              project_name: str | None = None,  # Filter datasets for project with name `project_name`
               return_type: str | None = None   # Return pd.DataFrame or JSON type
               ) -> pd.DataFrame | List[dict]

# Get ReadStore Dataset Details
# Provide dataset_id OR dataset_name

rs_client.get(dataset_id: int| None = None,     # Get dataset with id `dataset_id`
              dataset_name: str | None = None,  # Get dataset with name `dataset_name`
              return_type: str | None = None    # Return pd.Series or json(dict)
              ) -> pd.Series | dict

# Get FASTQ file data for a dataset
# Provide dataset_id OR dataset_name

rs_client.get_fastq(dataset_id: int| None = None,    # Get fastq data for dataset with id `dataset_id`
                  dataset_name: str | None = None,   # Get fastq data for dataset `dataset_name`
                  return_type: str | None = None     # Return pd.Series or json(dict)
                  ) -> pd.DataFrame | List[dict]

# Return metadata for datasets in a dedicated pandas dataframe
# Metadata keys are pivoted as columns, and values as rows

rs_client.list_metadata(project_id: int | None = None,   # Subset by project_id
                        project_name: str | None = None  # Subset by project_name
                        ) -> pd.DataFrame:
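To illustrate the shape of the pivoted frame without a running server, the sketch below builds the same kind of DataFrame from hypothetical per-dataset metadata dictionaries (the dataset names and keys are invented for the example):

```python
import pandas as pd

# Hypothetical metadata as it might be stored per dataset
datasets = [
    {"id": 25, "name": "sample_1", "metadata": {"tissue": "liver", "rep": "1"}},
    {"id": 26, "name": "sample_2", "metadata": {"tissue": "brain", "rep": "2"}},
]

# Pivot: metadata keys become columns, one row per dataset
meta_df = pd.DataFrame([d["metadata"] for d in datasets],
                       index=[d["name"] for d in datasets])
print(meta_df)
```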

Edit Datasets

NOTE Editing methods such as create or delete require Staging Permission authorization.

When creating datasets, the name argument and metadata dictionary are checked for consistency: each must not be empty and may contain only alphanumeric characters (plus _-.@). Metadata keys must not be reserved keywords (listed below).
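These rules can be sketched as a small validator. The regular expression and function below are illustrative approximations, not the actual checks pyreadstore performs, and RESERVED_KEYWORDS is only an excerpt of the full list in the Reserved keywords section:

```python
import re

# Allowed: alphanumeric plus _-.@ (illustrative approximation)
ALLOWED = re.compile(r"^[A-Za-z0-9_\-.@]+$")
RESERVED_KEYWORDS = {"id", "name", "project", "description"}  # excerpt only

def validate_dataset(name: str, metadata: dict) -> None:
    """Raise ValueError if name or metadata violate the rules above."""
    if not name or not ALLOWED.match(name):
        raise ValueError(f"Invalid dataset name: {name!r}")
    for key in metadata:
        if not key or not ALLOWED.match(key):
            raise ValueError(f"Invalid metadata key: {key!r}")
        if key in RESERVED_KEYWORDS:
            raise ValueError(f"Reserved metadata key: {key!r}")
```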

# Create an empty Dataset, without FASTQ files attached

# Name must be unique in Database
# Optionally define Project IDs and/or Project names to attach Dataset to.  

rs_client.create(dataset_name: str,                       # Set name
                 description: str = '',           # Set description. Defaults to ''.
                 project_ids: List[int] = [],     # Set project_ids. Defaults to [].
                 project_names: List[str] = [],   # Set project_names. Defaults to [].
                 metadata: dict = {})              # Set metadata. Defaults to {}.

# Update a Dataset
# dataset_id must be provided to define the dataset to update.
# Only arguments for which a new value is specified will be updated.
# Arguments with None values remain unaltered.

rs_client.update(dataset_id: int,                 # Set ID to update
                dataset_name: str | None = None,  # Updated name (optional)
                description: str | None = None,   # Updated description (optional)
                project_ids: List[int] | None = None,   # Updated project_ids (optional)
                project_names: List[str] | None = None, # Updated project_names (optional)
                metadata: dict | None = None)           # Updated metadata (optional)

# Provide empty project_ids or project_names list [] to unset all associated projects

# Delete Dataset (and attached FASTQ files)
# Either dataset_id or dataset_name argument must be provided

rs_client.delete(dataset_id: int | None = None,   # Delete by ID. Defaults to None.
                 dataset_name: str | None = None)  # Delete by Name. Defaults to None.

Access Projects

# List ReadStore Projects

rs_client.list_projects(return_type: str | None = None   # Return pd.DataFrame or JSON type
                        ) -> pd.DataFrame | List[dict]

# Get ReadStore Project Details
# Provide project_id OR project_name

rs_client.get_project(project_id: int| None = None,     # Get project with id `project_id`
                      project_name: str | None = None,  # Get project with name `project_name`
                      return_type: str | None = None    # Return pd.Series or json(dict)
                      ) -> pd.Series | dict

# Return metadata for projects in a dedicated pandas dataframe
# Metadata keys are pivoted as columns, and values as rows

rs_client.list_projects_metadata() -> pd.DataFrame:

Edit Projects

NOTE Editing methods such as create or delete require Staging Permission authorization.

When creating projects, the name argument and metadata dictionary are checked for consistency: each must not be empty and may contain only alphanumeric characters (plus _-.@). Metadata keys must not be reserved keywords (listed below).

# Create ReadStore Project

# name must be unique in Database
# dataset_metadata_keys can be attached and will be set as default metadata keys for attached datasets

rs_client.create_project(project_name: str,                       # Set Project name
                         description: str = '',           # Set Project description. Defaults to ''.
                         metadata: dict = {},             # Set Project metadata. Defaults to {}.
                         dataset_metadata_keys: List[str] = [])  # Set dataset metadata keys. Defaults to [].

# Update a Project
# project_id must be provided to define the project to update.
# Only arguments for which a new value is specified will be updated.
# Arguments with None values remain unaltered.

rs_client.update_project(project_id: int,                # Set project id to update
                         project_name: str | None = None, # Updated name (optional)
                         description: str | None = None,  # Updated description (optional)
                         metadata: dict | None = None,    # Updated metadata (optional)
                         dataset_metadata_keys: List[str] | None = None) # Updated metadata keys (optional)

# Delete ReadStore Project
# Either project_id or project_name argument must be provided

rs_client.delete_project(project_id: int | None = None,    # Delete by ID. Defaults to None.
                         project_name: str | None = None)  # Delete by Name. Defaults to None.

Access Processed Data

# Upload Processed Data

rs_client.upload_pro_data(name: str,                # Name of ProData
                        pro_data_file: str,         # Set ProData file path
                        data_type: str,             # Set ProData data type
                        description: str = '',      # Description for ProData
                        metadata: dict = {},        # MetaData
                        dataset_id: int | None = None,  # Dataset ID to assign ProData to
                        dataset_name: str | None = None)# Dataset Name to assign ProData to

# Must provide dataset_id or dataset_name

# List and filter Processed Data

rs_client.list_pro_data(project_id: int | None = None,      # Filter by Project ID
                        project_name: str | None = None,    # Filter by Project Name
                        dataset_id: int | None = None,      # Filter by Dataset ID
                        dataset_name: str | None = None,    # Filter by Dataset Name
                        name: str | None = None,            # Filter by ProData name
                        data_type: str | None = None,       # Filter by ProData data type
                        include_archived: bool = False,     # Include archived
                        return_type: str | None = None) -> pd.DataFrame | List[dict]

# Get individual ProData entry

rs_client.get_pro_data(pro_data_id: int | None = None,  # Get ProData by ID
                        dataset_id: int | None = None,  # Get ProData by Dataset ID
                        dataset_name: str | None = None, # Get ProData by Dataset Name
                        name: str | None = None,        # Get ProData by name
                        version: int | None = None,     # Get specific version, None returns latest valid version
                        return_type: str | None = None) -> pd.Series | dict

# Provide ID or Name + Dataset ID/Name

# Get metadata from ProData entries

rs_client.list_pro_data_metadata(project_id: int | None = None, # Subset by project ID
                                project_name: str | None = None, # Subset by project name
                                dataset_id: int | None = None,   # Subset by Dataset ID
                                dataset_name: str | None = None, # Subset by Dataset Name
                                name: str | None = None,         # Subset by ProData Name
                                data_type: str | None = None,    # Subset by ProData Type
                                include_archived: bool = False  # Include Archived entries
                                ) -> pd.DataFrame

# Delete ProData entry

rs_client.delete_pro_data(pro_data_id: int | None = None,   # Delete by ProData ID
                        dataset_id: int | None = None,      # Delete by Dataset ID
                        dataset_name: str | None = None,    # Delete by Dataset Name
                        name: str | None = None,            # Delete by name
                        version: int | None = None):        # Delete specific version

# Provide ID or Name + Dataset ID/Name for delete

Download Attachments

# Download project attachment file from ReadStore Database 

rs_client.download_project_attachment(attachment_name: str,            # name of attachment file
                                      project_id: int | None = None,   # project id with attachment
                                      project_name: str | None = None, # project name with attachment
                                      outpath: str | None = None)      # Path to download file to

# Download dataset attachment file from ReadStore Database 

rs_client.download_attachment(attachment_name: str,             # name of attachment file
                              dataset_id: int | None = None,    # dataset id with attachment
                              dataset_name: str | None = None,  # dataset name with attachment
                              outpath: str | None = None)       # Path to download file to

Upload FASTQ files

Upload FASTQ files to the ReadStore server. The method checks that the FASTQ files exist and end with a valid FASTQ extension.

# Upload FASTQ files to ReadStore 

rs_client.upload_fastq(fastq : List[str] | str)  # Path of FASTQ files to upload
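The extension check can be sketched as follows; the helper is illustrative (the real client also verifies that files exist) and uses the default extension list from the client configuration:

```python
FASTQ_EXTENSIONS = [".fastq", ".fastq.gz", ".fq", ".fq.gz"]

def is_valid_fastq(path: str) -> bool:
    """Check that a path ends with an accepted FASTQ extension.

    Existence checks are omitted so the sketch works without real files.
    """
    return any(path.endswith(ext) for ext in FASTQ_EXTENSIONS)

is_valid_fastq("path/to_fastq_r1.fq.gz")  # True
is_valid_fastq("reads.bam")               # False
```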

Reserved keywords

The following keywords must not be used as metadata keys:

'id','name','project','project_ids','project_names','owner_group_name','qc_passed','paired_end',
'index_read','created','description','owner_username','fq_file_r1','fq_file_r2','fq_file_i1',
'fq_file_i2','id_project','name_project','name_og','archived','collaborators','dataset_metadata_keys',
'data_type','version','valid_to','upload_path','owner_username','fq_dataset','id_fq_dataset','name_fq_dataset'

Contributing

Contributions make this project better! Whether you want to report a bug, improve documentation, or add new features, any help is welcomed!

How You Can Help

  • Report Bugs
  • Suggest Features
  • Improve Documentation
  • Code Contributions

Contribution Workflow

  1. Fork the repository and create a new branch for each contribution.
  2. Write clear, concise commit messages.
  3. Submit a pull request and wait for review.

Thank you for helping make this project better!

License

pyreadstore is licensed under the Apache 2.0 open-source license. See the LICENSE file for more information.

Credits and Acknowledgments

pyreadstore is built upon open-source Python packages, and we would like to thank all contributing authors, developers and partners.
