
CLI tool and SDK for interacting with the Cirro platform

Project description

Cirro Client


A Python 3.10+ library for the Cirro platform.

Installation

You can install Cirro using pip:

pip install cirro

or you can install the main branch of the repo by running:

pip install git+https://github.com/CirroBio/Cirro-client.git

To enable pipeline configuration, install the corresponding extras:

pip install "cirro[nextflow]"      # Nextflow pipeline configuration support only
pip install "cirro[wdl]"           # WDL pipeline configuration support only
pip install "cirro[nextflow,wdl]"  # both Nextflow and WDL pipeline configuration support

NOTE: Configuring Nextflow pipelines also requires a local installation of Nextflow.

Authentication

Upon first use, the Cirro client will ask which Cirro instance to use and whether you would like to save your login information. It will then give you a link to authenticate through your web browser.

You can change your Cirro instance by running cirro configure and selecting the desired instance.

If you opted to save your login and later need to change your credentials, see the Clearing saved login section.

Command Line Usage

Downloading a dataset:

Usage: cirro download [OPTIONS]

  Download dataset files

Options:
  --project TEXT         Name or ID of the project
  --dataset TEXT         ID of the dataset
  --file TEXT            Relative path of the file(s) to download (optional, can be used multiple times)
  --data-directory TEXT  Directory to store the files
  -i, --interactive      Gather arguments interactively
  --help                 Show this message and exit.
$ cirro download --project "Test Project 1" --dataset "test" --data-directory "~/download"

Uploading a dataset:

Usage: cirro upload [OPTIONS]

  Upload and create a dataset

Options:
  --name TEXT                  Name of the dataset
  --description TEXT           Description of the dataset (optional)
  --project TEXT               Name or ID of the project
  --data-type, --process TEXT  Name or ID of the data type (--process is deprecated)
  --data-directory TEXT        Directory you wish to upload
  --file TEXT                  Relative path of the file(s) to upload (optional, can be used multiple times)
  -i, --interactive            Gather arguments interactively
  --include-hidden             Include hidden files in the upload (e.g., files starting with .)
  --help                       Show this message and exit.
$ cirro upload --project "Test Project 1" --name "test" --file "sample1.fastq.gz" --file "sample2.fastq.gz" --data-directory "~/data" --data-type "Paired DNAseq (FASTQ)" 
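The --data-directory, --file, and --include-hidden options together decide which local files are sent. The sketch below illustrates that selection step under stated assumptions: collect_upload_files is a hypothetical helper (not part of the cirro package), and it treats a path as hidden when any component relative to the data directory starts with ".", which may differ from how the CLI actually decides.

```python
from pathlib import Path


def collect_upload_files(data_dir, include_hidden=False):
    """Return (sorted relative paths, total bytes) for the files under data_dir.

    Hypothetical helper: a path counts as hidden if any component relative
    to data_dir starts with "." (an assumption about --include-hidden).
    """
    root = Path(data_dir).expanduser()
    files = []
    total_bytes = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Skip dot-files and anything inside dot-directories unless requested
        if not include_hidden and any(part.startswith(".") for part in rel.parts):
            continue
        files.append(rel.as_posix())
        total_bytes += path.stat().st_size
    return sorted(files), total_bytes


# Example (not executed here):
#   files, total = collect_upload_files("~/data")
#   print(f"{len(files)} files ({total / 1e9:.3f} GB)")
```

The count and total size are the kind of summary the interactive upload asks you to confirm before transferring.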

Validating that a dataset matches a local folder

Usage: cirro validate [OPTIONS]

  Validate that the contents of a local folder match those of a dataset in Cirro

Options:
  --dataset TEXT               Name or ID of the dataset
  --project TEXT               Name or ID of the project
  --data-directory TEXT        Local directory you wish to validate
  -i, --interactive            Gather arguments interactively
  --help                       Show this message and exit.
$ cirro validate --project "Test Project 1" --dataset "test" --data-directory "~/data"
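Conceptually, validation compares the set of files in the local folder against the files recorded for the dataset. The sketch below shows that comparison against a plain {relative_path: size_in_bytes} manifest; compare_to_manifest is hypothetical, and the real cirro validate command may compare checksums rather than sizes and obtains the dataset's file list from Cirro itself.

```python
from pathlib import Path


def compare_to_manifest(data_dir, remote_sizes):
    """Compare a local folder to a {relative_path: size_in_bytes} manifest.

    Hypothetical helper: returns paths missing locally, paths that exist only
    locally, and paths present in both whose sizes differ.
    """
    root = Path(data_dir).expanduser()
    local_sizes = {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }
    missing = sorted(set(remote_sizes) - set(local_sizes))
    extra = sorted(set(local_sizes) - set(remote_sizes))
    mismatched = sorted(
        path
        for path in set(remote_sizes) & set(local_sizes)
        if remote_sizes[path] != local_sizes[path]
    )
    return {"missing": missing, "extra": extra, "size_mismatch": mismatched}
```

An empty list in every category means the folder and the manifest agree under this size-only check.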

Uploading a reference

Usage: cirro upload-reference [OPTIONS]

  Upload a reference to a project

Options:
  --name TEXT            Name of the reference
  --reference-type TEXT  Type of the reference (e.g., Reference Genome (FASTA))
  --project TEXT         Name or ID of the project
  --reference-file TEXT  Location of reference file(s) to upload (can be used multiple times)
  -i, --interactive      Gather arguments interactively
  --help                 Show this message and exit.

Listing datasets:

Usage: cirro list-datasets [OPTIONS]

  List available datasets

Options:
  --project TEXT         ID of the project
  -i, --interactive      Gather arguments interactively
  --help                 Show this message and exit.

Configuring a pipeline

Usage: cirro create-pipeline-config [OPTIONS]

  Create pipeline configuration files

Options:
  -p, --pipeline-dir DIRECTORY  Directory containing the pipeline definition
                                files (e.g., WDL or Nextflow)  [default: .]
  -e, --entrypoint TEXT         Entrypoint WDL file (optional, if not
                                specified, the first WDL file found will be
                                used). Ignored for Nextflow pipelines.
  -o, --output-dir TEXT         Directory to store the generated configuration
                                files  [default: .cirro]
  -i, --interactive             Gather arguments interactively
  --help                        Show this message and exit.

It is highly recommended that:

  • Nextflow pipelines use a nextflow_schema.json file. (If your pipeline originates from nf-core, this should already be the case.)
  • WDL pipelines are defined in WDL v1.0 or higher and explicitly define an input section in the root-level workflow.
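A nextflow_schema.json file declares the pipeline's parameters in machine-readable form, which is why it helps when generating configuration. As an illustration only, the sketch below lists parameter names from such a schema; it assumes the nf-core layout, where parameters sit under top-level "properties" or inside groups under "definitions" (older schemas) or "$defs" (newer schemas). It is not how cirro create-pipeline-config parses the file.

```python
import json


def list_schema_params(schema):
    """Collect parameter names from a nextflow_schema.json-style dict.

    Looks at top-level "properties" and at each group's "properties" under
    "definitions" (older nf-core schemas) or "$defs" (newer schemas).
    """
    names = list(schema.get("properties", {}))
    for key in ("definitions", "$defs"):
        for group in schema.get(key, {}).values():
            names.extend(group.get("properties", {}))
    return names


def load_schema_params(path):
    """Read a schema file from disk and return its parameter names."""
    with open(path) as fh:
        return list_schema_params(json.load(fh))
```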

Interactive Commands

When running a command, you can specify the --interactive flag to gather the command arguments interactively.

Example:

$ cirro upload --interactive
? What project is this dataset associated with?  Test project
? Enter the full path of the data directory  /shared/biodata/test
? Please confirm that you wish to upload 20 files (0.630 GB)  Yes
? What type of files?  Illumina Sequencing Run
? What is the name of this dataset?  test
? Enter a description of the dataset (optional)

Python Usage

The following Jupyter notebooks contain examples on these topics:

Jupyter Notebook            Topic
--------------------------  ------------------------------------
Introduction                Installing and authenticating
Uploading a dataset         Uploading data
Downloading a dataset       Downloading data
Interacting with a dataset  Calling data and reading into tables
Analyzing a dataset         Running analysis pipelines
Using references            Managing reference data
Advanced usage              Advanced operations

Reading files

The read_file and read_files methods provide a convenient way to read dataset files directly into Python objects. The file format is inferred from the extension (.csv, .tsv, .json, .parquet, .feather, .pkl, .xlsx, .h5ad), or can be specified explicitly with the filetype argument.

from cirro import DataPortal

# If not logged in, this will prompt with a login URL
portal = DataPortal()

# Read a single file from the indicated dataset
df = portal.read_file(project="My Project", dataset="My Dataset", glob="**/results.csv")

# Iterate over each of the files ending in .csv within a dataset
for df in portal.read_files(project="My Project", dataset="My Dataset", glob="*.csv"):
    print(df.shape)
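The extension-based inference described above amounts to a lookup on the file suffix, with an explicit filetype taking priority. This is a minimal sketch of that idea, not cirro's implementation: infer_filetype and the format labels in KNOWN_FORMATS are hypothetical.

```python
from pathlib import PurePosixPath

# Extensions listed above, mapped to hypothetical format labels
KNOWN_FORMATS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".json": "json",
    ".parquet": "parquet",
    ".feather": "feather",
    ".pkl": "pickle",
    ".xlsx": "excel",
    ".h5ad": "h5ad",
}


def infer_filetype(path, filetype=None):
    """Return the explicit filetype if given, else infer it from the extension."""
    if filetype is not None:
        return filetype
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return KNOWN_FORMATS[suffix]
    except KeyError:
        raise ValueError(
            f"Cannot infer file type from {path!r}; pass filetype explicitly"
        ) from None
```

In this sketch a compressed name such as results.tsv.gz has the suffix .gz, so it needs an explicit filetype, mirroring the filetype="csv" example further down.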

You can also call these methods on the DataPortalDataset object:

# Get an object representing a single dataset
dataset = portal.get_dataset(project="My Project", dataset="My Dataset")

# Read a single file by exact path or glob pattern
df = dataset.read_file(path="data/results.csv")
df = dataset.read_file(glob="**/results.csv")

# Read multiple files matching a pattern — yields one result per file
for df in dataset.read_files(glob="**/*.csv"):
    print(df.shape)

# Extract values from the path using {name} capture placeholders
for df, meta in dataset.read_files(pattern="{sample}/results.csv"):
    print(meta["sample"], df.shape)

# Extra keyword arguments are forwarded to the file-parsing function
for df in dataset.read_files(glob="**/*.tsv.gz", filetype="csv", sep="\t"):
    print(df.shape)
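The {name} placeholders in pattern behave like named capture groups matched against each file path. The sketch below translates such a pattern into a regular expression to show the idea; pattern_to_regex and match_path are hypothetical helpers, and the rule that a placeholder matches exactly one path segment (no "/") is an assumption, not documented cirro behavior.

```python
import re

# A placeholder is "{" + identifier + "}", e.g. {sample}
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def pattern_to_regex(pattern):
    """Translate "{name}" placeholders into named groups; literals are escaped.

    Assumption: each placeholder matches a single path segment (no "/").
    """
    parts = []
    pos = 0
    for m in _PLACEHOLDER.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        parts.append(f"(?P<{m.group(1)}>[^/]+)")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def match_path(pattern, path):
    """Return the captured values as a dict, or None if the path does not match."""
    m = pattern_to_regex(pattern).fullmatch(path)
    return m.groupdict() if m else None
```

For example, match_path("{sample}/results.csv", "S1/results.csv") returns {"sample": "S1"}, which corresponds to the meta dictionary yielded alongside each file above.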

R Usage

Jupyter Notebook            Topic
--------------------------  -------------------
Downloading a dataset in R  Reading data with R

Advanced Usage

View the API documentation for this library here.

Supported environment variables

Name            Description                    Default
--------------  -----------------------------  --------
CIRRO_HOME      Local configuration directory  ~/.cirro
CIRRO_BASE_URL  Base URL of the data portal

Configuration

The cirro configure command creates a config.ini file in the CIRRO_HOME directory.

You can set the base_url property in the config file rather than using the environment variable.

When uploading files to Cirro, network issues or temporary outages can occasionally cause a transfer to fail. The transfer_max_retries configuration property specifies the maximum number of times to attempt uploading a file in that event. The client pauses for an increasing amount of time before each retry attempt.
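The retry behavior described for transfer_max_retries is a bounded retry loop with a growing delay. This sketch assumes a doubling (exponential backoff) schedule and treats OSError, which includes ConnectionError, as a transfer failure; the text only says the pause increases, so the real schedule and error handling in cirro may differ. upload_with_retries is a hypothetical helper.

```python
import time


def upload_with_retries(upload, max_attempts, base_delay=1.0, sleep=time.sleep):
    """Call upload() until it succeeds or max_attempts attempts have failed.

    Assumed schedule: the pause doubles after each failure (1s, 2s, 4s, ...).
    The last exception is re-raised if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return upload()
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing sleep as a parameter keeps the function testable without real waiting.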

The default hashing algorithm for files is CRC64. In many cases, CRC64 is sufficient to ensure data integrity upon upload.

[General]
base_url = cirro.bio
transfer_max_retries = 15
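Because config.ini is a standard INI file, you can inspect it with Python's configparser. The sketch below reads the [General] section from CIRRO_HOME, falling back to the documented default ~/.cirro. read_cirro_config is a hypothetical helper, not cirro's own loader, and it does not model how the client weighs CIRRO_BASE_URL against base_url.

```python
import configparser
import os
from pathlib import Path


def read_cirro_config(home=None):
    """Return the [General] section of config.ini as a dict of strings.

    home defaults to $CIRRO_HOME, falling back to ~/.cirro.
    Returns an empty dict if the file or section does not exist.
    """
    if home is None:
        home = os.environ.get("CIRRO_HOME", "~/.cirro")
    config = configparser.ConfigParser()
    config.read(Path(home).expanduser() / "config.ini")
    if not config.has_section("General"):
        return {}
    return dict(config["General"])
```

Values come back as strings, so numeric settings such as transfer_max_retries need converting with int().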

Clearing saved login

You can clear your saved login information by removing the ~/.cirro/token.dat file from your system or by running cirro configure and selecting No when it asks if you'd like to save your login information.
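Removing the token file can also be scripted. This sketch assumes the token lives in CIRRO_HOME when that variable is set; the text above only names ~/.cirro/token.dat, which is the default location. clear_saved_login is a hypothetical helper.

```python
import os
from pathlib import Path


def clear_saved_login(home=None):
    """Delete token.dat from the Cirro config directory.

    home defaults to $CIRRO_HOME, falling back to ~/.cirro (an assumption
    about where the token lives when CIRRO_HOME is set).
    Returns True if a token file was removed, False if none existed.
    """
    if home is None:
        home = os.environ.get("CIRRO_HOME", "~/.cirro")
    token = Path(home).expanduser() / "token.dat"
    if token.exists():
        token.unlink()
        return True
    return False
```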



Download files


Source Distribution

cirro-1.10.3.tar.gz (70.0 kB)

Uploaded Source

Built Distribution


cirro-1.10.3-py3-none-any.whl (96.8 kB)

Uploaded Python 3

File details

Details for the file cirro-1.10.3.tar.gz.

File metadata

  • Download URL: cirro-1.10.3.tar.gz
  • Upload date:
  • Size: 70.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.14.3 Linux/6.14.0-1017-azure

File hashes

Hashes for cirro-1.10.3.tar.gz
Algorithm Hash digest
SHA256 531857273e051d196f1fa91dcedcc693e8cf18974cdb41900082e24fe00aeea5
MD5 ca07ed98b5b943ae51e0303b16d62a03
BLAKE2b-256 4b9a83223689b4626b22466024ae84d1f18d1f2364f9f355cc508a7bbf49f737


File details

Details for the file cirro-1.10.3-py3-none-any.whl.

File metadata

  • Download URL: cirro-1.10.3-py3-none-any.whl
  • Upload date:
  • Size: 96.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.14.3 Linux/6.14.0-1017-azure

File hashes

Hashes for cirro-1.10.3-py3-none-any.whl
Algorithm Hash digest
SHA256 9df5af03279ba381d37f92766d629b17510a8c8e0c1c666b2c1a38fbf756043c
MD5 cf64be322973b320071e412b8424a308
BLAKE2b-256 8049344d08cc2b48c7a1874bb23ee5108c2a2c883175e217df00f7e9c5097d84

