Project Description

The increasing size of datasets used in scientific computing has made it difficult or impossible for researchers to store all of their data at the compute site where they process it. As a result, the data transfer step has become a key consideration in experimental design. In response, scientific data repositories such as NCBI have begun to offer services such as dedicated data transfer machines and advanced transfer clients. Despite this, many researchers continue familiar but suboptimal practices, such as using slow transfer clients like a web browser or scp, or transferring data over wireless networks.

BDSS aims to alleviate this problem by shifting the burden of learning about alternative file mirrors, transfer clients, and tuning parameters from the end-user researcher to a group of “data curators”. It consists of three parts:

Components

  • Metadata repository
      • Central database managed by data curators
      • Matches patterns of data file URLs and maps them to alternate sources
      • Includes information about the transfer tool to use to retrieve the data
  • BDSS transfer client
      • Consumes information from metadata repository
      • Invokes transfer tools
      • Reports analytics to metadata repository
  • Integration as a Galaxy data transfer tool

Get Started

Examples

All examples here require a metadata repository configured to support them. The default metadata repository at http://bdss.bioinfo.wsu.edu/ supports these examples, and the necessary configuration is listed with each one.

NCBI SRA archive

NCBI makes files available for transfer using Aspera Connect, a tool with “improved data transfer characteristics” compared with FTP or HTTP. If ascp is installed on your machine, BDSS can build the appropriate ascp command for you.

Without BDSS:

ascp -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh -T anonftp@ftp.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR039/SRR039885/SRR039885.sra ./

With BDSS:

bdss transfer -u 'ftp://ftp.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR039/SRR039885/SRR039885.sra'

Metadata repository configuration:

{
  "data_sources": [
    {
      "description": "",
      "label": "NCBI Sequence Read Archive with FTP",
      "test_files": [],
      "transfer_mechanism": {
        "options": {},
        "type": "curl"
      },
      "transforms": [
        {
          "for_destinations": [],
          "options": {
            "new_scheme": "aspera"
          },
          "target": "NCBI Sequence Read Archive with Aspera",
          "type": "change_scheme"
        }
      ],
      "url_matchers": [
        {
          "options": {
            "pattern": "^ftp://ftp\\.ncbi\\.nlm\\.nih\\.gov/sra"
          },
          "type": "regular_expression"
        }
      ]
    },
    {
      "description": "",
      "label": "NCBI Sequence Read Archive with Aspera",
      "test_files": [],
      "transfer_mechanism": {
        "options": {
          "disable_encryption": true,
          "username": "anonftp"
        },
        "type": "aspera"
      },
      "transforms": [],
      "url_matchers": [
        {
          "options": {
            "pattern": "^aspera://ftp\\.ncbi\\.nlm\\.nih\\.gov/sra"
          },
          "type": "regular_expression"
        }
      ]
    }
  ],
  "destinations": []
}
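
To make the configuration above more concrete, here is a minimal Python sketch of how a client might apply it: match the requested URL against each source's url_matchers, apply the change_scheme transform, look up the target source by label, and build an ascp argument list shaped like the "Without BDSS" command. The function names (find_source, find_by_label, apply_transform, build_ascp_command) are hypothetical and are not taken from the bdss_client code, and mapping disable_encryption to the -T flag is an assumption based on the command shown earlier.

```python
import os
import re
from urllib.parse import urlsplit

# Trimmed copy of the two NCBI data sources above; only the fields this
# sketch reads are kept.
DATA_SOURCES = [
    {
        "label": "NCBI Sequence Read Archive with FTP",
        "transfer_mechanism": {"type": "curl", "options": {}},
        "transforms": [
            {
                "type": "change_scheme",
                "options": {"new_scheme": "aspera"},
                "target": "NCBI Sequence Read Archive with Aspera",
            }
        ],
        "url_matchers": [
            {
                "type": "regular_expression",
                "options": {"pattern": r"^ftp://ftp\.ncbi\.nlm\.nih\.gov/sra"},
            }
        ],
    },
    {
        "label": "NCBI Sequence Read Archive with Aspera",
        "transfer_mechanism": {
            "type": "aspera",
            "options": {"disable_encryption": True, "username": "anonftp"},
        },
        "transforms": [],
        "url_matchers": [
            {
                "type": "regular_expression",
                "options": {"pattern": r"^aspera://ftp\.ncbi\.nlm\.nih\.gov/sra"},
            }
        ],
    },
]


def find_source(url, sources=DATA_SOURCES):
    """Return the first data source with a matcher that accepts url, else None."""
    for source in sources:
        for matcher in source["url_matchers"]:
            if matcher["type"] != "regular_expression":
                continue  # other matcher types are out of scope for this sketch
            if re.search(matcher["options"]["pattern"], url):
                return source
    return None


def find_by_label(label, sources=DATA_SOURCES):
    """Return the data source whose label equals a transform's target, else None."""
    return next((s for s in sources if s["label"] == label), None)


def apply_transform(url, transform):
    """Rewrite url according to a change_scheme transform."""
    if transform["type"] != "change_scheme":
        raise ValueError("unsupported transform type: " + transform["type"])
    _old_scheme, separator, rest = url.partition("://")
    if not separator:
        raise ValueError("URL has no scheme: " + url)
    return transform["options"]["new_scheme"] + "://" + rest


def build_ascp_command(url, options, key_path, destination="./"):
    """Build an ascp argument list shaped like the "Without BDSS" command."""
    parts = urlsplit(url)
    command = ["ascp", "-i", key_path]
    if options.get("disable_encryption"):
        command.append("-T")  # assumption: this option corresponds to ascp's -T flag
    command.append("{}@{}:{}".format(options["username"], parts.hostname, parts.path))
    command.append(destination)
    return command


if __name__ == "__main__":
    requested = ("ftp://ftp.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/"
                 "SRR/SRR039/SRR039885/SRR039885.sra")
    transform = find_source(requested)["transforms"][0]
    aspera_url = apply_transform(requested, transform)
    target = find_by_label(transform["target"])
    key = os.path.expanduser("~/.aspera/connect/etc/asperaweb_id_dsa.openssh")
    print(" ".join(build_ascp_command(
        aspera_url, target["transfer_mechanism"]["options"], key)))
```

Returning an argument list rather than a shell string means the command could be handed to subprocess.run without quoting concerns; whether the real client works this way is not stated in this document.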

JGI Genome Portal

To download files from the JGI Genome Portal, you must first authenticate. BDSS can prompt for credentials and handle storing your session cookies.

Without BDSS:

curl 'https://signon.jgi.doe.gov/signon/create' --data-urlencode 'login=USER_NAME' --data-urlencode 'password=USER_PASSWORD' -c cookies > /dev/null
curl 'http://genome.jgi.doe.gov/ext-api/downloads/get-directory?organism=PhytozomeV10' -b cookies > get-directory

With BDSS:

bdss transfer -u 'http://genome.jgi.doe.gov/ext-api/downloads/get-directory?organism=PhytozomeV10'
JGI Genome Portal username?USER_NAME
JGI Genome Portal password?USER_PASSWORD

Metadata repository configuration:

{
  "data_sources": [
    {
      "description": "",
      "label": "JGI Genome Portal",
      "test_files": [],
      "transfer_mechanism": {
        "options": {
          "auth_url": "https://signon.jgi.doe.gov/signon/create",
          "password_field": "password",
          "password_prompt": "JGI Genome Portal password?",
          "username_field": "login",
          "username_prompt": "JGI Genome Portal username?"
        },
        "type": "session_authenticated_curl"
      },
      "transforms": [],
      "url_matchers": [
        {
          "options": {
            "pattern": "http:\\/\\/genome\\.jgi\\.doe\\.gov\\/ext-api"
          },
          "type": "regular_expression"
        }
      ]
    }
  ],
  "destinations": []
}
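
As an illustration of how the session_authenticated_curl options above relate to the two "Without BDSS" commands, the following Python sketch builds both curl argument lists from the configuration. The names JGI_OPTIONS, prompt_credentials and build_session_curl_commands are hypothetical, as is the choice to name the output file after the last path segment of the URL; none of this is taken from the bdss_client code. Because argument lists involve no shell, the sketch uses curl's -o option in place of the shell redirections.

```python
import getpass
import os
from urllib.parse import urlsplit

# The "options" object from the JGI Genome Portal configuration above.
JGI_OPTIONS = {
    "auth_url": "https://signon.jgi.doe.gov/signon/create",
    "password_field": "password",
    "password_prompt": "JGI Genome Portal password?",
    "username_field": "login",
    "username_prompt": "JGI Genome Portal username?",
}


def prompt_credentials(options):
    """Ask for credentials using the prompts stored in the configuration."""
    username = input(options["username_prompt"])
    password = getpass.getpass(options["password_prompt"])  # input is not echoed
    return username, password


def build_session_curl_commands(url, options, username, password,
                                cookie_jar="cookies"):
    """Return (login_command, download_command) as curl argument lists."""
    login = [
        "curl", options["auth_url"],
        "--data-urlencode", "{}={}".format(options["username_field"], username),
        "--data-urlencode", "{}={}".format(options["password_field"], password),
        "-c", cookie_jar,   # write session cookies to this file
        "-o", os.devnull,   # discard the sign-on response body
    ]
    # Hypothetical naming rule: use the last path segment as the output file.
    output_name = urlsplit(url).path.rsplit("/", 1)[-1] or "download"
    download = ["curl", url, "-b", cookie_jar, "-o", output_name]
    return login, download
```

A caller could run the two lists in order, for example with subprocess.run(command, check=True). Note that passing a password as a command-line argument exposes it to other local users through the process list, so a real client may well handle authentication differently.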
Release History

1.0.1b2

This version

1.0.1b1

1.0.0b1

Download Files

File Name                               Size      Python Version   File Type   Upload Date
bdss_client-1.0.1b2-py3-none-any.whl    50.1 kB   py3              Wheel       Nov 20, 2016
bdss_client-1.0.1b2.tar.gz              20.2 kB   -                Source      Nov 20, 2016
