A REST client for OpenCGA web services

Project description

PyCGA

  • This Python package makes use of the exhaustive RESTful Web service API that has been implemented for the OpenCGA database.

  • It provides easy access to OpenCGA, an open-source project that aims to provide a Big Data storage engine and analysis framework for genomic scale data analysis of hundreds of terabytes or even petabytes.

  • More information about this project is available in the OpenCGA Wiki.

Installation

Cloning

PyCGA can be cloned to your local machine by running the following in your terminal:

$ git clone https://github.com/opencb/opencga.git

Once you have downloaded the project you can install the library:

$ cd opencga
$ git checkout develop
$ cd opencga-client/src/main/python
$ python setup.py install

Usage

Getting started

The first step is to set up the OpenCGA server configuration:

>>> configuration = {
        "version": "v1",
        "rest": {
            "hosts": ["http://100.15.26.35:8080/opencga"]
        }
    }

The configuration can also be stored in a JSON or YAML file:

>>> configuration = '/path/to/config/opencga_configuration.json'
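As a minimal sketch of the file-based option, the same configuration dictionary could be written to a JSON file with the standard library and read back. The temporary directory and file name below are assumptions chosen for illustration, and the snippet is standalone Python 3 that does not call pyCGA:

```python
import json
import os
import tempfile

# Same structure as the configuration dictionary shown above.
configuration = {
    "version": "v1",
    "rest": {
        "hosts": ["http://100.15.26.35:8080/opencga"]
    }
}

# Hypothetical location; any writable path would do.
config_path = os.path.join(tempfile.mkdtemp(), "opencga_configuration.json")

with open(config_path, "w") as handle:
    json.dump(configuration, handle, indent=4)

# Reading the file back gives the same dictionary.
with open(config_path) as handle:
    loaded = json.load(handle)

print(loaded["rest"]["hosts"][0])
```

The resulting path is the kind of value that would then be passed as the configuration argument.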

The second step is to import the module and initialize the OpenCGAClient. Configuration, user and password must be specified:

>>> from pyCGA.opencgarestclients import OpenCGAClient
>>> oc = OpenCGAClient(configuration=configuration, user='user_example', pwd='pass_example')

If you prefer not to write the user and password in a script, a session ID can be used instead:

>>> from pyCGA.opencgarestclients import OpenCGAClient
>>> oc = OpenCGAClient(configuration=configuration, user='user_example', pwd='pass_example')  # Remove after getting session id
>>> print oc.session_id  # Remove after getting session id
"I4MG3fXJIZARl1LhwZ"
>>> oc = OpenCGAClient(configuration=configuration, session_id='I4MG3fXJIZARl1LhwZ')
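One possible way to keep credentials out of a script altogether is to read them from environment variables. The sketch below is only an illustration: the variable names OPENCGA_SESSION_ID, OPENCGA_USER and OPENCGA_PASSWORD are hypothetical and are not defined by pyCGA. Only the keyword names (user, pwd, session_id) come from the examples above.

```python
import os


def client_credentials(environ=None):
    """Build keyword arguments for OpenCGAClient from environment variables.

    The variable names used here are hypothetical; pyCGA does not define them.
    """
    environ = os.environ if environ is None else environ
    session_id = environ.get("OPENCGA_SESSION_ID")
    if session_id:
        # Prefer the session ID so that no password has to be stored.
        return {"session_id": session_id}
    user = environ.get("OPENCGA_USER")
    pwd = environ.get("OPENCGA_PASSWORD")
    if user and pwd:
        return {"user": user, "pwd": pwd}
    raise ValueError("no OpenCGA credentials found in the environment")


# A plain dictionary stands in for the real environment in this demo.
kwargs = client_credentials({"OPENCGA_SESSION_ID": "I4MG3fXJIZARl1LhwZ"})
print(kwargs)
# With pyCGA installed: oc = OpenCGAClient(configuration=configuration, **kwargs)
```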

The next step is to create the specific client for the data we want to query:

>>> samples = oc.samples()  # Query for samples
>>> files = oc.files()  # Query for files
>>> cohorts = oc.cohorts()  # Query for cohorts

Now you can start querying the OpenCGA RESTful service by providing a query ID:

>>> sample_search = samples.search(study='study1', name='sample1').get()
>>> print sample_search
"[{'acl': [{'member': '@gel', u'permissions': ['VIEW', 'VIEW_ANNOTATIONS']}..."

Responses are retrieved as JSON formatted data. Therefore, fields can be queried by key:

>>> creation_date = oc.samples.search(study='study1', name='sample1').get()[0]['creationDate']
"20170204822738"

First-level fields in the JSON output can be accessed as attributes:

>>> creation_date = samples.search(study='study1', name='sample1').get().creationDate
"20170204122738"

>>> annotation = cohorts.search(study='study1', name='cohort1').get().annotationSets
>>> print annotation[0]['annotations'][0]['value']['sex']
"F"

Regular expressions are allowed in some fields. This is especially useful when searching by name:

>>> cohort_name = cohorts.search(study='study1', name='~LP3000506-DNA_J01').get().name
>>> print cohort_name
"LP3000506-DNA_J01_LP3000924-DNA_Z02_0"

Data can be accessed by specifying comma-separated IDs or a list of IDs:

>>> creation_date = oc.samples.search(study='study1', name='sample1').get()[0]['creationDate']
"20170204822738"

>>> creation_date = oc.samples.search(study='study1', name='sample1').get()[1]['creationDate']
"20170204822738"

>>> creation_date = samples.search(study='study1', name='sample1,sample2').get().creationDate
["20170204122738", "20170204123049"]
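Because both a comma-separated string and a list are accepted, a script that builds IDs dynamically might normalize them first. The helper below is a hypothetical convenience written in Python 3, not part of pyCGA:

```python
def as_id_string(ids):
    """Return a comma-separated ID string from a string or an iterable of IDs."""
    if isinstance(ids, str):
        # Already a (possibly comma-separated) string: pass it through.
        return ids
    return ",".join(str(item) for item in ids)


print(as_id_string(["sample1", "sample2"]))
print(as_id_string("sample1,sample2"))
```

Both calls produce the same string, so either form could be passed as the name argument.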

Optional filters and extra options can be added as key-value parameters (value can be a comma-separated string or a list):

>>> # e.g. "exclude" parameter
>>> attributes = oc.files.search(study='study1', name='~sample', bioformat='VARIANT', status='READY', exclude='attributes').get().attributes
>>> print attributes
[{}, {}, {}, {}, {}, {}, {}, {}]

>>> # e.g. "limit" parameter
>>> files = oc.files.search(study='study1', name='~sample', bioformat='VARIANT', status='READY', limit=1).get()
>>> print len(files)
1
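For intuition, key-value options like these typically end up as query parameters of a REST request. The following sketch shows one way that encoding could look using only the standard library. build_query is a hypothetical helper, the "stats" field is made up, and this is not pyCGA's actual request code.

```python
from urllib.parse import urlencode


def build_query(**params):
    """Encode key-value filters as a URL query string; lists become comma-separated."""
    flat = {}
    for key, value in sorted(params.items()):
        if isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        flat[key] = value
    return urlencode(flat)


query = build_query(study="study1", bioformat="VARIANT",
                    exclude=["attributes", "stats"], limit=1)
print(query)
```

Keys are sorted only to make the output deterministic; the comma inside a list value is percent-encoded by urlencode.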

The “analysis_variant” endpoint deserves special mention because it returns an iterator:

>>> variant_iterator = oc.analysis_variant.query(pag_size=100, data={'studies': 'study1', 'gene': 'BRCA2'}, limit=1)
>>> for variant in variant_iterator:
...     print variant.get().type
"SNV"

What can I ask for?

The best way to find out which data can be retrieved for each client is to check either the RESTful web services section of the OpenCGA Wiki or the OpenCGA web services.

Download files

Source Distribution

pyCGA-1.3.0.tar.gz (18.7 kB)

Uploaded Source

Built Distribution

pyCGA-1.3.0-py2-none-any.whl (22.3 kB)

Uploaded Python 2

File details

Details for the file pyCGA-1.3.0.tar.gz.

File metadata

  • Download URL: pyCGA-1.3.0.tar.gz
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pyCGA-1.3.0.tar.gz
Algorithm    Hash digest
SHA256       89d90352167d3b170ee0b0d9455930ebe0f7a7a6fe01c1b4bb566470797f7692
MD5          d8f5cb2155cf3a05234d4fdad04e6d8f
BLAKE2b-256  4b5953bcf899d80fce08be9e092c4db7e5ca261195f9bef066addb383b56f4b5

File details

Details for the file pyCGA-1.3.0-py2-none-any.whl.

File hashes

Hashes for pyCGA-1.3.0-py2-none-any.whl
Algorithm    Hash digest
SHA256       7c860b51d4eb7bf729f904a595ac2343bb8b21f3e96f236a6955c5d07d6e80bc
MD5          9b6b79c273c8d092dae0a548840dcbaa
BLAKE2b-256  12d41ed3b2834d168cca55635aeb0df165d3ffbcbf12a1c143618a96d62e05df