Python client for CellBase
Project description
PyCellBase
PyCellBase is a Python package that provides programmatic access to the comprehensive RESTful web service API that has been implemented for the CellBase database, providing an easy, lightweight, fast and intuitive access to it.
This package can be used to access to relevant biological information in a user-friendly way without the need of local databases installations.
Data is always available by a high-availability cluster and queries have been tuned to ensure a real-time performance.
PyCellBase offers the convenience of an object-oriented scripting language and provides the ability to integrate the obtained results into other Python applications.
More info about this package in the Python client section of the CellBase Wiki
Installation
Cloning
PyCellBase can be cloned in your local machine by executing in your terminal:
$ git clone https://github.com/opencb/cellbase.git
Once you have downloaded the project you can install the library:
$ cd cellbase/clients/python $ python setup.py install
PyPI
PyCellBase is stored in PyPI and can be installed via pip:
$ pip install pycellbase
Usage
Getting started
The first step is to import the module and initialize the CellBaseClient:
>>> from pycellbase.cbclient import CellBaseClient
>>> cbc = CellBaseClient()
The second step is to create the specific client for the data we want to query (in this example we want to obtain information for a gene):
>>> gc = cbc.get_gene_client()
And now, you can start asking to the CellBase RESTful service by providing a query ID:
>>> tfbs_responses = gc.get_tfbs('BRCA1') # Obtaining TFBS for this gene
Responses are retrieved as JSON formatted data. Therefore, fields can be queried by key:
>>> tfbs_responses = gc.get_tfbs('BRCA1')
>>> tfbs_responses[0]['result'][0]['tfName']
'E2F4'
>>> transcript_responses = gc.get_transcript('BRCA1')
>>> 'Number of transcripts: %d' % (len(transcript_responses[0]['result']))
'Number of transcripts: 27'
>>> for tfbs_response in gc.get_tfbs('BRCA1,BRCA2,LDLR'):
... print('Number of TFBS for "%s": %d' % (tfbs_response['id'], len(tfbs_response['result'])))
'Number of TFBS for "BRCA1": 175'
'Number of TFBS for "BRCA2": 43'
'Number of TFBS for "LDLR": 141'
Data can be accessed specifying comma-separated IDs or a list of IDs:
>>> tfbs_responses = gc.get_tfbs('BRCA1')
>>> len(tfbs_responses)
1
>>> tfbs_responses = gc.get_tfbs('BRCA1,BRCA2')
>>> len(tfbs_responses)
2
>>> tfbs_responses = gc.get_tfbs(['BRCA1', 'BRCA2'])
>>> len(tfbs_responses)
2
If there is an available resource in the CellBase Webservices, but there is not an available method in this python package, the CellBaseClient can be used to create the URL of interest and query the RESTful service:
>>> tfbs_responses = cbc.get(category='feature', subcategory='gene', query_id='BRCA1', resource='tfbs')
>>> tfbs_responses[0]['result'][0]['tfName']
'E2F4'
Optional filters and extra options can be added as key-value parameters (value can be a comma-separated string or a list):
>>> tfbs_responses = gc.get_tfbs('BRCA1')
>>> len(res[0]['result'])
175
>>> tfbs_responses = gc.get_tfbs('BRCA1', include='name,id') # Return only name and id
>>> len(res[0]['result'])
175
>>> tfbs_responses = gc.get_tfbs('BRCA1', include=['name', 'id']) # Return only name and id
>>> len(res[0]['result'])
175
>>> tfbs_responses = gc.get_tfbs('BRCA1', **{'include': 'name,id']) # Return only name and id
>>> len(res[0]['result'])
175
>>> tfbs_responses = gc.get_tfbs('BRCA1', limit=100) # Limit to 100 results
>>> len(res[0]['result'])
100
>>> tfbs_responses = gc.get_tfbs('BRCA1', skip=100) # Skip first 100 results
>>> len(res[0]['result'])
75
What can I ask for?
The best way to know which data can be retrieved for each client is either checking out the RESTful web services section of the CellBase Wiki or the CellBase web services
Configuration
Configuration stores the REST services host, API version and species.
Getting the default configuration:
>>> ConfigClient().get_default_configuration()
{'version': 'v4',
'species': 'hsapiens',
'rest': {'hosts': ['http://bioinfo.hpc.cam.ac.uk:80/cellbase']}}
Showing the configuration parameters being used at the moment:
>>> cbc.show_configuration()
{'host': 'bioinfo.hpc.cam.ac.uk:80/cellbase',
'version': 'v4',
'species': 'hsapiens'}
A custom configuration can be passed to CellBaseClient using a ConfigClient object. JSON and YAML files are supported:
>>> from pycellbase.cbconfig import ConfigClient
>>> from pycellbase.cbclient import CellBaseClient
>>> cc = ConfigClient('config.json')
>>> cbc = CellBaseClient(cc)
A custom configuration can also be passed as a dictionary:
>>> from pycellbase.cbconfig import ConfigClient
>>> from pycellbase.cbclient import CellBaseClient
>>> custom_config = {'rest': {'hosts': ['bioinfo.hpc.cam.ac.uk:80/cellbase']}, 'version': 'v4', 'species': 'hsapiens'}
>>> cc = ConfigClient(custom_config)
>>> cbc = CellBaseClient(cc)
If you want to change the configuration on the fly you can directly modify the ConfigClient object:
>>> cc = ConfigClient()
>>> cbc = CellBaseClient(cc)
>>> cbc.show_configuration()['version']
'v4'
>>> cc.version = 'v3'
>>> cbc.show_configuration()['version']
'v3'
Use case
A use case where PyCellBase is used to obtain multiple kinds of data from different sources can be found in this Jupyter Notebook
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file pycellbase-4.5.2.tar.gz
.
File metadata
- Download URL: pycellbase-4.5.2.tar.gz
- Upload date:
- Size: 21.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 42f270dc1aecac4b8c4a30b9ba307636995099dff0d3d9b8420cef04fc5fbb6e |
|
MD5 | da65b57b4e8fe44b4fc5a1c4fbab7161 |
|
BLAKE2b-256 | c10dbe6c45164b92e7f0a65cf94cec0c3623d3d1ed0194cab359dde0364a7064 |
Provenance
File details
Details for the file pycellbase-4.5.2-py2.py3-none-any.whl
.
File metadata
- Download URL: pycellbase-4.5.2-py2.py3-none-any.whl
- Upload date:
- Size: 25.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 84773b837203fab06c60be68191cdd8ef78ca02cc7afd05cb57260a2b6b63d42 |
|
MD5 | ae8a5d8883cd44c9cb1b6a3ebf276808 |
|
BLAKE2b-256 | 0f2f823d4bea8e0c84cb2c06d6d2dc973969e965fd7191adc75928a5f4879a98 |