CDD-Python-SDK
A Python client for streamlined execution of CDD Vault API methods.
NOTE:
This UNOFFICIAL package was created with the express permission of Collaborative Drug Discovery Inc., but is licensed by Workflow Informatics Corp.
Please contact Workflow Informatics Corp. for communications regarding this package.
- CDD-Python-SDK
- Known Issues
- Installation
- Getting Started
- VaultClient Attributes
- VaultClient Methods
- Control/Misc.
- Batches
- Molecules
- Public Data-Sets
- ELN Entries
- Fields
- Files
- Mapping Templates
- Plates
- Plot
- Projects
- Protocols
- Protocol Data
- Readout Rows
- Runs
- Slurps
- Batch Move Jobs
Known Issues
- Molecules: finish adding help documentation for query parameters.
Installation
The latest version of the CDD Python SDK is available on PyPI and can be installed using pip:
pip install cdd-python-sdk
Getting Started
- Import the VaultClient module:
from cdd.VaultClient import VaultClient
- Confirm your User Permissions, then instantiate a VaultClient to work with your data:
import os
vaultNum = 4598  # Insert your unique vault ID here.
apiToken = os.environ["cddAPIToken"]  # Reads your API token from the cddAPIToken environment variable.
vault = VaultClient(vaultNum, apiToken)
- Use the provided methods and properties to download, upload, and edit data:
projects_dataframe = vault.getProjects() # default response is pandas dataframe
protocols_json = vault.getProtocols(asDataFrame=False)
filtered_protocols = vault.getProtocols(projects = projects_dataframe.at[0, 'id'])
- A full list of valid parameters can be returned by passing help=True to a method:
vault.getMolecules(help=True)
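Reading the token from an environment variable keeps it out of source control. The sketch below shows one way to fail early with a clear message when the variable is missing. `read_api_token` is a hypothetical helper, not part of the SDK, and the variable name `cddAPIToken` simply follows the example above.

```python
import os


def read_api_token(env=os.environ, name="cddAPIToken"):
    """Return the API token stored under `name`, or raise a clear error.

    Hypothetical helper for illustration; it is not part of cdd-python-sdk.
    """
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {name} environment variable to your CDD Vault API token."
        )
    return token
```

The returned string could then be passed as the second argument when instantiating VaultClient.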
VaultClient Attributes
self.URL
Returns the URL associated with the active VaultClient instance
self.vaultNum
Returns the four-digit vault ID associated with the active VaultClient instance
self.apiKey
Returns the API Key associated with the active VaultClient instance
self.maxSyncObjects
Returns the current value of the maxSyncObjects attribute
VaultClient Methods
Note: Additional methods are defined for VaultClient, but are not intended to be called by the end-user. However, developers are encouraged to check the docstrings within those methods.
Control/Misc.
Set the vault ID and construct the base URL, from which endpoints for all subsequent API calls (GET, POST, PUT, DELETE) will be constructed.
setVaultNumAndURL(vaultNum)
Returns: tuple
a two-element tuple consisting of the vault ID and the base URL for accessing the CDD Vault API.
Set the API token credentials, which will be passed in the request header to CDD Vault with each API request.
setAPIKey(apiKey)
Note that the API token must have read/write access to the vault specified by the vault ID when executing the various API calls or an error will be returned.
Returns: str
Set the 'maxSyncObjects' attribute, which determines whether a synchronous or an asynchronous export request is submitted to CDD. If the number of objects returned from a GET request is ever >= maxSyncObjects, the call will be repeated asynchronously.
setMaxSyncObjects(value=1000)
Defaults to 1000, the maximum # of objects which a CDD GET request can return synchronously.
Only used in methods where GET requests can be performed asynchronously:
Molecules, Batches, Plates, Protocols, and Protocol Data. See method sendSyncAndAsyncGets().
Returns: int
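The threshold rule described above can be written as a one-line comparison. This is only an illustration of the documented behaviour, not the SDK's internal code; `should_retry_async` is a hypothetical name.

```python
def should_retry_async(num_returned, max_sync_objects=1000):
    """Apply the documented rule: a GET that returns at least
    max_sync_objects objects is repeated asynchronously.

    Illustrative only; the SDK's own logic lives in sendSyncAndAsyncGets().
    """
    return num_returned >= max_sync_objects
```

Note that the comparison is inclusive, so a response of exactly 1000 objects already triggers the asynchronous path under the default setting.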
Batches
Return a set or subset of batches from CDD vault.
getBatches(asDataFrame=True, help=False, **kwargs)
- asDataFrame (bool): returns the json as a Pandas DataFrame.
Additional Valid Arguments:
"batches": "Comma-separated list of ids. Cannot be used with other parameters"
"no_structures": "Boolean. If true, omit structure representations for a smaller and faster response. Default: false",
"only_ids": "Boolean. If true, only the Batch IDs are returned, allowing for a smaller and faster response. Default: false",
"created_before": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"created_after": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"modified_before": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"modified_after": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"molecule_created_before": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"molecule_created_after": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"page_size": "The maximum # of objects to return.",
"projects": "Comma-separated list of project ids. Defaults to all available projects. Limits scope of query.",
"data_sets": "Comma-separated list of public data set ids. Defaults to no data sets. Limits scope of query.",
"molecule_batch_identifier": "A Molecule-Batch ID used to query the Vault.",
"molecule_fields": "Array of Molecule field names to include in the resulting JSON. Use this parameter to limit the number of Molecule UDF Fields to return.",
"batch_fields": "Array of Batch field names to include in the resulting JSON. Use this parameter to limit the number of Batch UDF Fields to return.",
"fields_search": "Array of Batch field names & values. Used to filter Batches returned based on query values"
Returns: pandas.DataFrame or list
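Several of the arguments above expect dates as YYYY-MM-DDThh:mm:ss±hh:mm and ids as comma-separated lists. The following sketch builds such values with the standard library. `cdd_timestamp`, `comma_list`, and the sample ids are hypothetical and only illustrate the formats; whether the SDK also accepts other value types is not covered here.

```python
from datetime import datetime, timedelta, timezone


def cdd_timestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss±hh:mm."""
    if moment.utcoffset() is None:
        raise ValueError("moment must be timezone-aware so the offset can be written")
    return moment.isoformat(timespec="seconds")


def comma_list(values):
    """Join ids or names into a comma-separated string."""
    return ",".join(str(value) for value in values)


# Hypothetical filter values, for illustration only.
filters = {
    "created_after": cdd_timestamp(datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)),
    "projects": comma_list([101, 102]),
}
```

A dictionary like this could be unpacked into a call such as getBatches(**filters).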
Create a new batch in CDD Vault.
postBatches(data=None, help=False)
- data: Required, unless 'help' is set to True. Must be either a valid json object, or a string file path to a valid json file. Allowed JSON Examples
Update an existing batch in CDD Vault.
putBatches(id=None, data=None, help=False)
- id (int or str): unique id for an existing batch object in CDD Vault. Required, unless 'help' is set to True.
- data: Required, unless 'help' is set to True. Must be either a valid json object, or a string file path to a valid json file. Allowed JSON Examples
Note: putBatches() method call should not be used to update the chemical structure of the parent Molecule. Instead, use the putMolecules() method to achieve this.
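The data argument of the POST and PUT methods accepts either a JSON object or a path to a JSON file. The sketch below shows one way a caller might normalize both forms before inspecting a payload locally. `load_json_payload` is a hypothetical helper and does not describe how the SDK handles the argument internally.

```python
import json
from pathlib import Path


def load_json_payload(data):
    """Return a dict from either a dict or a path to a JSON file.

    Hypothetical helper for illustration; not part of cdd-python-sdk.
    """
    if isinstance(data, dict):
        return data
    if isinstance(data, (str, Path)):
        with open(data, encoding="utf-8") as handle:
            return json.load(handle)
    raise TypeError("data must be a dict or a path to a JSON file")
```

Loading the file yourself makes it easy to check the payload before it is sent.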
Molecules
Return a list of molecules and their batches, based on optional parameters.
getMolecules(asDataFrame=True, help=False, **kwargs)
- asDataFrame (bool): returns the json as a Pandas DataFrame.
Additional Valid Arguments:
"molecules": "Comma-separated list of ids (not molecule names!). Cannot be used with other parameters",
"names": "Comma-separated list of names/synonyms.",
"async": "Boolean. If true, do an asynchronous export (see Async Export). Use for large data sets. Note - always set to True when using Python API",
"no_structures": "Boolean. If true, omit structure representations for a smaller and faster response. Default: false",
"only_ids": "Boolean. If true, only the Molecule IDs are returned, allowing for a smaller and faster response. Default: false",
"created_before": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"created_after": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"modified_before": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"modified_after": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"batch_created_before": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"batch_created_after": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"batch_field_before_name": "Batch field name",
"batch_field_before_date": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"batch_field_after_name": "Batch field name",
"batch_field_after_date": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"page_size": "The maximum # of objects to return.",
"projects": "Comma-separated list of project ids. Defaults to all available projects. Limits scope of query.",
"data_sets": "Comma-separated list of public data set ids. Defaults to no data sets. Limits scope of query.",
"structure": "SMILES, cxsmiles or mol string for the query structure. Returns Molecules from the Vault that match the structure-based query submitted via this API call.",
"structure_search_type": "Available options are: 'exact', 'similarity' or 'substructure'. Default option is substructure.",
"structure_similarity_threshold": "A number between 0 and 1. Include this parameter only if the structure_search_type is 'similarity'.",
"inchikey": "A valid InchiKey. Use this parameter in place of the 'structure' and 'structure_search_type' parameters.",
"molecule_fields": "Array of Molecule field names to include in the resulting JSON. Use this parameter to limit the number of Molecule UDF Fields to return.",
"batch_fields": "Array of Batch field names to include in the resulting JSON. Use this parameter to limit the number of Batch UDF Fields to return.",
"fields_search": "Array of Molecule field names & values. Used to filter Molecules returned based on query values"
Returns: pandas.DataFrame or list
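The structure-search arguments depend on one another: the similarity threshold only applies when structure_search_type is 'similarity'. The sketch below assembles a consistent set of keyword arguments under that rule. `structure_query` is a hypothetical helper, and the SMILES string in the usage note is just a sample.

```python
VALID_SEARCH_TYPES = {"exact", "similarity", "substructure"}


def structure_query(structure, search_type="substructure", threshold=None):
    """Build structure-search keyword arguments and check they are consistent.

    Hypothetical helper for illustration; not part of cdd-python-sdk.
    """
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {sorted(VALID_SEARCH_TYPES)}")
    params = {"structure": structure, "structure_search_type": search_type}
    if search_type == "similarity":
        if threshold is None or not 0 <= threshold <= 1:
            raise ValueError("similarity searches need a threshold between 0 and 1")
        params["structure_similarity_threshold"] = threshold
    elif threshold is not None:
        raise ValueError("threshold only applies to similarity searches")
    return params
```

For example, structure_query("c1ccccc1", "similarity", 0.7) yields a dictionary that could be unpacked into getMolecules(**params).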
Register a new molecule in CDD Vault.
postMolecules(data=None, help=False)
- data: Required, unless 'help' is set to True. Must be either a valid json object, or a string file path to a valid json file. Allowed JSON
Update an existing molecule in CDD Vault.
putMolecules(id=None, data=None, help=False)
- id (int or str): unique id for an existing molecule object in CDD Vault. Required, unless 'help' is set to True.
- data: Required, unless 'help' is set to True. Must be either a valid json object, or a string file path to a valid json file. Allowed JSON
Public Data-Sets
Return a list of accessible public data sets for the given vault.
getDatasets(asDataFrame=True)
- asDataFrame (bool): returns the json as a Pandas DataFrame.
Returns: pandas.DataFrame or list
ELN Entries
Note: For security purposes, the GET and POST ELN Entries CDD Vault API commands documented here are only available for Vault Administrators.
Return information on the ELN entries for the specified vault
getELNEntries(summary=True, asDataFrame=True, exportPath=None, unzipELNEntries=False, help=False, **kwargs)
- summary (bool): if true, returns summary data for the requested ELN entries. This is equivalent to the synchronous call.
- asDataFrame (bool): returns the summary as a Pandas DataFrame. Only relevant if summary=True.
- exportPath (str): file path for extracting zipped ELN entries to. Only relevant if summary=False.
- unzipELNEntries (bool): if true, extracts the zip contents of exportPath to a directory named \exportPath\
Returns: pandas.DataFrame or list
Create a new ELN entry.
postELNEntries(project, title=None, eln_fields={})
- project (str): the project ID or name where the new ELN entry will be created.
- title (str): the title of the new ELN entry.
- eln_fields (dict): a set of configured ELN field/value pairs which have been set by a Vault Administrator for the specified vault.
Fields
Return a list of available fields for the given vault.
getFields(asDataFrame=True)
This API call will provide you with the “type” and “name” values of *all* fields within a Vault.
The json keys returned by this API call are organized into the following: internal, batch, molecule, protocol
- asDataFrame (bool): returns the json as a Pandas DataFrame.
Returns: dict of pandas.DataFrame or list
Files
Retrieve a single file object from CDD Vault using its file ID.
getFile(fileID, destFolder=None)
- destFolder (str): destination folder where file contents should be written to. File name will default to the original name of the file when it was uploaded to CDD Vault.
Returns: str of decoded response, also writes to file system.
Attach a file to an object (Run, Molecule, Protocol or ELN entry).
postFiles(objectType, objectID, fileName)
- objectType (str): specifies the CDD object type to which the file will be attached. Value must be one of molecule, protocol, run, or eln_entry.
- objectID (str): an existing uid for a run, molecule, protocol, or ELN entry object.
- fileName (str): valid file path for upload to CDD.
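Because objectType is limited to four values and fileName must point at a real file, both can be checked locally before an upload is attempted. The sketch below is one way to do that. `check_upload_args` is a hypothetical helper, not an SDK function.

```python
from pathlib import Path

# Object types listed in the parameter description above.
ATTACHABLE_TYPES = ("molecule", "protocol", "run", "eln_entry")


def check_upload_args(object_type, file_name):
    """Validate the object type and confirm the file exists; return its Path.

    Hypothetical helper for illustration; not part of cdd-python-sdk.
    """
    if object_type not in ATTACHABLE_TYPES:
        raise ValueError(f"objectType must be one of {ATTACHABLE_TYPES}")
    path = Path(file_name)
    if not path.is_file():
        raise FileNotFoundError(f"No file found at {path}")
    return path
```

Checking locally gives a clearer error than waiting for the API to reject the request.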
Delete a single file attached to an object (Run, Molecule, Protocol or ELN entry) using its unique file ID.
deleteFiles(fileID)
- fileID (str): unique ID for an existing file in CDD vault.
Mapping Templates
Return summary information on all available mapping templates in the Vault specified. Alternatively, if 'id' argument is set, will retrieve details on the data objects mapped within a specific mapping template.
getMappingTemplates(id=None, asDataFrame=True)
Additional fields when id argument is set include:
A 'header_mappings' section that identifies the field/readout each header is mapped to.
A 'file' section that provides details on the original file used to create the template.
- asDataFrame (bool): returns the json as a Pandas DataFrame. This parameter is ignored if an id value has been set.
Returns: JSON dict or pandas.DataFrame
Plates
Return a collection of plates from CDD vault.
getPlates(asDataFrame=True, help=False, **kwargs)
- asDataFrame (bool): returns the json as a Pandas DataFrame.
Additional Valid Arguments:
"plates": "Comma-separated list of ids.",
"names": "Comma-delimited list of plate names.",
"locations": "Comma-delimited list of plate locations.",
"async": "Boolean. If true, do an asynchronous export (see Async Export). Use for large data sets. Note - always set to True when using Python API",
"page_size": "The maximum # of objects to return.",
"projects": "Comma-separated list of project ids. Defaults to all available projects. Limits scope of query."
Returns: JSON dict or pandas.DataFrame
Delete a single existing plate in CDD Vault using its plate ID.
deletePlates(id)
- id (str or int): Unique ID for an existing plate in CDD vault.
Plot
Download dose-response curves/plots for a single Batch.
getPlot(batchID, protocolID, size="small", destFolder=None)
This API call generates a png image file containing the dose-response plot for the specific Batch within the specified Protocol
- batchID (str): id for the desired batch.
- protocolID (str): id for the desired protocol.
- size (str): relative size of the response png file. Valid options are small, medium and large.
- destFolder (str): destination folder where file contents should be written to. File name will default to the original name of the file when it was uploaded to CDD Vault.
Returns: str of decoded response, also writes to file system.
Projects
Return a list of accessible projects for the given vault.
getProjects(asDataFrame=True)
- asDataFrame (bool): returns the json as a Pandas DataFrame.
Returns: JSON dict or pandas.DataFrame
Protocols
Return a list of accessible protocols for the given vault.
getProtocols(asDataFrame=True, help=False, **kwargs)
- asDataFrame (bool): returns the json as a Pandas DataFrame.
Additional Valid Arguments:
"protocols": "Comma-separated list of protocol ids. Cannot be used with other parameters",
"names": "Comma-separated list of protocol names. Cannot be used with other parameters.",
"only_ids": "Boolean. If true, only the Protocol IDs are returned, allowing for a smaller and faster response. Default: false",
"created_before": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"created_after": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"modified_before": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"modified_after": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"runs_modified_before": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"runs_modified_after": "Date (YYYY-MM-DDThh:mm:ss±hh:mm)",
"plates": "Comma-separated list of plate ids.",
"molecules": "Comma-separated list of molecule ids.",
"page_size": "The maximum # of objects to return.",
"projects": "Comma-separated list of project ids. Defaults to all available projects. Limits scope of query.",
"data_sets": "Comma-separated list of public data set ids. Defaults to no data sets. Limits scope of query.",
"slurp": "Specify the slurp_id of an import operation. Once an import has been committed, you can return additional JSON results that will expose the Protocol and Run(s) of data that were imported."
Returns: JSON dict or pandas.DataFrame
Protocol Data
Return a filtered subset of the readout data for a single protocol using its protocol ID.
getProtocolData(id=None, asDataFrame=True, help=False, statusUpdates=True, **kwargs)
'id' argument is required, unless 'help' is set to True.
- id (str or int): ID for the desired protocol.
- asDataFrame (bool): Returns the json as a Pandas DataFrame.
- statusUpdates (bool): Display status updates for the asynchronous export.
Additional Valid Arguments:
"plates": "Comma-separated list of plate ids. Include only data for the specified plates.",
"molecules": "Comma-separated list of molecule ids. Include only data for the specified molecules.",
"runs_before": "Date (YYYY-MM-DDThh:mm:ss±hh:mm). Include only data for runs on or before the date.",
"runs_after": "Date (YYYY-MM-DDThh:mm:ss±hh:mm). Include only data for runs on or after the date.",
"runs": "Comma-separated list of run ids for the given protocol. Include only data for runs listed.",
"page_size": "The maximum # of objects to return.",
"projects": "Comma-separated list of project ids. Defaults to all available projects. Limits scope of query.",
"format": "'csv'. Generates a csv file which mimics the file generated when you choose the 'Export readouts' button from the Run-level 'Run Details' tab within the CDD Vault web interface. When used as a keyword argument, this forces an asynchronous GET request. All other keyword arguments will be ignored, EXCEPT for the 'runs' keyword."
Returns: JSON dict or pandas.DataFrame. Optionally writes .csv to file system.
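The interaction between format='csv' and the other keyword arguments is easy to overlook. The sketch below restates the documented rule as a small function that reports which keyword arguments still take effect. `effective_export_kwargs` is a hypothetical name and the function only mirrors the description above; it is not the SDK's implementation.

```python
def effective_export_kwargs(**kwargs):
    """Return the keyword arguments that still matter under the documented rule:
    with format='csv', everything except 'runs' is ignored.

    Illustrative only; not part of cdd-python-sdk.
    """
    if kwargs.get("format") == "csv":
        return {key: value for key, value in kwargs.items() if key in ("format", "runs")}
    return dict(kwargs)
```

Running the intended arguments through a check like this can reveal filters that would be silently dropped from a csv export.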
Readout Rows
Update an existing readout row (including the ability to flag an existing readout row as an outlier).
putReadoutRows(id=None, data=None, help=False)
Allows a user to update a specified row of Protocol data, set its value to null, or flag a specified row of Protocol data as an outlier.
Use getProtocolData() method with runs specified to ascertain the id of the readout row for the Protocol data you wish to edit.
Use getProtocols() method to ascertain the readout definition IDs.
- id (str or int): unique id for an existing readout row object in CDD Vault. Required, unless 'help' is set to True.
- data: Required, unless 'help' is set to True. Must be either a valid json object, or a string file path to a valid json file. Allowed JSON Examples
Delete a single readout row associated with protocol data in CDD Vault using its unique ID.
deleteReadoutRows(id)
- id (str or int): unique id for an existing readout row object in CDD Vault.
Runs
Retrieve a single run using its unique run ID.
getRun(runID)
- runID (str or int): unique id for an existing run object in CDD Vault.
Update an existing run using its unique run ID.
putRuns(id=None, data=None, help=False)
- id (str or int): unique id for an existing run object in CDD Vault.
- data: Required, unless 'help' is set to True. Must be either a valid json object, or a string file path to a valid json file. Allowed JSON Examples
Fields not specified in the JSON are not changed. Allows users to update the run's Project association, as well as the Run_Date, Person, Place, and Conditions fields.
Delete one or more runs from CDD Vault
deleteRuns(id, slurps=False)
- slurps (bool): If True, the id parameter will be interpreted as a slurp ID, that is, the slurp_id of an import operation, and all runs associated with that slurp will be deleted. The user must have appropriate permissions to remove ALL runs in the slurp. If user permissions are insufficient, no runs will be deleted.
- id (str or int): unique id for an existing run object in CDD Vault, or for a slurp if slurps=True.
Slurps
Bulk import endpoint for programmatic use. CDD Support Topic
postSlurpsData(fileName, project, mappingTemplate=None, runs=None, interval=5.0)
Uses an existing mapping template to map the data in the import file into CDD Vault.
Once a file has been uploaded through the API, data from the import is committed immediately unless there are errors or warnings.
Any import errors or warnings (Suspicious Events) will cause the import to be REJECTED.
- project (str or int): Required. Either the name or id of a single project. To use a project name, enter a str. To use a project id, enter an int.
- mappingTemplate (str or int): The name (str) or id (int) of a mapping template that matches the attached file. If you choose to exclude this keyword, CDD will attempt to use an existing template that matches the import file. If none of the templates in your vault match, or if more than one of them matches, the import will be REJECTED.
- runs (dict): Optional. A single run detail object which will be applied to all new runs present in the file. Valid Keys:
"run_date": use YYYY-MM-DDThh:mm:ss±hh:mm. Default is today’s date.
"place": This is the 'lab' condition in CDD. No default.
"person": default value is user's full name.
"conditions": no default value provided.
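The runs argument takes a single dictionary restricted to the four keys listed above. The sketch below builds such a dictionary, rejects unknown keys, and formats a datetime using the date format used for the other date arguments in this document. `make_run_details` and the sample values are hypothetical.

```python
from datetime import datetime, timezone

# Keys listed as valid for the runs dictionary.
RUN_KEYS = {"run_date", "place", "person", "conditions"}


def make_run_details(**details):
    """Build a run detail dict, rejecting keys outside the documented set.

    Hypothetical helper for illustration; not part of cdd-python-sdk.
    """
    unknown = set(details) - RUN_KEYS
    if unknown:
        raise KeyError(f"unsupported run keys: {sorted(unknown)}")
    run = dict(details)
    # Convert a datetime into a YYYY-MM-DDThh:mm:ss±hh:mm string.
    if isinstance(run.get("run_date"), datetime):
        run["run_date"] = run["run_date"].isoformat(timespec="seconds")
    return run


# Sample values, for illustration only.
run_details = make_run_details(
    run_date=datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc),
    place="Lab 2",
)
```

The resulting dictionary could be passed as the runs argument of postSlurpsData().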
Batch Move Jobs
This endpoint requires the user to be a vault admin
Retrieve the statuses of one or more batch move jobs from CDD Vault queue.
getBatchMoveJobs(batchMoveJobID=None)
- batchMoveJobID (int or str): Optional. The unique ID of the batch move job to retrieve. If None, retrieves all jobs in the queue.
Create a new batch move job to move a batch to a different molecule in the same vault.
postBatchMoveJob(data=None)
- data (JSON): Required. Valid Keys:
"batch": Unique integer ID of the batch to move. Required.
"molecule": Unique integer ID of the molecule to move the batch to. Required.
"name": A new name for the batch. Optional. Only allowed for vaults without a registration system.
"fail_on_molecule_deletion": Fail if moving the batch would trigger the removal of the originating molecule. Default true.
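A batch move payload has two required integer ids and two optional keys. The sketch below assembles one from the keys listed above. `batch_move_payload` is a hypothetical helper, and the ids in the usage note are placeholders.

```python
def batch_move_payload(batch, molecule, name=None, fail_on_molecule_deletion=True):
    """Build the data dict for a batch move job from the documented keys.

    Hypothetical helper for illustration; not part of cdd-python-sdk.
    """
    payload = {
        "batch": int(batch),
        "molecule": int(molecule),
        "fail_on_molecule_deletion": bool(fail_on_molecule_deletion),
    }
    # "name" is optional and only allowed for vaults without a registration system.
    if name is not None:
        payload["name"] = name
    return payload
```

For example, batch_move_payload(12345, 67890) produces a dictionary that could be passed as the data argument of postBatchMoveJob().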
Cancel a single batch move job in the queue
deleteBatchMoveJob(batchMoveJobID)
- batchMoveJobID (int or str): Required. The unique ID of the batch move job to cancel.
NOTE: Once a job has started it cannot be deleted. Also, if you are moving the highest batch of a molecule, the batch number it previously occupied will be reused by the next batch of the original molecule.