A translator of Broad and JUMP IDs to more conventional names.
Proposal for broad-babel Module
It aims to translate identifiers from the Broad Institute or the JUMP consortium into more standardised versions: NCBI gene names for genes and InChIKeys for chemical compounds. From there the user can get more biological context and explore different translations.
Making this a Python module eases its integration into existing workflows and simplifies updating the (small) database. The Python code contains only the querying logic, and the CSV files have been trimmed as much as possible to focus on the important data.
Input and output design
Inputs
- One or multiple (str) identifiers of a type (either the JUMP or Broad ID).
- A (str) specifying the type of identifier to query.
Outputs
- A dictionary whose key-value pairs map each input ID to its output ID (input_id -> output_id).
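The input and output design above can be sketched as follows. This is a minimal illustration, not the package's actual API: the function name `translate`, the column names (`jump_id`, `broad_id`, `standard_name`) and every identifier in the table are placeholders invented for the example, and the real CSV files may be laid out differently.

```python
import csv
import io

# Placeholder table; the real broad-babel CSV files and column names may differ.
EXAMPLE_CSV = """jump_id,broad_id,standard_name
JUMP_ID_A,BROAD_ID_A,NAME_A
JUMP_ID_B,BROAD_ID_B,NAME_B
"""


def translate(identifiers, id_type, table=EXAMPLE_CSV):
    """Hypothetical helper: map each input identifier to a standardised name.

    identifiers: one str or an iterable of str, all of the same type.
    id_type: name of the column to look the identifiers up in.
    """
    if isinstance(identifiers, str):
        identifiers = [identifiers]

    reader = csv.DictReader(io.StringIO(table))
    if id_type not in reader.fieldnames:
        raise ValueError(f"Unknown identifier type: {id_type!r}")

    # Build the lookup once, then answer every query from it.
    lookup = {row[id_type]: row["standard_name"] for row in reader}

    # Identifiers without a match map to None instead of raising.
    return {identifier: lookup.get(identifier) for identifier in identifiers}
```

Calling `translate(["JUMP_ID_A", "JUMP_ID_B"], "jump_id")` on this placeholder table returns a dictionary with one entry per input ID. Returning `None` for unknown IDs is a design choice of this sketch; it keeps gaps in the metadata visible to the caller.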
Data sources
- JUMP perturbation lists: https://github.com/jump-cellpainting/jump-perturbation-lists/tree/main
- JUMP CP Metadata: https://github.com/jump-cellpainting/datasets/tree/main/metadata
- JUMP-target metadata: https://github.com/jump-cellpainting/JUMP-Target/blob/master/JUMP-Target-1_compound_metadata.tsv
Considerations
Advantages
- Broad-babel would make plotting data with "well-known" gene names seamless: just translate your Broad IDs and plot away. It also provides access to the CSV compendium with all the name-based metadata of the JUMP data collective.
- It would also make it easier for other biologists or data scientists to approach the JUMP dataset, as Broad/JUMP ids mean nothing to them.
Limitations
- Data and metadata may be currently incomplete, but it would be useful to find the holes in JUMP metadata. This would make it easier to be aware of what is missing, as current data
- Querying in Python is probably inefficient, but as long as the dataset stays small this should not be prohibitive.
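One way to find the holes in the metadata mentioned above is to scan a table for empty fields. The sketch below assumes a CSV keyed by a `jump_id` column; the helper name `find_missing`, the column names and all values are placeholders made up for illustration, not the package's real data or API.

```python
import csv
import io

# Placeholder table with two deliberate gaps; not real JUMP metadata.
EXAMPLE_CSV = """jump_id,broad_id,standard_name
JUMP_ID_A,BROAD_ID_A,NAME_A
JUMP_ID_B,,NAME_B
JUMP_ID_C,BROAD_ID_C,
"""


def find_missing(table=EXAMPLE_CSV, key_column="jump_id"):
    """Hypothetical helper: list, per column, the row keys with an empty value."""
    reader = csv.DictReader(io.StringIO(table))
    missing = {}
    for row in reader:
        for column, value in row.items():
            # Treat absent and whitespace-only fields as missing.
            if not (value or "").strip():
                missing.setdefault(column, []).append(row[key_column])
    return missing
```

A plain linear scan like this is adequate while the tables stay small, in line with the efficiency caveat above.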