A translator of Broad and JUMP ids to more conventional names.
Broad_Babel
A minimal name translator for JUMP consortium identifiers.
Installation
pip install broad-babel
Broad sample to standard
You can fetch a single value:
from broad_babel.query import broad_to_standard
broad_to_standard("BRD-K18895904-001-16-1")
# -> 'KVWDHTXUZHCGIO-UHFFFAOYSA-N'
If you provide multiple strings, it returns a dictionary keyed by the input ids.
broad_to_standard(("BRD-K36461289-001-05-8", "ccsbBroad304_16164"))
# -> {'BRD-K36461289-001-05-8': 'PIMZUZSSNYHVCU-KBLUICEQSA-N', 'ccsbBroad304_16164': 'SCIMP'}
Wildcard search
You can also use SQLite operators such as LIKE. For instance, to get all the samples whose pert_type starts with "poscon", you can use:
from broad_babel.query import run_query
run_query(query="poscon%", input_column="pert_type", output_columns="JCP2022,standard_key,plate_type,pert_type", operator="LIKE")
# [(None, 'LRRMQNGSYOUANY-OMCISZLKSA-N', 'compound', 'poscon_cp'),
# (None, 'DHMTURDWPRKSOA-RUZDIDTESA-N', 'compound', 'poscon_diverse'),
# ...
# ('JCP2022_913605', 'CDK2', 'orf', 'poscon_orf'),
# ('JCP2022_913622', 'CLK1', 'orf', 'poscon_cp')]
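The "%" wildcard follows standard SQLite LIKE semantics. The sketch below reproduces that matching with Python's built-in sqlite3 module on a small in-memory table. The table name `babel` and the reduced row set are assumptions made for illustration; they are not the package's actual database layout.

```python
import sqlite3

# In-memory stand-in table. The name "babel" and this tiny row set are
# assumptions for illustration, not the real broad_babel database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE babel (JCP2022 TEXT, standard_key TEXT, plate_type TEXT, pert_type TEXT)"
)
rows = [
    ("JCP2022_913605", "CDK2", "orf", "poscon_orf"),
    ("JCP2022_913622", "CLK1", "orf", "poscon_cp"),
    ("JCP2022_900842", "KCNN1", "orf", "trt"),
]
conn.executemany("INSERT INTO babel VALUES (?, ?, ?, ?)", rows)

# "%" matches any run of characters, so "poscon%" keeps every pert_type
# that begins with "poscon" and drops "trt".
matches = conn.execute(
    "SELECT JCP2022, pert_type FROM babel WHERE pert_type LIKE ? ORDER BY JCP2022",
    ("poscon%",),
).fetchall()
conn.close()

print(matches)
# [('JCP2022_913605', 'poscon_orf'), ('JCP2022_913622', 'poscon_cp')]
```

Placing "%" at the start of the pattern instead (for example "%_orf") would match on the suffix.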
Make mappers for quick renaming
This is useful when you need to rename a long list of perturbations. The following example builds a mapper from JCP2022 id to perturbation type for all the perturbations in the compound plate.
from broad_babel.query import get_mapper
mapper = get_mapper(query="compound", input_column="plate_type", output_columns="JCP2022,pert_type")
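Once built, a mapper can be applied to a list of ids. The sketch below assumes the mapper behaves like a plain dictionary keyed by the first output column. The ids and values are placeholders, not real database entries, and the "unknown" fallback is only one possible choice.

```python
# Stand-in for the object returned by get_mapper. These keys and values are
# placeholders for illustration, not real entries from the database.
mapper = {"JCP2022_000001": "trt", "JCP2022_000002": "negcon"}

ids = ["JCP2022_000001", "JCP2022_000002", "JCP2022_999999"]

# dict.get with a default keeps unmapped ids visible instead of raising KeyError.
pert_types = [mapper.get(jcp_id, "unknown") for jcp_id in ids]

print(pert_types)
# ['trt', 'negcon', 'unknown']
```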
Export database as csv
from broad_babel.query import export_csv
export_csv("./output.csv")
Custom querying
The available fields are:
- standard_key: Gene symbol for gene-related perturbations, and InChIKey for compound perturbations
- JCP2022: Identifier from the JUMP dataset
- plate_type: Dataset of origin for a given entry
- NCBI_Gene_ID: NCBI identifier, only applicable to ORF and CRISPR
- broad_sample: Internal Broad ID
- pert_type: Type of perturbation, options are trt (treatment), control, negcon (Negative Control), poscon_cp (Positive Control, Compound Probe), poscon_diverse, poscon_orf, and poscon (Positive Control).
You can fetch any field using another. Note that the output is a list of tuples, one tuple per matching row:
run_query(query="JCP2022_915119", input_column="JCP2022", output_columns="broad_sample")
# [('ccsbBroad304_16164',)]
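Because each row is a tuple even when only one column is requested, a small unpacking step is usually needed. A minimal sketch, using the result shown above as literal input:

```python
# Result copied from the example above: one tuple per row,
# one element per requested output column.
rows = [("ccsbBroad304_16164",)]

# With a single output column, take the first element of each tuple.
broad_samples = [row[0] for row in rows]

print(broad_samples)
# ['ccsbBroad304_16164']
```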
Note that some entries are duplicated, both between ORF and CRISPR perturbations and within ORF standard_keys.
run_query("ccsbBroad304_00900", input_column="broad_sample", output_columns="*")
# [('crispr', 'JCP2022_803621', 'KCNN1', 'ccsbBroad304_00900', 'trt', None),
# ('orf', 'JCP2022_900842', 'KCNN1', 'ccsbBroad304_00900', 'trt', None),
# ('Target1_orf', None, 'KCNN1', 'ccsbBroad304_00900', 'trt', None)]
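One way to handle such duplicates is to group the rows by their dataset of origin. The sketch below uses the rows shown above as literal input. It assumes that the first column is plate_type and the second is JCP2022, which is inferred from the values rather than confirmed by the package.

```python
from collections import defaultdict

# Rows copied from the example output above.
rows = [
    ("crispr", "JCP2022_803621", "KCNN1", "ccsbBroad304_00900", "trt", None),
    ("orf", "JCP2022_900842", "KCNN1", "ccsbBroad304_00900", "trt", None),
    ("Target1_orf", None, "KCNN1", "ccsbBroad304_00900", "trt", None),
]

# Group JCP2022 ids by plate_type (assumed to be column 0),
# skipping rows that have no JCP2022 id.
ids_by_plate = defaultdict(list)
for plate_type, jcp_id, *_rest in rows:
    if jcp_id is not None:
        ids_by_plate[plate_type].append(jcp_id)

print(dict(ids_by_plate))
# {'crispr': ['JCP2022_803621'], 'orf': ['JCP2022_900842']}
```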
It is also possible to use fuzzy querying by setting the operator argument to "LIKE" and adding "%" to the query key.
run_query(
    "BRD-K21728777%",
    input_column="broad_sample",
    output_columns="*",
    operator="LIKE",
)
# [('compound',
# 'JCP2022_037716',
# 'IVUGFMLRJOCGAS-UHFFFAOYSA-N',
# 'BRD-K21728777-001-02-3',
# 'control',
# 'poscon_cp'),
# ('Target2_compound',
# None,
# 'IVUGFMLRJOCGAS-UHFFFAOYSA-N',
# 'BRD-K21728777-001-02-3',
# 'control',
# 'poscon_cp')]
Additional documentation
Metadata sources and additional documentation are available here.