A translator of Broad and JUMP ids to more conventional names.

Project description

Proposal for broad-babel Module

The module aims to translate identifiers from the Broad Institute or JUMP consortium into more standardised versions: NCBI gene names for genes and InChIKeys for chemical compounds. From there the user can get more biological context and explore different translations.

Making this a Python module facilitates its integration into existing workflows and makes the (small) database easier to update. The Python code contains exclusively the querying logic, and the CSV files have been trimmed as much as possible to focus on the important data.

Input and output design

Inputs

  • One or more identifiers (str) of a single type (either JUMP or Broad ID).
  • A string (str) specifying the type of identifier to query.

Outputs

  • A dictionary whose key-value pairs map input_id -> output_id.
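
The input and output design above can be sketched as follows. This is a minimal illustration, not broad-babel's actual API: the function name `translate_ids`, the column names `broad_id`, `jump_id` and `standard_name`, and the placeholder rows are all assumptions made for the example.

```python
import csv
import io

# Placeholder table standing in for the packaged CSV files.
# Column names and values are hypothetical, not real identifiers.
EXAMPLE_CSV = """broad_id,jump_id,standard_name
BROAD_PLACEHOLDER_1,JUMP_PLACEHOLDER_1,NAME_1
BROAD_PLACEHOLDER_2,JUMP_PLACEHOLDER_2,NAME_2
"""


def translate_ids(identifiers, id_type, csv_text=EXAMPLE_CSV,
                  output_column="standard_name"):
    """Map each input identifier to its translation.

    identifiers: one str or an iterable of str.
    id_type: name of the column the identifiers belong to.
    Returns a dict of input_id -> output_id; unknown ids are omitted.
    """
    # Accept a single identifier as well as a collection of them.
    if isinstance(identifiers, str):
        identifiers = [identifiers]
    wanted = set(identifiers)

    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    if id_type not in fieldnames or output_column not in fieldnames:
        raise ValueError(f"unknown column: {id_type!r} or {output_column!r}")

    # Keep only the rows whose identifier was requested.
    return {
        row[id_type]: row[output_column]
        for row in reader
        if row[id_type] in wanted
    }
```

Calling `translate_ids("JUMP_PLACEHOLDER_1", "jump_id")` on the placeholder table returns a one-entry dictionary; identifiers that are not in the table are simply left out of the result, so the caller can compare keys against the input to see what was not translated.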

Data sources

Considerations

Advantages

  • Broad-babel would make plotting data with "well-known" gene names seamless. Just translate your Broad IDs and plot away. It also provides access to the CSV compendium with all the name-based metadata of the JUMP data collective.
  • It would also make it easier for other biologists or data scientists to approach the JUMP dataset, as Broad/JUMP IDs mean nothing to them.

Limitations

  • Data and metadata may currently be incomplete, but the module would help find the holes in JUMP metadata. This would make it easier to be aware of what is missing, as current data
  • Querying in Python is probably inefficient, but as long as the dataset is small this should not be prohibitive.
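
One way to surface the metadata holes mentioned above is to report which identifiers have no translation. The sketch below is only an illustration under stated assumptions: the helper names `build_index` and `find_missing`, the row layout (a list of dicts with `jump_id` and `standard_name` keys) and the placeholder values are hypothetical. Building a dict index once also keeps repeated lookups cheap in pure Python.

```python
def build_index(rows, key_column, value_column):
    """Index rows once so each lookup is a dict access rather than a scan."""
    return {row[key_column]: row[value_column] for row in rows}


def find_missing(identifiers, index):
    """Return identifiers that are absent from the index or map to an empty value."""
    return [i for i in identifiers if not index.get(i)]


# Placeholder rows (hypothetical values); the empty string marks a metadata hole.
rows = [
    {"jump_id": "JUMP_PLACEHOLDER_1", "standard_name": "NAME_1"},
    {"jump_id": "JUMP_PLACEHOLDER_2", "standard_name": ""},
]
index = build_index(rows, "jump_id", "standard_name")
missing = find_missing(
    ["JUMP_PLACEHOLDER_1", "JUMP_PLACEHOLDER_2", "JUMP_PLACEHOLDER_3"],
    index,
)
print(missing)  # ['JUMP_PLACEHOLDER_2', 'JUMP_PLACEHOLDER_3']
```

Here the second identifier is reported because its translation is empty, and the third because it does not appear in the table at all.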

Download files

Source Distribution

broad_babel-0.1.2.tar.gz (4.0 MB)

Uploaded Source

Built Distribution

broad_babel-0.1.2-py3-none-any.whl (4.3 MB)

Uploaded Python 3
