ChemCurry

chemcurry is a chemical curation workflow package meant to streamline both building curation workflows and producing detailed reports about which chemicals were flagged and when, while doing so in a manner that enforces reproducibility and easy sharing. The Molecular Modeling Lab @ UNC often finds itself needing to generate these reports to show to our PI and to share our workflows with new members, so this package was developed as a way to standardize that process.

While most chemical curation workflows for any project can be built in under 100 lines of code, the core idea behind chemcurry is to ensure reproducibility and easy building/sharing among chemists with any level of coding background. Most cheminformatics projects and publications need to do some type of curation and, frankly, the methods used to do it are often not up to par with scientific reproducibility standards. We believe that lack of reproducibility hurts our field, and chemcurry aims to fix that (for at least one part of it).

Closely related to the philosophy of reproducibility, chemcurry was also designed to make adding new curation functions easy. There is a simple API that really only requires you to write the same code you would if you were doing it manually in a notebook or script.

What about curation with labels or non-chemical properties?

chemcurry is designed to operate on explicit chemical properties, meaning that if a property cannot be calculated using just the chemical, it will not fit into the workflow. If you find yourself needing a curation workflow that can use external properties (say, to curate a data set with IC50 values for a machine learning/QSAR model), look into chemcurry-learn, which extends chemcurry to support this.

Installing

You can install the chemcurry package using pip:

pip install chemcurry

This package was built using poetry, so you can also install it by cloning the repository and creating a poetry environment (though this is not recommended outside of development):

git clone https://github.com/jimmyjbling/ChemCurry
cd ChemCurry
poetry install

Building and running a workflow

Building a chemical curation workflow with chemcurry requires only a few lines of code:

from chemcurry.workflow import CurationWorkflow
from chemcurry.steps import AddH, Add3D, FilterMW, RemoveStereochem

# SMILES strings of the chemicals to curate
smiles = ["CCCC", "CCCO", "CCCCN"]

# curation steps to apply
steps = [
    AddH(),
    Add3D(timeout=30),
    FilterMW(max_mw=100, min_mw=10),
    RemoveStereochem(),
]

my_workflow = CurationWorkflow(steps=steps)
curated_chemicals = my_workflow.curate_smiles(smiles)

The result of the workflow run, a CuratedChemicalSet, contains all the info about which compounds failed curation, which compounds were altered, and why/how all of it happened. You can save that info in a human-readable report by simply running:

curated_chemicals.write_report("path/to/my/report.txt")

You can also extract the curated chemicals, either as canonical SMILES or as RDKit Mols:

curated_mols = curated_chemicals.to_mols()
curated_smiles = curated_chemicals.to_smiles()
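To make the bookkeeping described above more concrete, here is a minimal plain-Python sketch of a pipeline that applies steps in order and records which chemicals failed or were altered, and at which step. It does not use chemcurry or RDKit: `Record`, `run_steps`, `drop_empty`, and `strip_whitespace` are hypothetical names invented for illustration, and the two toy steps only manipulate strings rather than real chemistry.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A step takes a SMILES string and returns (new_smiles_or_None, note).
# Returning None means the chemical failed this step.
StepFn = Callable[[str], Tuple[Optional[str], str]]


@dataclass
class Record:
    """Bookkeeping for one input chemical (hypothetical, not a chemcurry class)."""
    original: str
    current: Optional[str]
    events: List[str] = field(default_factory=list)


def drop_empty(smiles: str) -> Tuple[Optional[str], str]:
    """Toy filter step: reject blank inputs."""
    if not smiles.strip():
        return None, "empty SMILES"
    return smiles, ""


def strip_whitespace(smiles: str) -> Tuple[Optional[str], str]:
    """Toy update step: trim whitespace and report whether anything changed."""
    cleaned = smiles.strip()
    note = "whitespace removed" if cleaned != smiles else ""
    return cleaned, note


def run_steps(smiles_list: List[str], steps: List[Tuple[str, StepFn]]) -> List[Record]:
    """Apply each named step in order, logging failures and alterations."""
    records = [Record(original=s, current=s) for s in smiles_list]
    for name, step in steps:
        for rec in records:
            if rec.current is None:  # already failed an earlier step
                continue
            result, note = step(rec.current)
            if result is None:
                rec.events.append(f"FAILED at {name}: {note}")
            elif note:
                rec.events.append(f"ALTERED at {name}: {note}")
            rec.current = result
    return records


records = run_steps(
    ["CCCC", " CCCO ", "   "],
    [("drop_empty", drop_empty), ("strip_whitespace", strip_whitespace)],
)
for rec in records:
    print(repr(rec.original), "->", rec.current, rec.events)
```

Keeping the log next to each input, rather than silently dropping failures, is what allows a report to say which chemical was flagged and at which step.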

History tracking

You can optionally turn on history tracking mode if you want extremely detailed information about the evolution of chemicals as they progress through curation. This comes at the expense of extra memory. All you need to do is set history_tracking=True when initializing your workflow. This will save a copy of each molecule after every update made to it, so you can render the full history of the molecule. This can be done by looping through the Molecule objects attached to the curation output in the molecules attribute.

Note: Right now there is not a lot you can do with history. In the future, extra features like viewing the history of the molecule as an image might be added.
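The memory trade-off behind history tracking can be pictured with a small plain-Python sketch. `TrackedMolecule` below is a hypothetical stand-in, not chemcurry's Molecule class, and it stores SMILES strings instead of RDKit molecules; it only illustrates why keeping a snapshot after every update costs extra memory.

```python
from typing import List, Optional


class TrackedMolecule:
    """Hypothetical stand-in for a molecule that can remember its past states."""

    def __init__(self, smiles: str, history_tracking: bool = False):
        self.smiles = smiles
        # Only allocate the history list when tracking is requested.
        self.history: Optional[List[str]] = [smiles] if history_tracking else None

    def update(self, new_smiles: str) -> None:
        """Replace the current state and, if tracking, keep a snapshot of it."""
        self.smiles = new_smiles
        if self.history is not None:
            self.history.append(new_smiles)


tracked = TrackedMolecule("C[C@H](N)C(=O)O", history_tracking=True)
untracked = TrackedMolecule("C[C@H](N)C(=O)O")

for mol in (tracked, untracked):
    mol.update("CC(N)C(=O)O")  # e.g. after stereochemistry is removed

# Looping over the snapshots shows the full evolution of the tracked molecule.
for i, state in enumerate(tracked.history):
    print(f"state {i}: {state}")
```

Both objects end in the same final state; only the tracked one holds every intermediate copy, which is where the extra memory goes.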

Saving, loading and sharing workflows

After making and using a workflow, there is a good chance you will want to save it, either so you can use it again later without having to redefine it, or so you can share it as part of a publication or project. You can do this by creating a workflow file (see here for more info on these files).

All you need to do is:

my_workflow.save_workflow_file("path/to/my/workflow.json")

To load an existing one, you can use:

my_workflow = CurationWorkflow.load("path/to/my/workflow.json")

Simple as that. There are some checks and other things happening under the hood to help prioritize reproducibility and prevent unexpected behavior. You can read more about how all that works here.

Creating custom curation steps

The curation functions that already exist in chemcurry are unlikely to always have everything you need. chemcurry defines very simple APIs that allow you to easily write your own curation steps. You can read more about how that works here.
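As a rough illustration of how saving and reloading a workflow with sanity checks could work, the sketch below writes step names and parameters to JSON and refuses to load files with an unknown version or an unknown step. This is not chemcurry's file format or its actual checks: `STEP_REGISTRY`, `save_workflow`, `load_workflow`, `passes`, the `format_version` field, and the length-based toy filters are all assumptions made for the example.

```python
import json
import os
import tempfile

# Registry mapping step names to callables; a loader can only rebuild steps it knows.
# The length filters are placeholders, not real chemical properties.
STEP_REGISTRY = {
    "min_length": lambda smiles, limit: len(smiles) >= limit,
    "max_length": lambda smiles, limit: len(smiles) <= limit,
}


def save_workflow(step_specs, path):
    """Write the step names and parameters, in order, to a JSON file."""
    payload = {"format_version": 1, "steps": step_specs}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)


def load_workflow(path):
    """Read a workflow file, refusing unknown versions or unknown steps."""
    with open(path, encoding="utf-8") as handle:
        payload = json.load(handle)
    if payload.get("format_version") != 1:
        raise ValueError("unsupported workflow file version")
    for spec in payload["steps"]:
        if spec["name"] not in STEP_REGISTRY:
            raise ValueError(f"unknown step: {spec['name']}")
    return payload["steps"]


def passes(smiles, step_specs):
    """Apply every step in order; True only if all of them accept the SMILES."""
    return all(STEP_REGISTRY[s["name"]](smiles, **s["params"]) for s in step_specs)


step_specs = [
    {"name": "min_length", "params": {"limit": 2}},
    {"name": "max_length", "params": {"limit": 5}},
]
with tempfile.TemporaryDirectory() as tmp:
    workflow_path = os.path.join(tmp, "workflow.json")
    save_workflow(step_specs, workflow_path)
    reloaded = load_workflow(workflow_path)

print(reloaded == step_specs)
print([s for s in ["C", "CCCC", "CCCCCCCC"] if passes(s, reloaded)])
```

Storing names and parameters rather than pickled objects keeps the file human-readable, and failing loudly on anything unrecognized avoids silently running a different workflow than the one that was shared.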

If you do make your own, we humbly request that you submit them to chemcurry so that the community can benefit from them. Simply make a fork, push your new function (and its unit test), and then make a pull request. You can read more about contributing to chemcurry here.
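For a sense of scale, a new curation function and its unit test can be very small. The sketch below is plain Python and does not use chemcurry's step API, which is described in the linked documentation; `flag_mixture` and `TestFlagMixture` are hypothetical names, and the string check is a deliberate simplification of what a real RDKit-based step would do.

```python
import unittest


def flag_mixture(smiles: str) -> bool:
    """Return True when the SMILES appears to contain several components.

    In SMILES a "." usually separates disconnected fragments, so a salt such
    as "[Na+].[Cl-]" is flagged while "CCO" is not. This simple string check
    ignores edge cases such as ring-closure bonds that span a dot.
    """
    return "." in smiles


class TestFlagMixture(unittest.TestCase):
    def test_single_component_passes(self):
        self.assertFalse(flag_mixture("CCO"))

    def test_salt_is_flagged(self):
        self.assertTrue(flag_mixture("[Na+].[Cl-]"))


# Run the tests programmatically so the snippet also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFlagMixture)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Pairing each function with a test like this gives reviewers of a pull request a quick way to confirm the intended behavior.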
