Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
Natural Language
- English
Programming Language

Project description

Global-Chem: Collections of common small molecules and their SMILES/SMARTS to support diverse chemical communities


GlobalChem is an open-source graph database and API for common and rare chemical lists, using IUPAC names as input and SMILES/SMARTS as output. I built it mostly for my own needs as I search through chemical infinity.

I have found these lists, written throughout history, to be useful. They come from a variety of fields but are aggregated into the most common format of organic chemists (IUPAC), the common language of the cheminformatician (SMILES), and a language for pattern matching (SMARTS).

Installation

GlobalChem is distributed via PyPI, and as the content store grows we can expand it to other pieces of software, making it accessible to all regardless of what you use. Alternatively, you can have a glance at the source code and copy/paste it yourself.


pip install global-chem

Rules

The Graph Network (GN) comes with a few rules that, for now, make the software engineering easier on the developer:

1.) There must be a root node.
2.) When adding a node, every node must be connected to the network.
3.) A node can only be removed if it has no children.
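
The three graph network rules can be sketched as a small class. This is a minimal illustration and not GlobalChem's own implementation: the class name `SimpleGraphNetwork`, its methods, and the choice to refuse removal of the root node are assumptions made for this example.

```python
class SimpleGraphNetwork:
    """Toy parent/children graph that enforces the three rules (hypothetical class)."""

    def __init__(self, root_name):
        # Rule 1: the network always starts from a root node.
        self.root = root_name
        self.nodes = {root_name: {"parents": [], "children": []}}

    def add_node(self, parent_name, child_name):
        # Rule 2: a new node must attach to a node that is already in the network.
        if parent_name not in self.nodes:
            raise KeyError(f"unknown parent: {parent_name}")
        if child_name in self.nodes:
            raise ValueError(f"node already exists: {child_name}")
        self.nodes[child_name] = {"parents": [parent_name], "children": []}
        self.nodes[parent_name]["children"].append(child_name)

    def remove_node(self, name):
        # Assumption: the root stays in place so that rule 1 keeps holding.
        if name == self.root:
            raise ValueError("the root node cannot be removed")
        # Rule 3: only childless nodes can be removed.
        if self.nodes[name]["children"]:
            raise ValueError(f"{name} still has children")
        # Detach the node from each of its parents before deleting it.
        for parent in self.nodes[name]["parents"]:
            self.nodes[parent]["children"].remove(name)
        del self.nodes[name]


network = SimpleGraphNetwork("global_chem")
network.add_node("global_chem", "environment")
network.add_node("environment", "emerging_perfluoroalkyls")
```

In this sketch, removing `environment` raises a `ValueError` until `emerging_perfluoroalkyls` has been removed first.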

The Deep Graph Network (DGN) also comes with a couple of rules to make the implementation easier:

1.) There must be a single root node, which serves as your "input" node.
2.) When adding a layer, all of its nodes are added as children of the nodes in the previous layer. (Folk can use the remove node feature to perform dropouts.)
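
A rough sketch of the layering rule, using plain dictionaries rather than the GlobalChem classes. The function name `add_deep_layer` mirrors the method used later in this document, but the body is an assumption: it connects each new node to every node of the immediately preceding layer, which is how the printed deep network further down appears to be wired.

```python
def add_deep_layer(network, layers, new_layer):
    """Connect every node in new_layer to every node of the last layer (sketch)."""
    previous_layer = layers[-1]
    # Register any node we have not seen yet with empty parent/children lists.
    for name in new_layer:
        network.setdefault(name, {"parents": [], "children": []})
    # Fully connect the previous layer to the new one.
    for parent in previous_layer:
        for child in new_layer:
            network[parent]["children"].append(child)
            network[child]["parents"].append(parent)
    layers.append(list(new_layer))


# The root is the single "input" node and forms the first layer on its own.
network = {"global_chem": {"parents": [], "children": []}}
layers = [["global_chem"]]

add_deep_layer(network, layers, ["emerging_perfluoroalkyls", "montmorillonite_adsorption"])
add_deep_layer(network, layers, ["privileged_scaffolds", "iupac_blue_book"])

print(network["privileged_scaffolds"]["parents"])
```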

Quick Start

With no dependencies, just initialize the class and there you go! All the common and rare groups of the world are at your disposal.

Print the GlobalChem Structure

from global_chem import GlobalChem

gc = GlobalChem()
gc.print_globalchem_network()

>>>

                                ┌solvents─common_organic_solvents
             ┌organic_synthesis─└protecting_groups─amino_acid_protecting_groups
             │          ┌polymers─common_monomer_repeating_units
             ├materials─└clay─montmorillonite_adsorption
             │                            ┌privileged_kinase_inhibtors
             │                            ├privileged_scaffolds
             ├proteins─kinases─┌scaffolds─├iupac_blue_book_substituents
             │                 │          └common_r_group_replacements
             │                 └braf─inhibitors
             │              ┌vitamins
             │              ├open_smiles
             ├miscellaneous─├amino_acids
             │              └regex_patterns
global_chem──├environment─emerging_perfluoroalkyls
             │          ┌schedule_one
             │          ├schedule_four
             │          ├schedule_five
             ├narcotics─├pihkal
             │          ├schedule_two
             │          └schedule_three
             ├interstellar_space
             │                    ┌cannabinoids
             │                    │         ┌electrophillic_warheads_for_kinases
             │                    ├warheads─└common_warheads_covalent_inhibitors
             └medicinal_chemistry─│      ┌phase_2_hetereocyclic_rings
                                  └rings─├iupac_blue_book_rings
                                         └rings_in_drugs

To Access Nodes and Visualize the Internal Network:

from global_chem import GlobalChem

gc = GlobalChem()

nodes_list = gc.check_available_nodes()
print (nodes_list)

>>>
'emerging_perfluoro_alkyls', 'montmorillonite_adsorption', 'common_monomer_repeating_units', 'electrophilic_warheads_for_kinases',

gc.build_global_chem_network(print_output=True)

>>>
'global_chem': {
    'children': [
        'environment',
        'miscellaneous',
        'organic_synthesis',
        'medicinal_chemistry',
        'narcotics',
        'interstellar_space',
        'proteins',
        'materials'
    ],
    'name': 'global_chem',
    'node_value': <global_chem.global_chem.Node object at 0x10f60eed0>,
    'parents': []
},

The algorithm uses lists of parents and children to connect nodes instead of "edges" as in traditional graph networks. This makes it easier to code, because the graph database lives as a 1-dimensional structure in which each node carries its own lists of parents and children.
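
To illustrate how parent/children lists carry the same information as edges, here is a small sketch on a hand-written dictionary shaped like the output above. The helper names `to_edges` and `path_to_root` are hypothetical and are not part of the GlobalChem API.

```python
# A flat mapping of node name -> parents/children, trimmed to four nodes.
network = {
    "global_chem": {"parents": [], "children": ["environment", "narcotics"]},
    "environment": {"parents": ["global_chem"], "children": ["emerging_perfluoroalkyls"]},
    "narcotics": {"parents": ["global_chem"], "children": []},
    "emerging_perfluoroalkyls": {"parents": ["environment"], "children": []},
}


def to_edges(network):
    """Recover a traditional (parent, child) edge list from the children lists."""
    return [(name, child) for name, entry in network.items() for child in entry["children"]]


def path_to_root(network, name):
    """Walk upward through the first parent of each node until the root is reached."""
    path = [name]
    while network[path[-1]]["parents"]:
        path.append(network[path[-1]]["parents"][0])
    return path


edges = to_edges(network)
print(edges)
print(path_to_root(network, "emerging_perfluoroalkyls"))
```

Because each node stores its own neighbours, a lookup such as "who is the parent of this list" is a single dictionary access, with no scan over an edge list.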

Fetch the Node:

gc = GlobalChem()
gc.build_global_chem_network(print_output=False, debugger=False)
node = gc.get_node('emerging_perfluoroalkyls').get_smiles()
print (node)

>>>
{'perfluorohexanoic acid': 'C(=O)(C(C(C(C(C(F)(F)F)(F)F)(F)F)(F)F)(F)F)O' etc...}

Fetch the IUPAC:SMILES/SMARTS Data from the Node:

gc = GlobalChem()
gc.build_global_chem_network(print_output=True, debugger=False)
smiles = gc.get_node_smiles('emerging_perfluoroalkyls')
smarts = gc.get_node_smarts('emerging_perfluoroalkyls')

print (smiles)

Fetch All Data from Network:

gc = GlobalChem()
print(gc.get_all_smiles())
print(gc.get_all_smarts())
print(gc.get_all_names())

>>>
['C(=O)(C(C(C(C(C(F)(F)F)(F)F)(F)F)(F)F)(F)F)O', 'C(=O)(C(C(C(C(C(C(F)(F)F)(F)F)(F)F)(F)F)(F)F)(F)F)O' etc...]

Remove a Node from the Network:

Removes the node and its connections to any parents.

gc = GlobalChem()
gc.build_global_chem_network(print_output=False, debugger=False)
gc.remove_node('emerging_perfluoroalkyls')

Set & Get the Node Value:

If the user wants to put some metadata inside the node they can:

gc = GlobalChem()
gc.build_global_chem_network(print_output=True, debugger=False)
gc.set_node_value('emerging_perfluoroalkyls', {'some_data': ['bunny']})
print (gc.get_node_value('emerging_perfluoroalkyls'))

>>>
{'some_data': ['bunny']}

To Create Your Own Chemical Graph Network (GN) And Check the Values

from global_chem import GlobalChem

gc = GlobalChem(verbose=False)
gc.initiate_network()
gc.add_node('global_chem', 'common_monomer_repeating_units')
gc.add_node('common_monomer_repeating_units','electrophilic_warheads_for_kinases')
values = gc.get_node_smiles('common_monomer_repeating_units')

print (values)

>>>
'3′-bromo-2-chloro[1,1′:4′,1′′-terphenyl]-4,4′′': 'ClC1=CC=CC=C1C2=CC=C(C3=CC=CC=C3)C(Br)=C2'

values = gc.get_node_smarts('electrophilic_warheads_for_kinases')

print (values)

>>>
'propane-1,3-diyl': '[#6]-[#6]-[#6]', 'methylmethylene': '[#6H]-[#6]',

Creating Deep Layer Chemical Graph Networks (DGN) & Print it out:

This is intended for more advanced users who are comfortable with graph theory.

gc = GlobalChem()
gc.initiate_deep_layer_network()
gc.add_deep_layer(
    [
        'emerging_perfluoroalkyls',
        'montmorillonite_adsorption',
        'common_monomer_repeating_units'
    ]
)
gc.add_deep_layer(
    [
        'common_warhead_covalent_inhibitors',
        'privileged_scaffolds',
        'iupac_blue_book'
    ]
)

gc.print_deep_network()


>>>
                                      ┌common_warhead_covalent_inhibitors
            ┌emerging_perfluoroalkyls─├privileged_scaffolds
            │                         └iupac_blue_book
            │                           ┌common_warhead_covalent_inhibitors
global_chem─├montmorillonite_adsorption─├privileged_scaffolds
            │                           └iupac_blue_book
            │                               ┌common_warhead_covalent_inhibitors
            └common_monomer_repeating_units─├privileged_scaffolds
                                            └iupac_blue_book

Compute Common Score for an IUPAC Name:

The common weight increases with how many times a word is mentioned per object. The more weight, the more common the name. A score of 0 indicates that it is "uncommon".


Common Score Algorithm:

    1.) Data mine the current state of GlobalChem.
    2.) Get the object weight of each mention.
    3.) Determine the mention weight.
    4.) Sum the weights; the total is how common the name is.

gc = GlobalChem()
gc.build_global_chem_network(print_output=False, debugger=False)
gc.compute_common_score('benzene', verbose=True)
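
The four steps can be sketched on toy data to show the general shape of the calculation. This is not the formula used by `compute_common_score`: the function name `common_score_sketch`, the uniform `object_weight`, the substring-based mention count, and the two toy lists are all assumptions made for illustration.

```python
def common_score_sketch(name, chemical_lists, object_weight=1.0):
    """Sum a weight for every entry name that mentions `name` (illustrative only)."""
    score = 0.0
    for list_name, entries in chemical_lists.items():
        # Mention weight (assumed): number of entry names containing the query.
        mentions = sum(1 for iupac_name in entries if name in iupac_name)
        # Object weight (assumed): the same constant for every list.
        score += object_weight * mentions
    return score


# Toy stand-in for the mined state of the network (not real GlobalChem content).
toy_lists = {
    "solvents": {"benzene": "C1=CC=CC=C1", "chlorobenzene": "ClC1=CC=CC=C1"},
    "rings": {"benzene": "C1=CC=CC=C1", "pyridine": "C1=CC=NC=C1"},
}

print(common_score_sketch("benzene", toy_lists))
print(common_score_sketch("furan", toy_lists))
```

A name that never appears ends up with a score of 0, which matches the "uncommon" case described above.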

Adding Your Own Chemical List

If you would like to add your paper to the chemical graph network, please "File an Issue" with your chemical list and, if you like, a suggestion of where to add it. Otherwise you can leave it up to us to decide. The format of the chemical list can be something like this:


smiles = {
   '3,5-dimethoxyphenylisopropoxycarbonyl': 'COC1=CC(C(C)(OC=O)C)=CC(OC)=C1',
   '2-(4-biphenyl)isopropoxycarbonyl': 'CC(C)(OC=O)C(C=C1)=CC=C1C2=CC=CC=C2',
   '2-nitrophenylsulfenyl': 'SC1=CC=CC=C1[N+]([O-])=O',
   'boc': 'O=COC(C)(C)C',
}
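
Before filing an issue, a list in this format can be given a quick sanity check. The sketch below is a hypothetical helper (`check_chemical_list` is not part of GlobalChem) that only looks for empty entries, unbalanced parentheses or square brackets, and duplicated SMILES strings. It does not validate the chemistry; a cheminformatics toolkit such as RDKit would be needed for that.

```python
def check_chemical_list(smiles):
    """Return a list of human-readable problems found in a name -> SMILES dict."""
    problems = []
    seen = {}
    closers = {")": "(", "]": "["}
    for name, pattern in smiles.items():
        if not name.strip() or not pattern.strip():
            problems.append(f"{name!r}: empty name or SMILES")
            continue
        # Track open parentheses/brackets with a stack.
        stack = []
        balanced = True
        for char in pattern:
            if char in "([":
                stack.append(char)
            elif char in closers:
                if not stack or stack.pop() != closers[char]:
                    balanced = False
                    break
        if not balanced or stack:
            problems.append(f"{name!r}: unbalanced brackets in {pattern!r}")
        # Flag a SMILES string that was already used under another name.
        if pattern in seen:
            problems.append(f"{name!r}: same SMILES as {seen[pattern]!r}")
        seen.setdefault(pattern, name)
    return problems


candidate = {
    '2-nitrophenylsulfenyl': 'SC1=CC=CC=C1[N+]([O-])=O',
    'boc': 'O=COC(C)(C)C',
    'broken_example': 'CC(C',  # deliberately malformed entry
}

for problem in check_chemical_list(candidate):
    print(problem)
```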

GlobalChem Extensions

GlobalChem can be applied to a variety of cheminformatics tasks. One of these is functional group analysis of any SMILES dataset using the SMARTS pattern strings described in the data.

Sunbursting

Please navigate here for more documentation: https://github.com/Sulstice/global-chem-extensions

from global_chem_extensions.global_chem_extensions import GlobalChemExtensions

test_set = [
    'c1[n+](cc2n(c1OCCc1cc(c(cc1)F)F)c(nn2)c1ccc(cc1)OC(F)F)[O-]',
    'c1nc(c2n(c1OCCc1cc(c(cc1)F)F)c(nn2)c1ccc(cc1)OC(F)F)Cl',
    'c1ncc2n(c1CCO)c(nn2)c1ccc(cc1)OC(F)F',
    'C1NCc2n(C1CCO)c(nn2)c1ccc(cc1)OC(F)F',
    'C1(CN(C1)c1cc(c(cc1)F)F)Oc1cncc2n1c(nn2)c1ccc(cc1)OC(F)F',
    'c1ncc2n(c1N1CCC(C1)c1ccccc1)c(nn2)c1ccc(cc1)OC(F)F',
]

GlobalChemExtensions().sunburst_chemical_list(test_set, save_file=False)

PCA Analysis

Conduct PCA Analysis with a SMILES list input.

from global_chem.global_chem import GlobalChem
from global_chem_extensions.global_chem_extensions import GlobalChemExtensions

gc = GlobalChem()
gc.build_global_chem_network(print_output=False, debugger=False)
smiles_list = list(gc.get_node_smiles('schedule_one').values())

GlobalChemExtensions().node_pca_analysis(smiles_list, save_file=False)

Variables List

Chemical List	# of Entries	References
Amino Acids	20	Common Knowledge
Essential Vitamins	13	Common Knowledge
Common Organic Solvents	42	Fulmer, Gregory R., et al. “NMR Chemical Shifts of Trace Impurities: Common Laboratory Solvents, Organics, and Gases in Deuterated Solvents Relevant to the Organometallic Chemist.” Organometallics, vol. 29, no. 9, May 2010, pp. 2176–79.
Open Smiles	94	OpenSMILES Home Page. http://opensmiles.org/.
IUPAC Blue Book (CRC Handbook) 2003	333	Chemical Rubber Company. CRC Handbook of Chemistry and Physics: A Ready-Reference Book of Chemical and Physical Data Edited by David R. Lide, 85. ed, CRC Press, 2004.
Rings in Drugs	92	Taylor, Richard D., et al. “Rings in Drugs.” Journal of Medicinal Chemistry, vol. 57, no. 14, July 2014, pp. 5845–59. ACS Publications, https://doi.org/10.1021/jm4017625.
Phase 2 Heterocyclic Rings	19	Broughton, Howard B., and Ian A. Watson. “Selection of Heterocycles for Drug Design.” Journal of Molecular Graphics & Modelling, vol. 23, no. 1, Sept. 2004, pp. 51–58. PubMed, https://doi.org/10.1016/j.jmgm.2004.03.016.
Privileged Scaffolds	47	Welsch, Matthew E., et al. “Privileged Scaffolds for Library Design and Drug Discovery.” Current Opinion in Chemical Biology, vol. 14, no. 3, June 2010, pp. 347–61. PubMed, https://doi.org/10.1016/j.cbpa.2010.02.018.
Common Warheads	29	Gehringer, Matthias, and Stefan A. Laufer. “Emerging and Re-Emerging Warheads for Targeted Covalent Inhibitors: Applications in Medicinal Chemistry and Chemical Biology.” Journal of Medicinal Chemistry, vol. 62, no. 12, June 2019, pp. 5673–724. ACS Publications, https://doi.org/10.1021/acs.jmedchem.8b01153.
Common Polymer Repeating Units	78	Hiorns, R. C., et al. “A brief guide to polymer nomenclature (IUPAC Technical Report).” Pure and Applied Chemistry, vol. 84, no. 10, Oct. 2012, pp. 2167–69, https://doi.org/10.1351/PAC-REP-12-03-05.
Common R Group Replacements	499	Takeuchi, Kosuke, et al. “R-Group Replacement Database for Medicinal Chemistry.” Future Science OA, vol. 7, no. 8, Sept. 2021, p. FSO742. future-science.com (Atypon), https://doi.org/10.2144/fsoa-2021-0062.
Electrophilic Warheads for Kinases	24	Petri, László, et al. “An Electrophilic Warhead Library for Mapping the Reactivity and Accessibility of Tractable Cysteines in Protein Kinases.” European Journal of Medicinal Chemistry, vol. 207, Dec. 2020, p. 112836. PubMed, https://doi.org/10.1016/j.ejmech.2020.112836.
Privileged Scaffolds for Kinases	29	Hu, Huabin, et al. “Systematic Comparison of Competitive and Allosteric Kinase Inhibitors Reveals Common Structural Characteristics.” European Journal of Medicinal Chemistry, vol. 214, Mar. 2021, p. 113206. ScienceDirect, https://doi.org/10.1016/j.ejmech.2021.113206.
BRaf Inhibitors	54	Agianian, Bogos, and Evripidis Gavathiotis. “Current Insights of BRAF Inhibitors in Cancer.” Journal of Medicinal Chemistry, vol. 61, no. 14, July 2018, pp. 5775–93. ACS Publications, https://doi.org/10.1021/acs.jmedchem.7b01306.
Common Amino Acid Protecting Groups	346	Isidro-Llobet, Albert, et al. “Amino Acid-Protecting Groups.” Chemical Reviews, vol. 109, no. 6, June 2009, pp. 2455–504. DOI.org (Crossref), https://doi.org/10.1021/cr800323s.
Emerging Perfluoroalkyls	27	Pelch, Katherine E., et al. “PFAS Health Effects Database: Protocol for a Systematic Evidence Map.” Environment International, vol. 130, Sept. 2019, p. 104851. ScienceDirect, https://doi.org/10.1016/j.envint.2019.05.045.
Chemicals For Clay Adsorption	33	Orr, Asuka A., et al. “Combining Experimental Isotherms, Minimalistic Simulations, and a Model to Understand and Predict Chemical Adsorption onto Montmorillonite Clays.” ACS Omega, vol. 6, no. 22, June 2021, pp. 14090–103. PubMed, https://doi.org/10.1021/acsomega.1c00481.
Cannabinoids	63	Turner, Carlton E., et al. “Constituents of Cannabis Sativa L. XVII. A Review of the Natural Constituents.” Journal of Natural Products, vol. 43, no. 2, Mar. 1980, pp. 169–234. ACS Publications, https://doi.org/10.1021/np50008a001.
Schedule 1 United States Narcotics	240	ECFR :: 21 CFR Part 1308 - Schedules.
Schedule 2 United States Narcotics	60	ECFR :: 21 CFR Part 1308 - Schedules.
Schedule 3 United States Narcotics	22	ECFR :: 21 CFR Part 1308 - Schedules.
Schedule 4 United States Narcotics	77	ECFR :: 21 CFR Part 1308 - Schedules.
Schedule 5 United States Narcotics	8	ECFR :: 21 CFR Part 1308 - Schedules.
Common Regex Patterns	1

GlobalChem is, initially, one class object with a series of Nodes that act as objects for any common chemical lists. The chemical lists can be accessed as nodes, and users can construct their own node trees for the lists.

Also, since these lists are stored on GitHub, they are easily searchable and tied directly to their source papers for any passerby.


Genesis

GlobalChem was created because I noticed I was using the same variables across multiple scripts and figured it would be useful for folk to have.

Lead Developer: Suliman Sharif
Artwork: Elena Chow

Citation

It's on its way.



Download files

Source Distribution

global_chem-1.4.0.tar.gz (80.1 kB)

Uploaded Feb 16, 2022 Source

Hashes for global_chem-1.4.0.tar.gz

Algorithm	Hash digest
SHA256	`2193703eff1ba1d0e8be1617d755d03076995369873d500dfed63a05ac31f6af`
MD5	`8db30ff537496bce4e2b14194266a3db`
BLAKE2b-256	`34db27706275057dbfb0dac3a71dea127dd66ef03f2cedde27e94183009b30b5`