Chemical-Converters, developed by Knowledgator, showcases our technological capabilities in the chemical domain with entry-level models for a glimpse into potential applications. It is a collection of tools for converting one chemical format into another. You can choose any model on Hugging Face trained for a specific conversion and set up your pipeline with the framework.

Chemical-Converters

Remember, chemistry is not just about reactions; it's about connections. Let's build those connections together! 💫

Library for translating chemical names

Introduction

Chemical-Converters serves as a foundational showcase of our technological capabilities within the chemical domain. The available models, which can be used in this library, represent our entry-level offerings, designed to provide a glimpse into the potential applications of our advanced solutions. For access to our comprehensive suite of larger and more precise models, we invite interested parties to engage directly with us.

Developed by the brilliant minds at Knowledgator, the library showcases the abilities of our chemical transformer models. Whether you're working on a research project, studying for an exam, or just exploring the chemical universe, Chemical-Converters is your go-to tool 🛠.

Models

The models' architecture is based on Google MT5, with certain modifications to support different inputs and outputs. All available models are presented in the table:

Model Accuracy Size(MB) Task
SMILES2IUPAC-canonical-small 75% 24 SMILES to IUPAC
SMILES2IUPAC-canonical-base 86.9% 180 SMILES to IUPAC
IUPAC2SMILES-canonical-small 88.9% 24 IUPAC to SMILES
IUPAC2SMILES-canonical-base 93.7% 180 IUPAC to SMILES

You can also list the most recent models from within the library:

from chemicalconverters import NamesConverter

print(NamesConverter.available_models())
{'knowledgator/SMILES2IUPAC-canonical-small': 'Small model for converting canonical SMILES to IUPAC with accuracy 75%, does not support isomeric or isotopic SMILES', 'knowledgator/SMILES2IUPAC-canonical-base': 'Medium model for converting canonical SMILES to IUPAC with accuracy 87%, does not support isomeric or isotopic SMILES', 'knowledgator/IUPAC2SMILES-canonical-small': 'Small model for converting IUPAC to canonical SMILES with accuracy 89%, does not support isomeric or isotopic SMILES', 'knowledgator/IUPAC2SMILES-canonical-base': 'Medium model for converting IUPAC to canonical SMILES with accuracy 94%, does not support isomeric or isotopic SMILES'}
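As a small illustration of choosing a model for a task, the sketch below copies the names and accuracies from the table above into a plain dictionary and picks the most accurate model per conversion direction. The `MODELS` dictionary and both helper functions are hypothetical and are not part of the library.

```python
# Model names and accuracies (%) copied from the table above.
# This dictionary and the helpers below are illustrative, not library API.
MODELS = {
    "knowledgator/SMILES2IUPAC-canonical-small": 75.0,
    "knowledgator/SMILES2IUPAC-canonical-base": 86.9,
    "knowledgator/IUPAC2SMILES-canonical-small": 88.9,
    "knowledgator/IUPAC2SMILES-canonical-base": 93.7,
}


def models_for_task(task):
    """Return the models whose repository name starts with the task prefix."""
    return {
        name: accuracy
        for name, accuracy in MODELS.items()
        if name.split("/")[-1].startswith(task)
    }


def best_model(task):
    """Return the name of the most accurate model for the given task prefix."""
    candidates = models_for_task(task)
    if not candidates:
        raise ValueError(f"no model found for task {task!r}")
    return max(candidates, key=candidates.get)


print(best_model("SMILES2IUPAC"))
print(best_model("IUPAC2SMILES"))
```

The selected name can then be passed as `model_name` when constructing a `NamesConverter`.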

Quickstart

Firstly, install the library:

pip install chemical-converters

SMILES to IUPAC

You can choose any pretrained model from the table in the "Models" section, but we recommend "knowledgator/SMILES2IUPAC-canonical-base".

Preferred IUPAC style

To choose the preferred IUPAC style, place style tokens before your SMILES sequence.

Style Token Description
<BASE> The most commonly used name of the substance; sometimes a mixture of the traditional and systematic styles
<SYST> The fully systematic style, without trivial names
<TRAD> The style based on trivial names for parts of the substance
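Since a style token is simply placed in front of the SMILES string, the inputs can be assembled with ordinary string handling. A minimal sketch follows; `with_style` is a hypothetical helper, not a function provided by the library.

```python
# Style tokens listed in the table above.
STYLE_TOKENS = ("<BASE>", "<SYST>", "<TRAD>")


def with_style(smiles, style="<BASE>"):
    """Prefix a SMILES string with a style token (hypothetical helper)."""
    if style not in STYLE_TOKENS:
        raise ValueError(f"unknown style token: {style!r}")
    return f"{style}{smiles}"


# Build one styled input per token for the same molecule.
styled_inputs = [with_style("CCO", token) for token in STYLE_TOKENS]
print(styled_inputs)
```

The resulting list has the same shape as the list passed to `smiles_to_iupac` in the example that follows.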

To perform simple translation, follow the example:

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac('CCO'))
print(converter.smiles_to_iupac(['<SYST>CCO', '<TRAD>CCO', '<BASE>CCO']))
['ethanol']
['ethanol', 'ethanol', 'ethanol']

Processing in batches:

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac(["<BASE>C=CC=C" for _ in range(10)], num_beams=1, 
                                process_in_batch=True, batch_size=1000))
['buta-1,3-diene', 'buta-1,3-diene'...]
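To make the role of `batch_size` concrete, here is a generic sketch of splitting an input list into consecutive batches. It only illustrates the idea; how the library batches inputs internally is not documented here, and `chunks` is a hypothetical helper.

```python
def chunks(items, batch_size):
    """Yield consecutive slices of at most batch_size items (illustrative only)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


inputs = ["<BASE>C=CC=C"] * 10
# With 10 inputs and a batch size of 4, the last batch is smaller.
batch_sizes = [len(batch) for batch in chunks(inputs, 4)]
print(batch_sizes)
```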

Validating SMILES to IUPAC translations

You can validate a translation by translating the predicted IUPAC name back into SMILES and calculating the Tanimoto similarity between the fingerprints of the original and the reconstructed molecule.

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac('CCO', validate=True))
['ethanol'] 1.0

The higher the Tanimoto similarity, the more likely the prediction is correct.
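For reference, the Tanimoto similarity of two fingerprints is the number of shared on-bits divided by the number of distinct on-bits across both. The sketch below computes it on toy fingerprints represented as sets of bit indices. It is not the library's implementation, and the bit values are made up.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


# Toy fingerprints (made-up bit indices, not real molecular fingerprints).
fp_input = {1, 4, 7, 9}
fp_roundtrip = {1, 4, 7, 12}

print(tanimoto(fp_input, fp_input))      # identical fingerprints
print(tanimoto(fp_input, fp_roundtrip))  # 3 shared bits out of 5 distinct bits
```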

You can also perform the validation manually:

from chemicalconverters import NamesConverter

validation_model = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(NamesConverter.validate_iupac(input_sequence='CCO', predicted_sequence='ethanol', validation_model=validation_model))
1.0

Note: validation is not implemented for batch processing.
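One possible workaround is to score batch results one pair at a time after translation. The sketch below takes the scoring function as a parameter so it stays independent of any model. `validate_each`, `flag_low_scores`, and `dummy_score` are hypothetical names, and the 0.9 threshold is arbitrary. In practice `score_fn` could wrap the manual validation call shown above.

```python
def validate_each(smiles_list, iupac_names, score_fn):
    """Score each (SMILES, predicted name) pair one at a time.

    score_fn is any callable (smiles, name) -> float; it is a placeholder
    for whatever validation routine is available.
    """
    if len(smiles_list) != len(iupac_names):
        raise ValueError("inputs and predictions must have the same length")
    return [(s, n, score_fn(s, n)) for s, n in zip(smiles_list, iupac_names)]


def flag_low_scores(scored, threshold=0.9):
    """Return the pairs whose score falls below the (arbitrary) threshold."""
    return [(s, n) for s, n, score in scored if score < threshold]


def dummy_score(smiles, name):
    """Stand-in scorer for demonstration only; it does not run any model."""
    return 1.0 if (smiles, name) == ("CCO", "ethanol") else 0.5


scored = validate_each(["CCO", "C=CC=C"], ["ethanol", "butane"], dummy_score)
print(flag_low_scores(scored))
```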

IUPAC to SMILES

You can choose any pretrained model from the table in the "Models" section, but we recommend "knowledgator/IUPAC2SMILES-canonical-base".

To perform simple translation, follow the example:

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(converter.iupac_to_smiles('ethanol'))
print(converter.iupac_to_smiles(['ethanol', 'ethanol', 'ethanol']))
['CCO']
['CCO', 'CCO', 'CCO']

Processing in batches:

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(converter.iupac_to_smiles(["buta-1,3-diene" for _ in range(10)], num_beams=1, 
                                process_in_batch=True, batch_size=1000))
['<SYST>C=CC=C', '<SYST>C=CC=C'...]

Our models also predict the IUPAC style of the input name, using the style tokens from the table:

Style Token Description
<BASE> The most commonly used name of the substance; sometimes a mixture of the traditional and systematic styles
<SYST> The fully systematic style, without trivial names
<TRAD> The style based on trivial names for parts of the substance
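In the batch example above, each predicted SMILES string carries a style token as a prefix. If only the SMILES part is needed, the token can be separated with a small parser. This is a sketch under the assumption that the token, when present, is always one of the three listed and appears at the very start of the string. `split_style` is a hypothetical helper.

```python
import re

# Matches an optional leading style token: <BASE>, <SYST> or <TRAD>.
_STYLE_RE = re.compile(r"^(<(?:BASE|SYST|TRAD)>)?(.*)$", re.DOTALL)


def split_style(prediction):
    """Split a predicted string into (style token or None, SMILES)."""
    token, smiles = _STYLE_RE.match(prediction).groups()
    return token, smiles


print(split_style("<SYST>C=CC=C"))
print(split_style("CCO"))
```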

Citation

Coming soon.
