Explainable prediction of EC numbers using a multilayer perceptron.

⚓ Theia

Quickstart

Theia needs at least Python 3.9, so I suggest you use conda to create an environment with an up-to-date Python version (3.11 is really, really fast, so I suggest switching to it as soon as rdkit supports it). For now, let's go with Python 3.10: conda create -n theia python==3.10 && conda activate theia is all you need (ha). Then you can go ahead and install Theia using pip (the name theia was already taken on PyPI, so make sure to install theia-pypi, unless you want to parse log files):

pip install theia-pypi

That's pretty much it. Now you can start Theia by simply typing:

theia

and open the URL http://127.0.0.1:5000/ in your web browser.

In case you don't want or need a UI, you can also use the CLI to predict an EC number for an arbitrary reaction:

theia-cli "rheadb.ec123" "S=C=NCC1=CC=CC=C1>>N#CSCC1=CC=CC=C1"
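The second argument is a reaction SMILES, which follows the general layout reactants>agents>products, with molecules inside each part separated by a dot. If you assemble such strings programmatically, a quick sanity check before calling the CLI can save a confusing error. The helper below is a hypothetical sketch (split_reaction_smiles is not part of Theia) and only checks the layout, not the chemistry:

```python
def split_reaction_smiles(rxn: str) -> tuple[list[str], list[str], list[str]]:
    """Split 'reactants>agents>products' into three lists of molecule SMILES.

    Hypothetical helper for illustration; it is not part of Theia.
    """
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(
            f"expected exactly two '>' separators, got {len(parts) - 1}"
        )
    # Molecules within each part are separated by '.'; an empty part gives [].
    reactants, agents, products = (
        [mol for mol in part.split(".") if mol] for part in parts
    )
    if not reactants or not products:
        raise ValueError("a reaction needs at least one reactant and one product")
    return reactants, agents, products


reactants, agents, products = split_reaction_smiles(
    "S=C=NCC1=CC=CC=C1>>N#CSCC1=CC=CC=C1"
)
print(reactants, agents, products)
```

For the example reaction, the agents part is empty, which is why the two > characters sit next to each other.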

If you want a bit more information than just the predicted EC class, you can also get the top-5 probabilities:

theia-cli "rheadb.ec123" "S=C=NCC1=CC=CC=C1>>N#CSCC1=CC=CC=C1" --probs
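This README does not say how Theia turns the model output into probabilities. A common choice for a multilayer perceptron classifier is a softmax over the output scores, followed by sorting, and the sketch below assumes exactly that. The function name and the scores are made up for illustration:

```python
import math


def top_k_probabilities(
    logits: dict[str, float], k: int = 5
) -> list[tuple[str, float]]:
    """Return the k most probable labels with softmax probabilities in percent.

    Assumption: probabilities come from a softmax; Theia may do this differently.
    """
    # Subtract the largest score before exponentiating for numerical stability.
    largest = max(logits.values())
    exps = {label: math.exp(score - largest) for label, score in logits.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda item: item[1], reverse=True)
    return [(label, 100.0 * value / total) for label, value in ranked[:k]]


# Made-up scores, not real model output.
scores = {"2.7.4": 2.0, "2.3.2": 1.5, "2.3.1": 1.0, "1.1.1": 0.0, "3.1.3": -1.0}
for label, percent in top_k_probabilities(scores, k=3):
    print(f"{label}: {percent:.2f}")
```

Because only the top k entries are kept, the reported percentages generally add up to less than 100.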

Or, if you want human-readable output, you can make it pretty:

theia-cli "rheadb.ec123" "S=C=NCC1=CC=CC=C1>>N#CSCC1=CC=CC=C1" --probs --pretty

and you'll get a neat table...

┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Prediction ┃ Probability [%] ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ 2.7.4      │           14.22 │
│ 2.3.2      │           11.03 │
│ 2.3.1      │            7.15 │
│ 2.7.8      │            4.62 │
│ 2.6.1      │            4.05 │
└────────────┴─────────────────┘
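If you'd rather call the CLI from a Python script, something along these lines should work. It is a minimal sketch that uses only the arguments and flags shown above (--probs and --pretty); build_theia_command is a hypothetical helper, not part of Theia, and the format of the captured output is whatever theia-cli prints:

```python
import shutil
import subprocess


def build_theia_command(
    model: str, reaction: str, probs: bool = False, pretty: bool = False
) -> list[str]:
    """Assemble the theia-cli argument list (hypothetical helper)."""
    cmd = ["theia-cli", model, reaction]
    if probs:
        cmd.append("--probs")
    if pretty:
        # The README only shows --pretty in combination with --probs.
        cmd.append("--pretty")
    return cmd


if __name__ == "__main__":
    cmd = build_theia_command(
        "rheadb.ec123", "S=C=NCC1=CC=CC=C1>>N#CSCC1=CC=CC=C1", probs=True
    )
    if shutil.which(cmd[0]) is None:
        print("theia-cli not found on PATH; install it with: pip install theia-pypi")
    else:
        # Passing a list avoids shell quoting issues with characters like '>' and '#'.
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(result.stdout)
```

Passing the arguments as a list matters here: in a shell, an unquoted > in the reaction SMILES would be treated as an output redirection.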

Of course, there are more models than rheadb.ec123, which we used in the previous examples. Here's a complete list of all the included models:

| Model         | Trained on  | Name          |
|---------------|-------------|---------------|
| Rhea ECX      | Rhea        | rheadb.ec1    |
| Rhea ECXY     | Rhea        | rheadb.ec12   |
| Rhea ECXYZ    | Rhea        | rheadb.ec123  |
| ECREACT ECX   | ECREACT 1.0 | ecreact.ec1   |
| ECREACT ECXY  | ECREACT 1.0 | ecreact.ec12  |
| ECREACT ECXYZ | ECREACT 1.0 | ecreact.ec123 |
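The model names follow a simple pattern: the training data set, a dot, and a suffix that says how many EC levels are predicted. Reading ec1, ec12, and ec123 as one, two, and three levels is my interpretation of the list above, and both helpers below are hypothetical illustrations rather than Theia functions:

```python
def parse_model_name(name: str) -> tuple[str, int]:
    """Split e.g. 'rheadb.ec123' into the data set and the number of EC levels."""
    dataset, _, suffix = name.partition(".")
    levels = {"ec1": 1, "ec12": 2, "ec123": 3}
    if dataset not in ("rheadb", "ecreact") or suffix not in levels:
        raise ValueError(f"unknown model name: {name!r}")
    return dataset, levels[suffix]


def truncate_ec(ec_number: str, depth: int) -> str:
    """Keep the first `depth` dot-separated levels of an EC number."""
    return ".".join(ec_number.split(".")[:depth])


dataset, depth = parse_model_name("rheadb.ec123")
print(dataset, depth, truncate_ec("2.7.4.3", depth))
```

So a model ending in ec123 predicts labels such as 2.7.4, while one ending in ec1 would only predict the top-level class.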

Reproduction / Custom Models

To get started, install the reproduction requirements with:

pip install -r reproduction_requirements.txt

The training, validation, and test sets used in the manuscript can be recreated using the following two commands (of course, you can plug in your own data sets here to get a custom model):

mkdir -p experiments/data
python scripts/encode_split_data.py data/rheadb.csv.gz experiments/data/rheadb
python scripts/encode_split_data.py data/ecreact-nofilter-1.0.csv.gz experiments/data/ecreact

The training of the models can be started with:

chmod +x train_all.sh
./train_all.sh

If you want to train the 6 additional models for cross-validation, you can run the following:

chmod +x train_all_cross.sh
./train_all_cross.sh

Finally, to reproduce the figures, you first have to run some additional data crunching scripts:

python scripts/class_counts.py data/ecreact-nofilter-1.0.csv.gz experiments/data/ecreact_counts.csv
python scripts/class_counts.py data/rheadb.csv.gz experiments/data/rheadb_counts.csv
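I won't reproduce scripts/class_counts.py here, but conceptually a class count is just a tally of how often each EC class appears at a given depth. The sketch below shows the idea on a hand-written list; the function name, the input format, and the choice of depth are assumptions and may not match what the script actually does:

```python
from collections import Counter


def count_ec_classes(ec_numbers: list[str], depth: int) -> Counter:
    """Count EC numbers after truncating each one to `depth` levels."""
    return Counter(".".join(ec.split(".")[:depth]) for ec in ec_numbers)


# Hand-written example values, not taken from the data sets.
example = ["2.7.4.3", "2.7.4.6", "2.3.1.9", "1.1.1.1"]
counts = count_ec_classes(example, depth=3)
for ec_class, n in counts.most_common():
    print(ec_class, n)
```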

Then it's time to draw:

cd figures
chmod +x generate_figures.sh
./generate_figures.sh

fin.
