Useful AI library for chemistry.

Project description

TensorFlow Keras

KME Word Segmentation

A model that tokenizes chemical IUPAC names, built with TensorFlow and Keras.

Prepare training dataset

What to have

CHAR_INDICES: a dictionary that maps each character [string] to a number [int] (used to convert text into numbers)
Dict_cut: input text in which '|' marks every position where the text should be cut (used to create the labels)

Cyclo|prop|ane| |Non|a|-|1|,|8|-|di|yne| 
|1|,|3|-|di|chlorocyclo|hex|ane| |Hept|a|-|1|,|5|-|di|ene

Dict: raw input text without cut markers (used to train the model)

Cyclopropane Nona-1,8-diyne 1,3-dichlorocyclohexane Hepta-1,5-diene
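
The relationship between the three inputs can be sketched in a few lines of Python. This is only an illustration: the variable names dict_cut, raw_text and encoded are hypothetical, and reserving index 0 for padding is an assumption, not something the package documents.

```python
# Sample taken from the Dict_cut example above ('|' marks the cut positions).
dict_cut = "Cyclo|prop|ane| |Non|a|-|1|,|8|-|di|yne"

# Dict (raw text) is the same text with the '|' markers removed.
raw_text = dict_cut.replace("|", "")

# CHAR_INDICES maps each distinct character to an integer.
# Assumption: numbering starts at 1 so that 0 stays free for padding.
CHAR_INDICES = {ch: i + 1 for i, ch in enumerate(sorted(set(raw_text)))}

# Convert the raw text into a list of integers.
encoded = [CHAR_INDICES[ch] for ch in raw_text]

print(raw_text)  # Cyclopropane Nona-1,8-diyne
```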

Create dataset

Build a JSON file whose values are arrays of chemical names (dataset, dataset_cut)
Split each array into a training set (90%) (dataset_train, dataset_cut_train) and a validation set (10%) (dataset_val, dataset_cut_val)
Join the items in each array into a single text
Create the dataset with the create_dataset function, which takes dataset_cut and returns X_train (size: [text_length, look_back]; the dataset_cut text with the '|' markers removed) and the label (for each position, 1 = cut, 0 = not cut)
Use tf.data.Dataset.from_tensor_slices((X, y)).batch(128) to batch the data for training
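
The windowing and labeling step can be sketched as follows. This is a hypothetical reimplementation, not the kmeseg source: the signature, the left-padding with 0, and the choice to build each window from the characters up to and including the current one are all assumptions. The label convention (1 on the first character of each token) is inferred from the examples in this description.

```python
def create_dataset(text_cut, char_indices, look_back=5):
    """Turn a '|'-delimited string into (X, y).

    X has one row of length look_back per character; y has one 0/1 label
    per character, where 1 means a token starts at that character.
    """
    X, y = [], []
    encoded = []      # integer codes of the characters seen so far
    new_token = True  # the first character always starts a token
    for ch in text_cut:
        if ch == "|":
            new_token = True  # the next real character starts a token
            continue
        encoded.append(char_indices.get(ch, 0))  # unknown characters map to 0
        window = encoded[-look_back:]
        X.append([0] * (look_back - len(window)) + window)  # left-pad with 0
        y.append(1 if new_token else 0)
        new_token = False
    return X, y


sample = "Cyclo|prop|ane"
char_indices = {ch: i + 1 for i, ch in enumerate(sorted(set(sample.replace("|", ""))))}
X, y = create_dataset(sample, char_indices, look_back=3)
print(y)  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```

Lists in this shape can be passed to tf.data.Dataset.from_tensor_slices((X, y)).batch(128). If the loss is categorical crossentropy, the labels would first need to be one-hot encoded.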

Create Model

The model uses one Embedding layer, one Bidirectional LSTM layer, and a Dense layer.
The model is compiled with optimizer = Adam and loss_function = Categorical Crossentropy (because the output is classified into 2 labels: 1 = cut, 0 = not cut), and trained with callbacks = [EarlyStopping, ModelCheckpoint].
EarlyStopping: stops training when validation_loss starts to increase
ModelCheckpoint: saves the model that has the minimum validation_loss
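
A model of this shape could be written in Keras roughly as below. The layer sizes, VOCAB_SIZE, LOOK_BACK, the patience value and the checkpoint file name are all assumed for illustration; the package's actual hyperparameters are not stated here.

```python
import tensorflow as tf

VOCAB_SIZE = 100  # assumed: number of entries in CHAR_INDICES plus a padding index
LOOK_BACK = 5     # assumed window length

model = tf.keras.Sequential([
    # Turns each character index into a dense vector.
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32),
    # Reads the window in both directions.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    # Two outputs: probability of "not cut" (0) and "cut" (1).
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

callbacks = [
    # Stop when the validation loss has not improved for 3 epochs.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3),
    # Keep only the weights with the lowest validation loss.
    tf.keras.callbacks.ModelCheckpoint(
        "best_model.keras", monitor="val_loss", save_best_only=True
    ),
]

# Training would then look like this (train_ds and val_ds are hypothetical datasets):
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=callbacks)
```

Categorical crossentropy expects one-hot labels of shape [batch_size, 2]; with integer 0/1 labels, sparse categorical crossentropy would be the usual substitute.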

After training the model

The output of the model is an array (size: [batch_size, 2]) that determines which positions are cut (value 0 -> not cut; 1 -> cut)

[1 1 1 1 1 0 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 1 0 0 0 1 0 0]

Tokenize the dataset (text with no cut positions marked) with the label (the model output) using the word_tokenize function, which returns an array of the text pieces after cutting

['1', ',', '2', '-', 'di', 'h', 'ydrox', 'y', '-', '2', '-', 'meth', 'yl', 'prop', 'ane']
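
The tokenization step can be reproduced with a short sketch. Both functions below are hypothetical stand-ins, not the package's own code: to_labels assumes that column 1 of the model output holds the "cut" probability, and word_tokenize assumes that a label of 1 marks the first character of a token, which is the reading that matches the label array and the token list shown above.

```python
def to_labels(probabilities):
    """Pick the more likely class (0 or 1) for each [p_not_cut, p_cut] row."""
    return [1 if p_cut > p_not_cut else 0 for p_not_cut, p_cut in probabilities]


def word_tokenize(text, labels):
    """Split text into tokens; a label of 1 starts a new token at that character."""
    tokens = []
    for ch, label in zip(text, labels):
        if label == 1 or not tokens:
            tokens.append(ch)   # start a new token
        else:
            tokens[-1] += ch    # extend the current token
    return tokens


text = "1,2-dihydroxy-2-methylpropane"
labels = [1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1,
          1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0]
print(word_tokenize(text, labels))
# ['1', ',', '2', '-', 'di', 'h', 'ydrox', 'y', '-', '2', '-', 'meth', 'yl', 'prop', 'ane']
```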

Release history

Version 0.1.1, released Nov 6, 2021

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

kmeseg-0.1.1-py3-none-any.whl (6.7 kB)

Uploaded Nov 6, 2021 Python 3

File details

Details for the file kmeseg-0.1.1-py3-none-any.whl.

File metadata

Download URL: kmeseg-0.1.1-py3-none-any.whl
Upload date: Nov 6, 2021
Size: 6.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for kmeseg-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c015c6bda2a16fda6e051621a8b8410652c56635d51ee692045899252da7335f`
MD5	`475ee1eedb918f0aa03df829fadd62c7`
BLAKE2b-256	`d0d2f365af9b3656267cbf9345d6e0bda80bedfe75b8fb26204e54c7933b9ea3`
