Self-referencing embedded strings

Project description

SELFIES

SELFIES (SELF-referencIng Embedded Strings) is a general-purpose, sequence-based, robust representation of semantically constrained graphs. It is based on a Chomsky type-2 grammar, augmented with two self-referencing functions. A main objective is to use SELFIES as direct input to machine learning models, in particular generative models, to generate graphs with high semantic and syntactic validity.

See the paper at arXiv: https://arxiv.org/abs/1905.13741

The code presented here is a concrete application of SELFIES in chemistry, for the robust representation of molecules. We show the encoding and decoding of three molecules from various databases, and the generation of a new, random molecule with high semantic and syntactic validity.

Installation

You can install SELFIES via

pip install selfies

Examples

Several examples can be seen in examples/selfies_example.py. Here is a simple encoding and decoding:

from selfies import encoder, decoder

# SMILES string of a non-fullerene acceptor for organic solar cells
test_molecule1 = 'CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1'
selfies1 = encoder(test_molecule1)  # SMILES -> SELFIES
smiles1 = decoder(selfies1)  # SELFIES -> SMILES

print('test_molecule1: ' + test_molecule1 + '\n')
print('selfies1: ' + selfies1 + '\n')
print('smiles1: ' + smiles1 + '\n')
# Plain string comparison of the input SMILES and the decoded SMILES
print('equal: ' + str(test_molecule1 == smiles1) + '\n\n\n')
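
The same round trip can be applied to several molecules at once. The following is only a sketch: `roundtrip_report` is a hypothetical helper, not part of the selfies package, and it takes the encoding and decoding functions as arguments so that it does not depend on a particular selfies version.

```python
from typing import Callable, Iterable, List, Tuple


def roundtrip_report(
    smiles_list: Iterable[str],
    encode: Callable[[str], str],
    decode: Callable[[str], str],
) -> List[Tuple[str, str, str, bool]]:
    """Encode each SMILES string, decode it again, and compare the strings.

    Hypothetical helper for illustration; `encode` and `decode` are expected
    to behave like the `encoder` and `decoder` functions imported above.
    """
    report = []
    for smiles in smiles_list:
        selfies_string = encode(smiles)
        decoded = decode(selfies_string)
        # Plain string comparison, as in the example above. Different SMILES
        # strings can describe the same molecule, so False here does not by
        # itself mean that the round trip changed the molecule.
        report.append((smiles, selfies_string, decoded, smiles == decoded))
    return report
```

With the package installed, it could be called as `roundtrip_report(list_of_smiles, encoder, decoder)`, where `list_of_smiles` is any list of SMILES strings.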

  • An example of SELFIES in a generative model can be found in the directory 'VariationalAutoEncoder_with_SELFIES'. There, SMILES datasets are automatically translated into SELFIES and used to train a variational autoencoder (VAE).
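
The translation step described above might look roughly like the sketch below. It is not the code from that directory: all function names are hypothetical, the encoding function is passed in as an argument (for example `selfies.encoder`), and the sketch assumes that a SELFIES string is a sequence of bracketed symbols such as `[C][=O]`. The integer indices it produces are one possible input format for a sequence model.

```python
import re
from typing import Callable, Dict, Iterable, List

# Assumption: every SELFIES symbol is enclosed in square brackets.
SYMBOL_PATTERN = re.compile(r"\[[^\]]*\]")


def translate_dataset(
    smiles_list: Iterable[str], encode: Callable[[str], str]
) -> List[str]:
    """Translate each SMILES string with the supplied encoding function."""
    return [encode(smiles) for smiles in smiles_list]


def split_symbols(selfies_string: str) -> List[str]:
    """Split a SELFIES string into its bracketed symbols."""
    return SYMBOL_PATTERN.findall(selfies_string)


def build_vocabulary(selfies_list: Iterable[str]) -> Dict[str, int]:
    """Map every symbol that occurs in the dataset to an integer index."""
    symbols = sorted({sym for s in selfies_list for sym in split_symbols(s)})
    return {sym: index for index, sym in enumerate(symbols)}


def to_indices(selfies_string: str, vocabulary: Dict[str, int]) -> List[int]:
    """Convert a SELFIES string to a list of vocabulary indices."""
    return [vocabulary[sym] for sym in split_symbols(selfies_string)]
```

Building the vocabulary from the translated dataset keeps the symbol set limited to what actually occurs in the training data.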

Python version

fully tested with Python 3.7.1 on

supported:

  • Python 3.7.2, 3.7.1, 3.6.8, 3.6.7, 2.7.15

Download files

Source Distribution

selfies-0.2.1.tar.gz (15.1 kB)

Uploaded Source

Built Distribution

selfies-0.2.1-py3-none-any.whl (15.0 kB)

Uploaded Python 3

File details

Details for the file selfies-0.2.1.tar.gz.

File metadata

  • Download URL: selfies-0.2.1.tar.gz
  • Upload date:
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4

File hashes

Hashes for selfies-0.2.1.tar.gz

Algorithm     Hash digest
SHA256        0e49a0608a9bf7910dfd867e0b4b72e4a82daf5be0d025cfadcaaac60ceae3ea
MD5           50bf435f5e03b71500d31b6304499186
BLAKE2b-256   886169d3e5c68b4c53976b8bebdac91eee237582ad728cc609de8591d4b970c1
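
A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. This is a minimal sketch; the function names and the chunk size are illustrative choices, not part of any packaging tool.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare the digest of a file with a published hexadecimal digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash('selfies-0.2.1.tar.gz', expected)` should return True when `expected` is the SHA256 value listed above and the download is intact.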

File details

Details for the file selfies-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: selfies-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4

File hashes

Hashes for selfies-0.2.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        946746cc24ff434793ff5f5231eb2468aeb195ea3df5a89a6571b9995bf1cabf
MD5           b4333b9547b717334ec79a2d9fb0e445
BLAKE2b-256   8a041d10f76ee389d2616600d10e91d9bb18176d7cf14e1f13836a4f125d99eb
