serde_mol2

Python/Rust module for mol2 format (de)serialization

Installation

Install from PyPI (requires Python >= 3.8):

pip install serde-mol2

After that:

-> python3
Python 3.9.5 (default, Jun  4 2021, 12:28:51)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import serde_mol2
>>> m = serde_mol2.read_file('example.mol2')
>>> m
[<builtins.Mol2 object at 0x7f6da9ebcae0>]

Or use the command-line binary:

-> serde-mol2 -h
serde-mol2 0.2.2
CSC - IT Center for Science Ltd. (Jaroslaw Kalinowski <jaroslaw.kalinowski@csc.fi>)

USAGE:
    serde-mol2 [OPTIONS]

OPTIONS:
    -a, --append                       Append to mol2 files when writing rather than truncate
    -c, --compression <COMPRESSION>    Level of compression for BLOB data, 0 means no compression
                                       [default: 3]
        --comment <COMMENT>            Comment to add/filter to/by the molecule comment field
        --desc <DESC>                  Description to add/filter to/by entries when writing to the
                                       database
        --filename-desc                Add filename to the desc field when adding a batch of files
                                       to the database
    -h, --help                         Print help information
    -i, --input <INPUT_FILE>...        Input mol2 file
        --limit <LIMIT>                Limit the number of structures retrieved from the database.
                                       Zero means no limit. [default: 0]
        --list-desc                    List available row descriptions present in the database
        --no-shm                       Do not try using shm device when writing to databases
    -o, --output <OUTPUT_FILE>         Output mol2 file
        --offset <OFFSET>              Offset when limiting the number of structures retrieved from
                                       the database. Zero means no offset. [default: 0]
    -s, --sqlite <SQLITE_FILE>         Sqlite database file
    -V, --version                      Print version information

Usage a.k.a. quick function reference

class Mol2

  • Mol2.to_json()

    Return a JSON string for a Mol2 object.

  • Mol2.as_string()

    Return a mol2 string for a Mol2 object.

  • Mol2.write_mol2( filename, append=False )

    Write Mol2 object to a mol2 file.

  • Mol2.serialized()

    Return the Mol2 object in a serialized Python form.
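
As a rough illustration of how these methods might be used on a list of structures, the sketch below writes each structure's JSON form to its own file. It relies only on to_json() as described above. The helper name dump_json, the NNNN.json naming scheme, and the FakeMol2 stand-in (used so the snippet runs without the package or a mol2 file at hand) are assumptions made for this example and are not part of serde_mol2.

```python
import json
import tempfile
from pathlib import Path


def dump_json(structures, out_dir):
    """Write structure.to_json() for each structure to out_dir/NNNN.json."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, structure in enumerate(structures):
        text = structure.to_json()
        json.loads(text)  # fail early if the string is not valid JSON
        path = out_dir / f"{index:04d}.json"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


class FakeMol2:
    """Hypothetical stand-in exposing only to_json(); not the real Mol2 class."""

    def __init__(self, name):
        self.name = name

    def to_json(self):
        return json.dumps({"name": self.name})


with tempfile.TemporaryDirectory() as tmp:
    written = dump_json([FakeMol2("a"), FakeMol2("b")], tmp)
    names = [p.name for p in written]
    contents = [json.loads(p.read_text(encoding="utf-8")) for p in written]
```

With the package installed, passing the list returned by serde_mol2.read_file('example.mol2') instead of the stand-in objects should work the same way, since the helper only calls to_json().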

Functions

  • write_mol2( list, filename, append=False )

    list is a list of Mol2 objects. The function writes all structures in the list to a mol2 file named filename.

  • db_insert( list, filename, compression=3, shm=True )

    Insert a vector of structures into a database. Appends if the database already exists.

    Input:

    • list: vector of structures
    • filename: path to the database
    • compression: compression level
    • shm: should we try to use the database from a temporary location?
  • read_db_all( filename, shm=False, desc=None, comment=None, limit=0, offset=0 )

    Read all structures from a database and return them as a vector.

    Input:

    • filename: path to the database
    • shm: should we try and use the database out of a temporary location?
    • desc: return only entries containing desc in the desc field
    • comment: return only entries containing comment in the molecule comment
    • limit: limit the number of structures retrieved from the database; zero means no limit
    • offset: offset when limiting the number of structures retrieved from the database; zero means no offset
  • read_db_all_serialized( filename, shm=True, desc=None, comment=None, limit=0, offset=0 )

    Read all structures from a database and return them as a vector, but keep the structures in a serialized Python form rather than a binary one.

    Input:

    • filename: path to the database
    • shm: should we try and use the database out of a temporary location?
    • desc: return only entries containing desc in the desc field
    • comment: return only entries containing comment in the molecule comment
    • limit: limit the number of structures retrieved from the database; zero means no limit
    • offset: offset when limiting the number of structures retrieved from the database; zero means no offset
  • read_file_to_db( filename, db-filename, compression=3, shm=True , desc=None, comment=None )

    Convenience function. Read structures from a mol2 file and write directly to the database.

    Input:

    • filename: path to the mol2 file
    • db-filename: path to the database
    • compression: compression level
    • shm: should we use the database out of a temporary location?
    • desc: add this description to structures read
    • comment: add this comment to the molecule comment field
  • read_file_to_db_batch( filenames, db-filename, compression=3, shm=True, desc=None, comment=None )

    Convenience function. Read structures from a set of files directly into the database.

    Input:

    • filenames: vector of paths to mol2 files
    • db-filename: path to the database
    • compression: compression level
    • shm: should we use the database out of a temporary location?
    • desc: add this description to structures read
    • comment: add this comment to the molecule comment field
  • read_file( filename, desc=None, comment=None )

    Read a mol2 file and return a vector of structures.

    Input:

    • filename: path to the mol2 file
    • desc: add this description to structures read
    • comment: add this comment to the molecule comment field
  • read_file_serialized( filename, desc=None, comment=None )

    Read a mol2 file and return a vector of structures, but as serialized Python structures rather than in a binary form.

    Input:

    • filename: path to the mol2 file
    • desc: add this description to structures read
    • comment: add this comment to the molecule comment field
  • desc_list( filename, shm=False )

    List unique entry descriptions found in a database.

    Input:

    • filename: path to a database
    • shm: should we use the database out of a temporary location?
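
The limit and offset parameters of the database readers lend themselves to reading a large database in pages. The following is a minimal sketch of that idea, not something the module provides: iter_pages is a hypothetical helper that takes any callable accepting limit and offset keyword arguments, and fake_fetch is a stand-in over a plain list so the snippet runs without a database.

```python
def iter_pages(fetch, page_size):
    """Yield successive non-empty pages from fetch(limit=..., offset=...)."""
    if page_size <= 0:
        # limit=0 means "no limit" in the reference above, so a zero page
        # size would return everything at once instead of one page.
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch(limit=page_size, offset=offset)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return  # a short page means the data is exhausted
        offset += page_size


# Hypothetical stand-in for a database reader, so the sketch runs on its own.
_rows = list(range(7))


def fake_fetch(limit=0, offset=0):
    end = None if limit == 0 else offset + limit
    return _rows[offset:end]


pages = list(iter_pages(fake_fetch, 3))
```

With the package installed, something like functools.partial(serde_mol2.read_db_all, 'structures.sqlite') could be passed as fetch (the database name here is made up). This assumes limit and offset are accepted as keyword arguments, which the signature above suggests but does not state outright.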

Notes

Compression

Compression applies to every section other than MOLECULE. Those sections contain multiple rows, so they are stored in the database in binary form (BLOB). Since BLOB data is not human readable anyway, it makes sense to apply at least some compression. The algorithm currently used is zstd, and the default compression level is 3. Note that level 0 means something different here than in plain zstd: there, 0 selects the default compression level, whereas in this module 0 means no compression.

At the time of writing, the overhead of (de)compressing the data is negligible compared to the IO/CPU cost of reading, writing, and parsing.
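
The level-0 convention can be sketched as follows. This is only an illustration of the "0 means store as-is" rule and not the module's actual storage format: zlib from the standard library stands in for zstd, and the one-byte RAW/COMPRESSED marker as well as the pack and unpack helpers are hypothetical.

```python
import zlib

# Hypothetical one-byte markers telling unpack() how the payload was stored.
RAW, COMPRESSED = b"\x00", b"\x01"


def pack(blob: bytes, level: int = 3) -> bytes:
    """Store blob as-is for level 0, otherwise compress it at that level."""
    if level == 0:
        return RAW + blob  # level 0: skip compression entirely
    return COMPRESSED + zlib.compress(blob, level)  # zlib stands in for zstd


def unpack(stored: bytes) -> bytes:
    """Reverse pack() based on the leading marker byte."""
    marker, payload = stored[:1], stored[1:]
    if marker == RAW:
        return payload
    if marker == COMPRESSED:
        return zlib.decompress(payload)
    raise ValueError("unknown marker byte")


sample = b"ATOM " * 200
stored_raw = pack(sample, level=0)
stored_small = pack(sample, level=3)
```

The point of the explicit branch is that level 0 is handled before the compressor is ever called, so the compressor's own interpretation of 0 never comes into play.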

SHM

When writing to the database, we write just one row at a time, which is very slow on shared filesystems. With the shm functionality enabled, the module tries to copy the database to /dev/shm and work on it there, essentially performing all operations in memory. However, this means that the file in the original location is effectively unusable by other processes in the meantime, as it will be overwritten at the end.

Another problem with working in /dev/shm is that we can run out of space if the database is too big, so make sure your database fits into the available memory.

In the future, there will be an option to choose a TMPDIR other than /dev/shm, for example one that points to fast NVMe storage.

By default, shm is used only when writing to the database, as reading seems to be much less affected.
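
The copy, work, copy-back pattern described above can be sketched with the standard library alone. This is an assumption about the general shape of the approach, not the module's implementation: with_scratch_copy, add_rows, and the demo table are hypothetical names, and the scratch location defaults to the system temporary directory so the snippet runs anywhere (on Linux, scratch_dir='/dev/shm' would mimic the behaviour described).

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def with_scratch_copy(db_path, work, scratch_dir=None):
    """Run work(connection) on a scratch copy of db_path, then copy it back."""
    db_path = Path(db_path)
    with tempfile.TemporaryDirectory(dir=scratch_dir) as tmp:
        scratch = Path(tmp) / db_path.name
        if db_path.exists():
            shutil.copy2(db_path, scratch)  # start from the existing database
        conn = sqlite3.connect(scratch)
        try:
            work(conn)
            conn.commit()
        finally:
            conn.close()
        # The original is replaced wholesale, so changes made to it by other
        # processes in the meantime would be lost.
        shutil.copy2(scratch, db_path)


def add_rows(conn):
    """Hypothetical workload: many small single-row writes."""
    conn.execute("CREATE TABLE IF NOT EXISTS demo (name TEXT)")
    conn.executemany("INSERT INTO demo VALUES (?)", [("a",), ("b",)])


with tempfile.TemporaryDirectory() as home:
    target = Path(home) / "demo.sqlite"
    with_scratch_copy(target, add_rows)  # creates the database
    with_scratch_copy(target, add_rows)  # appends to it
    check = sqlite3.connect(target)
    row_count = check.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
    check.close()
```

The final copy is what makes the original file unsafe to share while the work is in progress, and the scratch copy is what has to fit into the space available in the temporary location.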

Limitations

The biggest limitation at the moment is that only the following sections are read:

  • MOLECULE
  • ATOM
  • BOND
  • SUBSTRUCTURE

All other sections are currently just dropped silently.
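
To make the effect of this limitation concrete, here is a small pure-Python sketch that splits a single mol2 record on its @<TRIPOS> section headers and reports which sections would be kept and which dropped. It is not the module's parser: split_sections is a hypothetical helper, it handles only one molecule, and the sample record is hand-written for the example.

```python
KEPT = ("MOLECULE", "ATOM", "BOND", "SUBSTRUCTURE")
TAG = "@<TRIPOS>"


def split_sections(text):
    """Return (kept, dropped): kept maps section name to its lines,
    dropped lists the names of sections that are ignored."""
    kept, dropped = {}, []
    current = None  # list collecting lines of the section being read
    for line in text.splitlines():
        if line.startswith(TAG):
            name = line[len(TAG):].strip()
            if name in KEPT:
                current = kept.setdefault(name, [])
            else:
                current = None  # lines of this section are skipped
                dropped.append(name)
        elif current is not None:
            current.append(line)
    return kept, dropped


# Hand-written sample record, for illustration only.
sample = "\n".join([
    "@<TRIPOS>MOLECULE",
    "water",
    "3 2 1",
    "SMALL",
    "NO_CHARGES",
    "@<TRIPOS>ATOM",
    "1 O  0.000 0.000 0.000 O.3 1 HOH 0.0",
    "2 H1 0.957 0.000 0.000 H   1 HOH 0.0",
    "3 H2 -0.240 0.927 0.000 H  1 HOH 0.0",
    "@<TRIPOS>BOND",
    "1 1 2 1",
    "2 1 3 1",
    "@<TRIPOS>CRYSIN",
    "10.0 10.0 10.0 90.0 90.0 90.0 1 1",
])

kept, dropped = split_sections(sample)
```

Running a check like this over your own files before importing them is one way to find out whether any sections you rely on would be lost.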
