serde_mol2
Python/Rust module for mol2 format (de)serialization
Installation
Install from PyPI (requires Python >= 3.8):
pip install serde-mol2
After that:
-> python3
Python 3.9.5 (default, Jun 4 2021, 12:28:51)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import serde_mol2
>>> m = serde_mol2.read_file('example.mol2')
>>> m
[<builtins.Mol2 object at 0x7f6da9ebcae0>]
Or using a binary:
-> serde-mol2 -h
serde-mol2 0.2.2
CSC - IT Center for Science Ltd. (Jaroslaw Kalinowski <jaroslaw.kalinowski@csc.fi>)
USAGE:
    serde-mol2 [OPTIONS]
OPTIONS:
    -a, --append                       Append to mol2 files when writing rather than truncate
    -c, --compression <COMPRESSION>    Level of compression for BLOB data, 0 means no compression [default: 3]
        --comment <COMMENT>            Comment to add/filter to/by the molecule comment field
        --desc <DESC>                  Description to add/filter to/by entries when writing to the database
        --filename-desc                Add filename to the desc field when adding a batch of files to the database
    -h, --help                         Print help information
    -i, --input <INPUT_FILE>...        Input mol2 file
        --limit <LIMIT>                Limit the number of structures retrieved from the database. Zero means no limit. [default: 0]
        --list-desc                    List available row descriptions present in the database
        --no-shm                       Do not try using shm device when writing to databases
    -o, --output <OUTPUT_FILE>         Output mol2 file
        --offset <OFFSET>              Offset when limiting the number of structures retrieved from the database. Zero means no offset. [default: 0]
    -s, --sqlite <SQLITE_FILE>         Sqlite database file
    -V, --version                      Print version information
Usage a.k.a. quick function reference
class Mol2
-
Mol2.to_json()
Return a JSON string for a Mol2 object.
-
Mol2.as_string()
Return a mol2 string for a Mol2 object.
-
Mol2.write_mol2( filename, append=False )
Write a Mol2 object to a mol2 file.
-
Mol2.serialized()
Return a Mol2 object in a python serialized form.
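As a minimal sketch of how these methods combine with read_file (described under Functions), the helper below turns every structure in a mol2 file into a Python dictionary by parsing the output of Mol2.to_json(). The helper name mol2_file_to_records and its read_file parameter are illustrative assumptions and not part of the module. The keys of the resulting dictionaries depend on the module and are not shown here.

```python
import json


def mol2_file_to_records(path, read_file=None):
    """Return one parsed JSON record (a dict) per structure in a mol2 file.

    read_file is a hypothetical hook that makes the helper easy to try
    without the module installed; by default serde_mol2.read_file is used.
    """
    if read_file is None:
        # Imported lazily so that defining the helper does not need the module.
        import serde_mol2

        read_file = serde_mol2.read_file
    # Mol2.to_json() is documented to return a JSON string.
    return [json.loads(structure.to_json()) for structure in read_file(path)]
```

With the module installed, mol2_file_to_records('example.mol2') should give a list with one dictionary per structure found in the file.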
Functions
-
write_mol2( list, filename, append=False )
list is a list of Mol2 objects. The function writes all structures in the list into a mol2 file named filename.
-
db_insert( list, filename, compression=3, shm=True )
Insert vector of structures into a database. Append if the database exists.
Input:
- list: vector of structures
- filename: path to the database
- compression: compression level
- shm: should we try to use the database out of a temporary location?
-
read_db_all( filename, shm=False, desc=None, comment=None, limit=0, offset=0 )
Read all structures from a database and return as a vector
Input:
- filename: path to the database
- shm: should we try and use the database out of a temporary location?
- desc: return only entries containing desc in the desc field
- comment: return only entries containing comment in the molecule comment
- limit: limit the number of structures retrieved from the database; zero means no limit
- offset: offset when limiting the number of structures retrieved from the database; zero means no offset
-
read_db_all_serialized( filename, shm=True, desc=None, comment=None, limit=0, offset=0 )
Read all structures from a database and return as a vector, but keep structures in a serialized python form rather than binary.
Input:
- filename: path to the database
- shm: should we try and use the database out of a temporary location?
- desc: return only entries containing desc in the desc field
- comment: return only entries containing comment in the molecule comment
- limit: limit the number of structures retrieved from the database; zero means no limit
- offset: offset when limiting the number of structures retrieved from the database; zero means no offset
-
read_file_to_db( filename, db-filename, compression=3, shm=True , desc=None, comment=None )
Convenience function. Read structures from a mol2 file and write directly to the database.
Input:
- filename: path to the mol2 file
- db-filename: path to the database
- compression: compression level
- shm: should we use the database out of a temporary location?
- desc: add this description to structures read
- comment: add this comment to the molecule comment field
-
read_file_to_db_batch( filenames, db-filename, compression=3, shm=True, desc=None, comment=None )
Convenience function. Read structures from a set of files directly into the database.
Input:
- filenames: vector of paths to mol2 files
- db-filename: path to the database
- compression: compression level
- shm: should we use the database out of a temporary location?
- desc: add this description to structures read
- comment: add this comment to the molecule comment field
-
read_file( filename, desc=None, comment=None )
Read a mol2 file and return a vector of structures
Input:
- filename: path to the mol2 file
- desc: add this description to structures read
- comment: add this comment to the molecule comment field
-
read_file_serialized( filename, desc=None, comment=None )
Read a mol2 file and return a vector of structures, but serialized python structures rather than a binary form.
Input:
- filename: path to the mol2 file
- desc: add this description to structures read
- comment: add this comment to the molecule comment field
-
desc_list( filename, shm=False )
List unique entry descriptions found in a database.
Input:
- filename: path to a database
- shm: should we use the database out of a temporary location?
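The desc, limit and offset parameters of the read_db_* functions behave like a filtered, paged database query. The sketch below shows one way such semantics can be expressed with the sqlite3 module from the Python standard library. The table name structures, its columns, and the function names are hypothetical and chosen only for illustration; the actual schema and queries used by the module may differ.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE structures (id INTEGER PRIMARY KEY, description TEXT)"
    )
    rows = [("set-a",), ("set-a",), ("set-b",), ("set-a",), ("set-b",)]
    conn.executemany("INSERT INTO structures (description) VALUES (?)", rows)
    conn.commit()
    return conn


def fetch_structures(conn, desc=None, limit=0, offset=0):
    """Return (id, description) rows, optionally filtered and paged."""
    sql = "SELECT id, description FROM structures"
    params = []
    if desc is not None:
        # Keep only entries whose description contains desc.
        sql += " WHERE description LIKE ?"
        params.append(f"%{desc}%")
    sql += " ORDER BY id LIMIT ? OFFSET ?"
    # Zero means "no limit" here; SQLite expresses that as a negative LIMIT.
    params.extend([limit if limit > 0 else -1, offset])
    return conn.execute(sql, params).fetchall()
```

For example, fetch_structures(conn, desc="set-a", limit=1, offset=1) skips the first matching row and returns only the second one.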
Notes
Compression
Compression applies to sections other than MOLECULE. Those sections are stored in the database in binary form (BLOB) because they contain multiple rows. Since a BLOB is not human readable anyway, it makes sense to apply at least some compression. The algorithm currently used is zstd, and the default compression level is 3. Note that in zstd itself level 0 means the default level of compression, whereas in this module level 0 means no compression.
At the time of writing, the overhead of (de)compressing the data is negligible compared to the I/O and CPU cost of reading, writing, and parsing.
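The "level 0 means no compression" convention can be sketched as follows. This is only an illustration of the semantics: it uses zlib from the Python standard library as a stand-in for zstd, and the function names pack_blob and unpack_blob are hypothetical. How the module itself records whether a BLOB is compressed is not described here, so the sketch simply passes the level to both functions.

```python
import zlib


def pack_blob(data: bytes, level: int = 3) -> bytes:
    """Prepare section data for storage as a BLOB."""
    if level == 0:
        # In this convention level 0 stores the bytes untouched.
        return data
    # zlib is a stand-in here; the module itself uses zstd.
    return zlib.compress(data, level)


def unpack_blob(blob: bytes, level: int) -> bytes:
    """Reverse pack_blob for a BLOB written with the same level."""
    return blob if level == 0 else zlib.decompress(blob)
```

Section data such as ATOM records is repetitive text, so even a low level typically shrinks it noticeably.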
SHM
When writing to the database we write just one row at a time. On shared filesystems writing like that is very slow. When using the shm functionality, the module tries to copy the database to /dev/shm and use it there, essentially performing all operations in memory. However, this means that the file in the original location is essentially not usable by other processes, as it will be overwritten at the end.
Another problem with doing things in /dev/shm is that if the database is too big, we can run out of space. So make sure your database fits into the available memory.
In the future there will be an option to choose a different TMPDIR than /dev/shm, for example one that points to fast NVMe storage.
By default shm is used only when writing to the database, as reading does not seem to be affected as much.
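The copy, work, copy back pattern described above can be sketched with the Python standard library. This is a sketch under assumptions and not the module's implementation: the names staged_database and demo, the free-space check, and the choice to copy back only after the work finished without errors are all illustrative.

```python
import shutil
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged_database(db_path, scratch_dir="/dev/shm"):
    """Copy db_path into scratch_dir, yield the copy, then copy it back."""
    db_path = Path(db_path)
    workdir = Path(tempfile.mkdtemp(dir=scratch_dir))
    staged = workdir / db_path.name
    try:
        if db_path.exists():
            # Refuse to stage a database that does not fit into the scratch space.
            if db_path.stat().st_size > shutil.disk_usage(workdir).free:
                raise OSError(f"not enough space in {scratch_dir}")
            shutil.copy2(db_path, staged)
        yield staged
        # Reached only if the body raised no exception: overwrite the original.
        if staged.exists():
            shutil.copy2(staged, db_path)
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


def demo():
    """Write 100 rows one at a time to a staged copy, then count them in the original."""
    with tempfile.TemporaryDirectory() as shared, tempfile.TemporaryDirectory() as scratch:
        db = Path(shared) / "structures.sqlite"
        with staged_database(db, scratch_dir=scratch) as staged:
            conn = sqlite3.connect(staged)
            conn.execute("CREATE TABLE t (x INTEGER)")
            for i in range(100):
                conn.execute("INSERT INTO t VALUES (?)", (i,))
                conn.commit()
            conn.close()
        conn = sqlite3.connect(db)
        count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        conn.close()
        return count
```

The demo uses two temporary directories in place of the shared filesystem and /dev/shm, so it can be run anywhere. Note that between entering and leaving the with block the original file is stale, which is why other processes should not use it during that time.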
Limitations
The biggest limitation at the moment is that only the following sections are read:
- MOLECULE
- ATOM
- BOND
- SUBSTRUCTURE
All other sections are currently just dropped silently.
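Because unsupported sections are dropped without a warning, it can be useful to check a file beforehand. The sketch below scans mol2 text for section headers, which in the mol2 format start with @<TRIPOS>, and reports those that are not in the list above. It relies only on the Python standard library; the function name dropped_sections is an illustrative assumption and not part of the module.

```python
SUPPORTED_SECTIONS = {"MOLECULE", "ATOM", "BOND", "SUBSTRUCTURE"}
SECTION_PREFIX = "@<TRIPOS>"


def dropped_sections(mol2_text):
    """Return the names of sections that would not be read, in order of first appearance."""
    dropped = []
    for line in mol2_text.splitlines():
        line = line.strip()
        if not line.startswith(SECTION_PREFIX):
            continue
        name = line[len(SECTION_PREFIX):].strip()
        if name not in SUPPORTED_SECTIONS and name not in dropped:
            dropped.append(name)
    return dropped
```

An empty list means the file contains only sections that are read.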