Foldcomp compresses protein structures efficiently by encoding torsion angles. It compresses the backbone atoms to 8 bytes and the side chain to an additional 4-5 bytes per residue, so an average-sized protein of 350 residues requires ~4.2 kB. Foldcomp is a C++ library with Python bindings.

Foldcomp

Foldcomp compresses protein structures efficiently by encoding torsion angles. It compresses the backbone atoms to 8 bytes and the side chain to an additional 4-5 bytes per residue, so an average-sized protein of 350 residues requires ~4.2 kB.
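
As a back-of-the-envelope check of the per-residue figures, the sketch below multiplies the stated byte counts by the number of residues. It counts only the per-residue payload and ignores any file header or periodically stored absolute coordinates, so real files may be somewhat larger. The function and constant names are illustrative and are not part of Foldcomp.

```python
# Per-residue byte counts as stated in the description above.
BACKBONE_BYTES = 8
SIDE_CHAIN_BYTES = (4, 5)  # lower and upper bound


def estimate_payload_bytes(n_residues):
    """Return (low, high) payload estimates in bytes for a protein."""
    low = n_residues * (BACKBONE_BYTES + SIDE_CHAIN_BYTES[0])
    high = n_residues * (BACKBONE_BYTES + SIDE_CHAIN_BYTES[1])
    return low, high


low, high = estimate_payload_bytes(350)
print(f"{low} to {high} bytes")  # 4200 to 4550 bytes
```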

Left panel: Foldcomp data format, saving each amino acid residue in 13 bytes. Top right panel: Foldcomp decompression is as fast as gzip. Bottom right panel: Foldcomp compression ratio is higher than pulchra and gzip.

Foldcomp is a compression method and file format for protein structures that requires only 13 bytes per residue, which reduces the required storage space by an order of magnitude compared to saving 3D coordinates directly. We achieve this reduction by encoding the backbone torsion angles and the side-chain angles in a compact binary file format, FCZ.
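
To illustrate the general idea of storing angles instead of coordinates, here is a minimal sketch that computes a signed torsion (dihedral) angle from four atom positions and quantizes it to a fixed number of bits. This is not Foldcomp's actual encoder: the function names and the 11-bit width are assumptions chosen only for the example, and the real FCZ bit layout may differ.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components of b0 and b2 that lie along the central bond b1.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, (d0 * b1[0], d0 * b1[1], d0 * b1[2]))
    w = _sub(b2, (d2 * b1[0], d2 * b1[1], d2 * b1[2]))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def quantize_angle(angle_deg, bits=11):
    """Map an angle in degrees to an integer code of the given bit width."""
    levels = 1 << bits
    step = 360.0 / levels
    return int(round((angle_deg + 180.0) / step)) % levels


def dequantize_angle(code, bits=11):
    """Recover the approximate angle in degrees from its integer code."""
    step = 360.0 / (1 << bits)
    return code * step - 180.0
```

With this scheme the round-trip error is at most half a quantization step (360 / 2^bits degrees), which is the kind of trade-off between size and accuracy that an angle-based format has to make.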

Foldcomp currently supports compression of single-chain PDB files only.

Usage

Installing Foldcomp

# Install Foldcomp Python package
pip install foldcomp

# Download static binaries for Linux
wget https://mmseqs.com/foldcomp/foldcomp-linux-x86_64.tar.gz

# Download static binaries for Linux (ARM64)
wget https://mmseqs.com/foldcomp/foldcomp-linux-arm64.tar.gz

# Download binary for macOS
wget https://mmseqs.com/foldcomp/foldcomp-macos-universal.tar.gz

Executable

# Compression
foldcomp compress <pdb_file|cif_file> [<fcz_file>]
foldcomp compress [-t number] <pdb_dir|cif_dir> [<fcz_dir>]

# Decompression
foldcomp decompress <fcz_file> [<pdb_file>]
foldcomp decompress [-t number] <fcz_dir> [<pdb_dir>]

# Extraction of sequence or pLDDT
foldcomp extract [--plddt|--fasta] <fcz_file> [<txt_file|fasta_file>]
foldcomp extract [--plddt|--fasta] [-t number] <fcz_dir|tar> [<output_dir>]

# Check
foldcomp check <fcz_file>
foldcomp check [-t number] <fcz_dir|tar>

# RMSD
foldcomp rmsd <pdb1|cif1> <pdb2|cif2>

# Options
 -h, --help           print this help message
 -t, --threads        threads for (de)compression of folders/tar files [default=1]
 -a, --alt            use alternative atom order [default=false]
 -b, --break          interval size to save absolute atom coordinates [default=200]
 -z, --tar            save as tar file [default=false]
 --plddt              extract pLDDT score (only for extraction mode)
 --fasta              extract amino acid sequence (only for extraction mode)
 --no-merge           do not merge output files (only for extraction mode)
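
For scripted use, the executable can be driven from Python. The following is a minimal sketch that only assembles the compress and decompress command lines shown above, using the documented -t option. The helper names build_foldcomp_command and run_foldcomp are hypothetical, and the sketch assumes a foldcomp binary is available on PATH.

```python
import shutil
import subprocess


def build_foldcomp_command(mode, source, target=None, threads=1):
    """Assemble a foldcomp command line for compress or decompress."""
    if mode not in ("compress", "decompress"):
        raise ValueError(f"unsupported mode: {mode}")
    cmd = ["foldcomp", mode]
    if threads > 1:
        # -t applies to (de)compression of folders/tar files.
        cmd += ["-t", str(threads)]
    cmd.append(str(source))
    if target is not None:
        cmd.append(str(target))
    return cmd


def run_foldcomp(mode, source, target=None, threads=1):
    """Run foldcomp and raise if the binary is missing or exits with an error."""
    if shutil.which("foldcomp") is None:
        raise FileNotFoundError("foldcomp executable not found on PATH")
    subprocess.run(build_foldcomp_command(mode, source, target, threads), check=True)
```

For example, run_foldcomp("compress", "pdb_dir", "fcz_dir", threads=4) would invoke foldcomp compress -t 4 pdb_dir fcz_dir.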

Python API

You can find more in-depth examples of using Foldcomp's Python interface in the example Colab notebook.

import foldcomp

# 01. Handling a FCZ file
# Read the compressed file as bytes
with open("test/compressed.fcz", "rb") as fcz:
    fcz_binary = fcz.read()

# Decompress: returns a tuple of (file name, PDB content as a string)
(name, pdb) = foldcomp.decompress(fcz_binary)

# Save to a PDB file
with open(name, "w") as pdb_file:
    pdb_file.write(pdb)

# 02. Iterate over a database of FCZ files
# Open a Foldcomp database, restricted to the listed entry IDs
ids = ["d1asha_", "d1it2a_"]
with foldcomp.open("test/example_db", ids=ids) as db:
    # Iterate through the database
    for (name, pdb) in db:
        # Save entries as separate PDB files
        with open(name + ".pdb", "w") as pdb_file:
            pdb_file.write(pdb)
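
Building on the single-file example, the sketch below decompresses every .fcz file in a folder. It relies only on foldcomp.decompress as used above. The helper name decompress_directory and its decompress parameter are hypothetical, added so the function can be exercised without Foldcomp installed. Output files are named after the input file rather than the name returned by the decompressor, which is a design choice of this sketch.

```python
from pathlib import Path


def decompress_directory(fcz_dir, pdb_dir, decompress=None):
    """Decompress all *.fcz files in fcz_dir into pdb_dir and return the written paths."""
    if decompress is None:
        import foldcomp  # imported lazily so the helper can be tested with a stub
        decompress = foldcomp.decompress

    out_dir = Path(pdb_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for fcz_path in sorted(Path(fcz_dir).glob("*.fcz")):
        # decompress takes the raw bytes and returns (name, pdb_text)
        _name, pdb_text = decompress(fcz_path.read_bytes())
        out_path = out_dir / (fcz_path.stem + ".pdb")
        out_path.write_text(pdb_text)
        written.append(out_path)
    return written
```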

Download files

Source Distribution

foldcomp-0.0.2.tar.gz (18.3 kB): source

Built Distributions

foldcomp-0.0.2-cp311-cp311-win_amd64.whl (137.4 kB): CPython 3.11, Windows x86-64
foldcomp-0.0.2-cp311-cp311-musllinux_1_1_x86_64.whl (765.4 kB): CPython 3.11, musllinux: musl 1.1+ x86-64
foldcomp-0.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (284.7 kB): CPython 3.11, manylinux: glibc 2.17+ x86-64
foldcomp-0.0.2-cp311-cp311-macosx_10_9_x86_64.whl (229.6 kB): CPython 3.11, macOS 10.9+ x86-64
foldcomp-0.0.2-cp311-cp311-macosx_10_9_universal2.whl (436.1 kB): CPython 3.11, macOS 10.9+ universal2 (ARM64, x86-64)

foldcomp-0.0.2-cp310-cp310-win_amd64.whl (137.4 kB): CPython 3.10, Windows x86-64
foldcomp-0.0.2-cp310-cp310-musllinux_1_1_x86_64.whl (765.4 kB): CPython 3.10, musllinux: musl 1.1+ x86-64
foldcomp-0.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (284.7 kB): CPython 3.10, manylinux: glibc 2.17+ x86-64
foldcomp-0.0.2-cp310-cp310-macosx_10_9_x86_64.whl (229.6 kB): CPython 3.10, macOS 10.9+ x86-64
foldcomp-0.0.2-cp310-cp310-macosx_10_9_universal2.whl (436.1 kB): CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64)

foldcomp-0.0.2-cp39-cp39-win_amd64.whl (137.4 kB): CPython 3.9, Windows x86-64
foldcomp-0.0.2-cp39-cp39-musllinux_1_1_x86_64.whl (765.4 kB): CPython 3.9, musllinux: musl 1.1+ x86-64
foldcomp-0.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (284.7 kB): CPython 3.9, manylinux: glibc 2.17+ x86-64
foldcomp-0.0.2-cp39-cp39-macosx_10_9_x86_64.whl (229.6 kB): CPython 3.9, macOS 10.9+ x86-64
foldcomp-0.0.2-cp39-cp39-macosx_10_9_universal2.whl (436.1 kB): CPython 3.9, macOS 10.9+ universal2 (ARM64, x86-64)

foldcomp-0.0.2-cp38-cp38-win_amd64.whl (137.4 kB): CPython 3.8, Windows x86-64
foldcomp-0.0.2-cp38-cp38-musllinux_1_1_x86_64.whl (765.4 kB): CPython 3.8, musllinux: musl 1.1+ x86-64
foldcomp-0.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (284.7 kB): CPython 3.8, manylinux: glibc 2.17+ x86-64
foldcomp-0.0.2-cp38-cp38-macosx_10_9_x86_64.whl (229.6 kB): CPython 3.8, macOS 10.9+ x86-64
foldcomp-0.0.2-cp38-cp38-macosx_10_9_universal2.whl (436.1 kB): CPython 3.8, macOS 10.9+ universal2 (ARM64, x86-64)

foldcomp-0.0.2-cp37-cp37m-win_amd64.whl (137.3 kB): CPython 3.7m, Windows x86-64
foldcomp-0.0.2-cp37-cp37m-musllinux_1_1_x86_64.whl (765.4 kB): CPython 3.7m, musllinux: musl 1.1+ x86-64
foldcomp-0.0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (284.9 kB): CPython 3.7m, manylinux: glibc 2.17+ x86-64
foldcomp-0.0.2-cp37-cp37m-macosx_10_9_x86_64.whl (229.7 kB): CPython 3.7m, macOS 10.9+ x86-64
