Annotation Assisted Direct Coupling Analysis

ANNotation Assisted Direct Coupling Analysis (annaDCA)

This package contains the methods and scripts to train and sample a Restricted Boltzmann Machine (RBM) model on data provided with annotations.

RBM Model with Annotations

⬇️ Installation

Option 1: from PyPI

python -m pip install annadca

Option 2: cloning the repository

git clone https://github.com/rossetl/annaDCA.git
cd annaDCA
python -m pip install .

📘 Usage

After installation, all the main routines can be launched through the command-line interface using the command annadca. For now, only the train routine is implemented. To see all the training options, run

annadca train -h

✔️ Input data format

The training routine requires three types of information: the training sequences, their identifiers and the associated annotations.

The package supports both binary variables and categorical variables (e.g. amino acid sequences).

These input data can be provided to the routine in two different ways:

  • By providing a csv file to the -d argument that contains identifiers, sequences and annotations. By default, the routine will check for the columns name, sequence and label, but the user can specify different names using the arguments --column_names, --column_sequences and --column_labels.

[!WARNING] The single csv file method only works for categorical variables. For binary variables, use the following option.

  • By providing a fasta file (categorical variables) or a plain text file (binary variables) to the -d argument and an additional csv file to the -a argument containing the sequence identifiers and the annotations. Importantly, the sequence identifiers in the csv file must match those in the fasta file.
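Since the identifiers in the two files have to match, it can help to compare them before training. The following is a minimal sketch using only the Python standard library, not part of annadca itself. It assumes that the identifier is the full header text after >, and that the annotation csv stores identifiers in a column called name; both are assumptions for illustration, and the helper names are hypothetical.

```python
import csv


def fasta_identifiers(lines):
    """Collect identifiers from the header lines of fasta-formatted text."""
    identifiers = []
    for line in lines:
        if line.startswith(">"):
            # Assumption: the identifier is the whole header after ">".
            identifiers.append(line[1:].strip())
    return identifiers


def annotation_identifiers(lines, id_column="name"):
    """Collect identifiers from the rows of a csv with a header row."""
    # Assumption: the identifier column is called "name".
    return [row[id_column] for row in csv.DictReader(lines)]


def unmatched_identifiers(fasta_lines, csv_lines, id_column="name"):
    """Return (ids only in the fasta, ids only in the csv)."""
    fasta_ids = set(fasta_identifiers(fasta_lines))
    csv_ids = set(annotation_identifiers(csv_lines, id_column))
    return fasta_ids - csv_ids, csv_ids - fasta_ids


# Toy in-memory example; open file handles can be passed the same way.
fasta_lines = [">seq1", "ACDE", ">seq2", "ACDF"]
csv_lines = ["name,label", "seq1,classA", "seq3,classB"]
only_in_fasta, only_in_csv = unmatched_identifiers(fasta_lines, csv_lines)
print("missing annotations:", sorted(only_in_fasta))
print("missing sequences:", sorted(only_in_csv))
```

Both returned sets are empty when every sequence has an annotation and every annotation has a sequence.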

To train the model with default arguments and a single csv file, use

annadca train -d <path_data> -o <output_directory> -l <model_tag>
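As an illustration of the single csv input, the sketch below builds a file with the default column names name, sequence and label using Python's csv module. The records are made-up toy values, and the helper name write_training_csv is hypothetical; only the three column names come from the description above.

```python
import csv
import io

# Toy records: aligned sequences of equal length with one label each.
records = [
    {"name": "seq1", "sequence": "ACDE-", "label": "classA"},
    {"name": "seq2", "sequence": "ACDFG", "label": "classB"},
]


def write_training_csv(records, handle):
    """Write records with the default column names to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=["name", "sequence", "label"])
    writer.writeheader()
    writer.writerows(records)


# Written to memory here; for a real file use open(path, "w", newline="").
buffer = io.StringIO()
write_training_csv(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

If the file uses other column names, they would be passed to the routine with --column_names, --column_sequences and --column_labels.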

Sequence data format

The model supports the following input data formats:

  • Binary variables: plain text format. Each row is one data sample, and variables are separated by white space.
  • Categorical variables:
    • fasta format: each data point is a sequence of tokens preceded by a header row. The header row starts with >.
    • csv format: the file must contain one column with the aligned training sequences.
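The plain text and fasta layouts described above can be sketched with two small formatting helpers. This is an illustration only: the function names are hypothetical, and writing binary variables as the characters 0 and 1 is an assumption about the expected encoding.

```python
def format_binary_rows(samples):
    """One sample per row, variables separated by a single space."""
    # Assumption: binary variables are written as 0 and 1.
    rows = (" ".join(str(int(value)) for value in sample) for sample in samples)
    return "\n".join(rows) + "\n"


def format_fasta(records):
    """Each record is an (identifier, sequence) pair with a '>' header row."""
    return "".join(f">{name}\n{sequence}\n" for name, sequence in records)


binary_text = format_binary_rows([[0, 1, 1], [1, 0, 0]])
fasta_text = format_fasta([("seq1", "ACDE-"), ("seq2", "ACDFG")])
print(binary_text)
print(fasta_text)
```

Either string can then be written to a file and passed to the -d argument, together with the annotation csv for the -a argument.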
