
Annotation Assisted Direct Coupling Analysis


ANNotation Assisted Direct Coupling Analysis (annaDCA)

This package contains the methods and scripts to train and sample a Restricted Boltzmann Machine (RBM) model on annotated data.

[Figure: RBM model with annotations]

⬇️ Installation

Option 1: from PyPI

python -m pip install annadca

Option 2: cloning the repository

git clone https://github.com/rossetl/annaDCA.git
cd annaDCA
python -m pip install .

📘 Usage

After installation, all the main routines can be launched through the command-line interface using the command annadca. For now, only the train routine is implemented. To see all the training options, run

annadca train -h

✔️ Input data format

The training routine requires three types of information: the training sequences, their identifiers and the associated annotations.

The package supports both binary variables and categorical variables (e.g. amino acid sequences).

These input data can be provided to the routine in two different ways:

  • By providing a csv file to the -d argument that contains identifiers, sequences and annotations. By default, the routine will check for the columns name, sequence and label, but the user can specify different names using the arguments --column_names, --column_sequences and --column_labels.

[!WARNING] The single-csv method only works for categorical variables. For binary variables, use the two-file option described next.

  • By providing a fasta file (categorical variables) or a plain text file (binary variables) to the -d argument and an additional csv file to the -a argument containing the sequence identifiers and the annotations. Importantly, the sequence identifiers must match those present in the fasta file.
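Because the two-file option depends on the identifiers agreeing between the files, it can help to check them before training. The following is a minimal sketch using only the Python standard library; it is not part of annaDCA. The file names, the toy sequences, the helper functions and the annotation column names name and label are assumptions made for illustration, as is the choice to treat the whole header line after > as the identifier.

```python
import csv
import tempfile
from pathlib import Path


def read_fasta_ids(path):
    """Return the identifiers from the header lines (those starting with '>')."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Assumption: the full header text is the sequence identifier.
                ids.append(line[1:].strip())
    return ids


def read_annotation_ids(path, id_column="name"):
    """Return the identifiers listed in the given column of a CSV file."""
    with open(path, newline="") as handle:
        return [row[id_column] for row in csv.DictReader(handle)]


# Build two toy input files in a temporary directory (illustrative data only).
workdir = Path(tempfile.mkdtemp())

fasta_path = workdir / "toy.fasta"
fasta_path.write_text(">seq_1\nMKV-LA\n>seq_2\nMRVQLA\n>seq_3\nMKVQ-A\n")

annotations_path = workdir / "toy_annotations.csv"
with open(annotations_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["name", "label"])  # hypothetical column names
    writer.writerow(["seq_1", "family_A"])
    writer.writerow(["seq_2", "family_B"])
    writer.writerow(["seq_9", "family_A"])  # deliberately absent from the fasta

fasta_ids = read_fasta_ids(fasta_path)
annotation_ids = read_annotation_ids(annotations_path)

# Identifiers annotated in the CSV but missing from the fasta file.
missing_in_fasta = sorted(set(annotation_ids) - set(fasta_ids))
# Sequences in the fasta file that have no annotation row.
without_annotation = sorted(set(fasta_ids) - set(annotation_ids))

print("annotated but not in fasta:", missing_in_fasta)
print("in fasta but not annotated:", without_annotation)
```

The check reports mismatches in both directions. Whether every sequence must carry an annotation is not stated here, so the second list is informational only.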

To train the model with default arguments and a single csv file, use

annadca train -d <path_data> -o <output_directory> -l <model_tag>
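As an illustration of the single-file input, the sketch below writes a csv file with the default column names name, sequence and label using only the Python standard library. The identifiers, sequences, labels, the file name toy_data.csv and the helper write_training_csv are all made up for this example.

```python
import csv
import tempfile
from pathlib import Path

# Toy aligned sequences of equal length ('-' marks a gap); values are illustrative.
records = [
    {"name": "seq_1", "sequence": "MKV-LA", "label": "family_A"},
    {"name": "seq_2", "sequence": "MRVQLA", "label": "family_B"},
    {"name": "seq_3", "sequence": "MKVQ-A", "label": "family_A"},
]


def write_training_csv(path, rows):
    """Write rows to a CSV using the default columns: name, sequence, label."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "sequence", "label"])
        writer.writeheader()
        writer.writerows(rows)


csv_path = Path(tempfile.mkdtemp()) / "toy_data.csv"
write_training_csv(csv_path, records)

# Read the file back to confirm the header and the rows round-trip.
with open(csv_path, newline="") as handle:
    reader = csv.DictReader(handle)
    header = reader.fieldnames
    loaded = list(reader)

print(header)
print(loaded[0])
```

The path of the resulting file is what would be passed to the -d argument. If the columns carry other names, the --column_names, --column_sequences and --column_labels arguments described above are the place to declare them.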

Sequence data format

The model supports the following input data formats:

  • Binary variables: plain text format. Each row is one data sample, and variables are separated by white spaces.
  • Categorical variables:
    • fasta format: each data point is a sequence of tokens preceded by a header. The header row starts with >.
    • csv format: the file must contain one column with the aligned training sequences.
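For the binary case, a file in the plain text layout described above (one sample per row, values separated by white spaces) can be produced and read back with a few lines of standard-library Python. This is only a sketch: the sample values, the file name toy_binary.txt and both helper functions are hypothetical and not part of the package.

```python
import tempfile
from pathlib import Path

# Three toy samples with four binary variables each (illustrative values).
samples = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]


def write_binary_text(path, rows):
    """Write one sample per line, with variables separated by single spaces."""
    with open(path, "w") as handle:
        for row in rows:
            handle.write(" ".join(str(value) for value in row) + "\n")


def read_binary_text(path):
    """Read the file back into a list of integer lists, skipping blank lines."""
    rows = []
    with open(path) as handle:
        for line in handle:
            if line.strip():
                rows.append([int(token) for token in line.split()])
    return rows


text_path = Path(tempfile.mkdtemp()) / "toy_binary.txt"
write_binary_text(text_path, samples)
restored = read_binary_text(text_path)

print(text_path.read_text())
```

Since binary data cannot go through the single-csv route, a file like this would be paired with a separate csv of identifiers and annotations given to the -a argument.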
