Annotation Assisted Direct Coupling Analysis
ANNotation Assisted Direct Coupling Analysis (annaDCA)
This package contains the methods and scripts to train and sample an RBM model provided with data annotations.
⬇️ Installation
Option 1: from PyPI
python -m pip install annadca
Option 2: cloning the repository
git clone https://github.com/rossetl/annaDCA.git
cd annaDCA
python -m pip install .
📘 Usage
After installation, all the main routines can be launched through the command-line interface using the command annadca. For now, only the train routine is implemented.
To see all the training options, run
annadca train -h
✔️ Input data format
The training routine requires three types of information: the training sequences, their identifiers and the associated annotations.
The package supports both binary variables and categorical variables (e.g. amino acid sequences).
These input data can be provided to the routine in two different ways:
- By providing a csv file to the -d argument that contains identifiers, sequences and annotations. By default, the routine will check for the columns name, sequence and label, but the user can specify different names using the arguments --column_names, --column_sequences and --column_labels.
[!WARNING] The previous method only works for categorical variables. For binary variables, use the following option.
- By providing a fasta file (categorical variables) or a plain text file (binary variables) to the -d argument and an additional csv file to the -a argument containing the sequence identifiers and the annotations. Importantly, the sequence identifiers must match those present in the fasta file.
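Since the second option relies on matching identifiers across two files, it can help to check them before training. The following is a minimal sketch, not part of annaDCA: the helper names are hypothetical, the whole fasta header after > is assumed to be the identifier, and the annotation csv is assumed to use the default column names name and label.

```python
import csv
import io


def read_fasta_ids(handle):
    """Collect identifiers from the header rows of a fasta file."""
    ids = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # Assumption: the full header text after ">" is the identifier.
            ids.append(line[1:].strip())
    return ids


def read_annotation_ids(handle, column="name"):
    """Collect identifiers from one column of the annotation csv file."""
    return [row[column] for row in csv.DictReader(handle)]


def unmatched_ids(fasta_ids, annotation_ids):
    """Return (ids only in the fasta file, ids only in the csv file)."""
    fasta_set, annotation_set = set(fasta_ids), set(annotation_ids)
    return sorted(fasta_set - annotation_set), sorted(annotation_set - fasta_set)


# Toy in-memory files; real data would be opened from disk instead.
fasta_text = ">seq1\nACDE-\n>seq2\nACDF-\n>seq3\nGCDE-\n"
csv_text = "name,label\nseq1,classA\nseq2,classB\nseq4,classA\n"

missing_annotation, missing_sequence = unmatched_ids(
    read_fasta_ids(io.StringIO(fasta_text)),
    read_annotation_ids(io.StringIO(csv_text)),
)
print("in fasta only:", missing_annotation)
print("in csv only:", missing_sequence)
```

Whether the training routine tolerates sequences without an annotation is not stated here, so the sketch only reports the mismatches and leaves the decision to the user.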
To train the model with default arguments and a single csv file, use
annadca train -d <path_data> -o <output_directory> -l <model_tag>
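As an illustration of the single-file layout, the sketch below builds a small csv with the default column names name, sequence and label. The records and the helper write_training_csv are made up for the example; only the three column names come from the description above.

```python
import csv
import io

# Hypothetical toy records: (identifier, aligned sequence, annotation).
records = [
    ("seq1", "ACDE-", "classA"),
    ("seq2", "ACDF-", "classB"),
    ("seq3", "GCDE-", "classA"),
]


def write_training_csv(handle, records, columns=("name", "sequence", "label")):
    """Write the records as a csv file with a header row."""
    lengths = {len(sequence) for _, sequence, _ in records}
    if len(lengths) > 1:
        # The sequences are expected to be aligned, i.e. of equal length.
        raise ValueError(f"sequences have different lengths: {sorted(lengths)}")
    writer = csv.writer(handle)
    writer.writerow(columns)
    writer.writerows(records)


# Written to memory here; to create a real file, use
# open(path, "w", newline="") as the handle instead.
buffer = io.StringIO()
write_training_csv(buffer, records)
csv_text = buffer.getvalue()
print(csv_text)
```

If the file uses other column names, they would be passed to the training routine through --column_names, --column_sequences and --column_labels.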
Sequence data format
The model supports the following input data formats:
- Binary variables: plain text format. Each row is one data sample, and variables are separated by white spaces.
- Categorical variables:
  - fasta format. Each data point is a sequence of tokens with a header on top. The header row starts with >.
  - csv format: the file must contain one column with the aligned training sequences.
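For binary variables, the plain text layout described above (one sample per row, values separated by white spaces) can be produced and read back as in the following sketch. The helper names are hypothetical and the sketch does not use any annaDCA function; how identifiers are attached to the rows of a plain text file is not covered here.

```python
import io


def write_binary_samples(handle, samples):
    """Write one sample per row, with values separated by single spaces."""
    for sample in samples:
        handle.write(" ".join(str(int(value)) for value in sample) + "\n")


def read_binary_samples(handle):
    """Parse the plain text layout back into lists of integers."""
    return [[int(token) for token in line.split()] for line in handle if line.strip()]


# Toy data: two samples with four binary variables each.
samples = [[0, 1, 1, 0], [1, 0, 0, 1]]

buffer = io.StringIO()
write_binary_samples(buffer, samples)
text = buffer.getvalue()

restored = read_binary_samples(io.StringIO(text))
print(text)
```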