Python implementation of autoregressive Direct Coupling Analysis
arDCA
auto-regressive Direct Coupling Analysis (arDCA) 2.0.
Overview
This package is a GPU-accelerated reimplementation of the original Julia package, ArDCA.jl. It also provides a user-friendly command-line interface for training and sampling from an autoregressive DCA model.
Installation
During the installation, arDCA will also install adabmDCA and all its dependencies.
Option 1:
Install the package via PyPI:
python -m pip install arDCA
Option 2:
Clone this repository:
git clone https://github.com/spqb/arDCA.git
cd arDCA
python -m pip install .
Using the package
We provide a Colab notebook that shows how to train and sample an arDCA model using RNA sequences.
Alternatively, one can install the package locally and run one of the two implemented routines from the command line:
Train arDCA from the command line
Once installed, you can launch the package routines using the command arDCA. All the training options can be listed via
arDCA train -h
To launch a training with default arguments, use
arDCA train -d <path_data> -o <output_folder> -l <label>
where path_data is the path to the input multiple-sequence alignment in FASTA format and label is an identifier for the output files. The parameters of the trained model are saved in the file output_folder/<label>_params.pth, and can easily be loaded afterwards using standard PyTorch methods.
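Since the parameters are stored as a standard PyTorch checkpoint, they can be inspected with torch.load. The sketch below is illustrative, not arDCA's internal format: the key names ("h", "J"), shapes, and file path are assumptions used only to show the round-trip pattern.

```python
import torch

# Hypothetical checkpoint: a dict of field/coupling tensors, saved the way
# PyTorch checkpoints usually are. Names and shapes are illustrative only.
dummy_params = {"h": torch.zeros(10, 21), "J": torch.zeros(10, 21, 10, 21)}
torch.save(dummy_params, "example_params.pth")

# Loading back is a single torch.load call.
params = torch.load("example_params.pth")
print(sorted(params.keys()))  # → ['J', 'h']
```

The same one-liner applies to the real output_folder/<label>_params.pth file once a model has been trained.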
By default, the program assumes that the input data are protein sequences. If you want to use RNA sequences, you should use the argument --alphabet rna.
[!WARNING] Depending on the dataset, the default regularization parameters reg_h and reg_J may not work properly. If the training does not converge or the model's generation capabilities are poor, you may want to increase these values.
Sample arDCA from the command line
To generate new sequences using the command line, the minimal input command is
arDCA sample -p <path_params> -o <output_folder> -l <label> --ngen <num_sequences>
where num_sequences is the number of sequences to be generated. The output will be saved in fasta format at output_folder/<label>_samples.fasta.
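The generated samples can then be read back with any FASTA parser. A minimal sketch (the file name below is illustrative; arDCA writes output_folder/<label>_samples.fasta):

```python
def read_fasta(path):
    """Parse a FASTA file into a {name: sequence} dict."""
    names, seqs = [], []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                names.append(line[1:])
                seqs.append("")
            elif line:
                seqs[-1] += line  # sequences may span multiple lines
    return dict(zip(names, seqs))

# Demo on a tiny hand-written file standing in for the sampler's output.
with open("demo_samples.fasta", "w") as fh:
    fh.write(">seq0\nACGU\n>seq1\nGGCA\n")
print(read_fasta("demo_samples.fasta"))  # → {'seq0': 'ACGU', 'seq1': 'GGCA'}
```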
If the argument -d <path_data> is provided, the script will also compute the Pearson correlation coefficient and the slope between the two-site correlation matrices of the data and the generated samples.
License
This package is open-sourced under the Apache License 2.0.
Citation
If you use this package in your research, please cite
Trinquier, J., Uguzzoni, G., Pagnani, A. et al. Efficient generative modeling of protein sequences using simple autoregressive models. Nat Commun 12, 5800 (2021).