A package for fitting simple categorical mixture models to sequence data
categorical_mix
Fast, scalable clustering for fixed length sequences with a simple generative model.
This package is a fairly special-purpose tool for fitting multiple sequence alignments of protein or DNA sequences to a categorical mixture model. (It may be usable for other tasks, although we have never investigated that.) The model is very simple, but for precisely that reason it can be quite useful: it is fully human-interpretable, easy to visualize, and can fit a few million sequences very quickly. It is designed to handle datasets too large to load into memory.
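To make the model concrete, here is a minimal pure-Python sketch of a categorical mixture fitted by expectation-maximization (EM). Each cluster k has a weight π_k and, at every alignment position i, its own categorical distribution θ_{k,i} over the alphabet. A sequence x therefore has likelihood p(x) = Σ_k π_k ∏_i θ_{k,i}(x_i). The M-step needs only summed responsibilities and per-position letter counts, so the E-step can stream over the data in chunks. That is one way to fit datasets that do not fit in memory. This is an illustration under stated assumptions, not the categorical_mix API. The class `CategoricalMixture`, its methods, the helper `chunked`, the 21-symbol amino-acid-plus-gap alphabet, and the pseudocount smoothing are all hypothetical, and the package itself may fit the model differently.

```python
import math
import random
from typing import Iterable, Iterator, List, Sequence

# Hypothetical alphabet: 20 amino acids plus "-" for alignment gaps.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {letter: i for i, letter in enumerate(ALPHABET)}


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class CategoricalMixture:
    """Illustrative (hypothetical) categorical mixture for fixed-length aligned sequences."""

    def __init__(self, n_components: int, seq_len: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.k, self.length = n_components, seq_len
        self.log_weights = [-math.log(n_components)] * n_components
        # log_probs[k][i][a] = log P(letter a at position i | cluster k), random start.
        self.log_probs = []
        for _ in range(n_components):
            component = []
            for _ in range(seq_len):
                raw = [0.5 + rng.random() for _ in ALPHABET]
                total = sum(raw)
                component.append([math.log(r / total) for r in raw])
            self.log_probs.append(component)

    def _joint(self, seq: str) -> List[float]:
        """log pi_k + sum_i log theta_{k,i}(x_i) for every cluster k."""
        return [
            self.log_weights[k]
            + sum(self.log_probs[k][i][INDEX[c]] for i, c in enumerate(seq))
            for k in range(self.k)
        ]

    def score(self, seq: str) -> float:
        """Log-likelihood log p(x) of one sequence under the mixture."""
        return logsumexp(self._joint(seq))

    def predict(self, seq: str) -> int:
        """Index of the most probable cluster for one sequence."""
        joint = self._joint(seq)
        return max(range(self.k), key=joint.__getitem__)

    def em_step(self, chunks: Iterable[List[str]], pseudocount: float = 1.0) -> float:
        """One EM iteration; only count arrays, not the data, are kept in memory."""
        weight_counts = [0.0] * self.k
        letter_counts = [
            [[pseudocount] * len(ALPHABET) for _ in range(self.length)]
            for _ in range(self.k)
        ]
        total_loglik = 0.0
        for chunk in chunks:  # E-step: accumulate expected counts chunk by chunk
            for seq in chunk:
                joint = self._joint(seq)
                norm = logsumexp(joint)
                total_loglik += norm
                for k in range(self.k):
                    resp = math.exp(joint[k] - norm)  # responsibility of cluster k
                    weight_counts[k] += resp
                    for i, c in enumerate(seq):
                        letter_counts[k][i][INDEX[c]] += resp
        n_seqs = sum(weight_counts)
        # M-step: renormalize the accumulated counts into new parameters.
        self.log_weights = [math.log(max(w, 1e-12) / n_seqs) for w in weight_counts]
        for k in range(self.k):
            for i in range(self.length):
                row_total = sum(letter_counts[k][i])
                self.log_probs[k][i] = [math.log(v / row_total) for v in letter_counts[k][i]]
        # Log-likelihood under the parameters used in this E-step (before the update).
        return total_loglik


def chunked(seqs: List[str], size: int) -> Iterator[List[str]]:
    """Stand-in for reading chunks from disk: yields slices of an in-memory list."""
    for start in range(0, len(seqs), size):
        yield seqs[start:start + size]


data = ["ACDE", "ACDE", "ACDF", "WYVT", "WYVS", "WYVT"]
model = CategoricalMixture(n_components=2, seq_len=4, seed=1)
for _ in range(20):
    loglik = model.em_step(chunked(data, size=2))
print("log-likelihood:", round(loglik, 3))
print("clusters:", [model.predict(s) for s in data])
```

Each call to `em_step` needs a fresh iterator of chunks, because the data is read once per iteration. In practice the chunks would come from files on disk rather than an in-memory list. As with any EM procedure, the result is a local optimum that depends on the random initialization.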
This package is used primarily by AntPack, which relies on it to score antibody sequences for human-likeness, among other tasks. If you are interested in using it for something else, see the docs for installation and usage instructions.
Source Distribution
File details
Details for the file categorical_mix-0.2.0.0.tar.gz
File metadata
- Download URL: categorical_mix-0.2.0.0.tar.gz
- Upload date:
- Size: 68.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | eb8a6662e425a06e1e05b3773aca454bd9d95b0a1733769596cc9f5568145336
MD5 | 7d198d6f792aaf1044ea8e032e330487
BLAKE2b-256 | 9b47950130e163095b34f27de51f71ceadd8f6906da22729291cff50ae53bda9
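To check a downloaded copy of the source distribution against the digests above, the standard-library `hashlib` module is enough. The sketch below assumes the archive has been saved locally. The helper names `file_digest_hex` and `verify_download` are hypothetical. "BLAKE2b-256" is taken to mean BLAKE2b with a 32-byte digest. pip can perform the same check automatically in hash-checking mode, using `--hash=sha256:...` entries in a requirements file.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digest listed above for categorical_mix-0.2.0.0.tar.gz.
EXPECTED_SHA256 = "eb8a6662e425a06e1e05b3773aca454bd9d95b0a1733769596cc9f5568145336"


def file_digest_hex(path: Union[str, Path], algorithm: str = "sha256",
                    chunk_size: int = 1 << 16) -> str:
    """Hash a file incrementally so the whole archive is never held in memory."""
    if algorithm.lower() == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)  # BLAKE2b with a 256-bit digest
    else:
        hasher = hashlib.new(algorithm)  # e.g. "sha256" or "md5"
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def verify_download(path: Union[str, Path], expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return file_digest_hex(path, "sha256") == expected.strip().lower()
```

For example, `verify_download("categorical_mix-0.2.0.0.tar.gz")` should return `True` for an intact copy of this release.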