
A package for fitting simple categorical mixture models to sequence data

Project description

categorical_mix

Fast, scalable clustering for fixed length sequences with a simple generative model.
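For reference, a categorical mixture model over aligned sequences of fixed length L is usually written as below. We assume this standard form is what is meant here; the notation (K clusters, mixture weights pi_k, per-position letter probabilities theta_{k,i,a}, alphabet A) is ours, not the package's.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \prod_{i=1}^{L} \theta_{k,i,x_i},
\qquad x = (x_1, \dots, x_L),\ x_i \in \mathcal{A},
\qquad \sum_{k=1}^{K} \pi_k = 1,
\qquad \sum_{a \in \mathcal{A}} \theta_{k,i,a} = 1
```

Each cluster is then just a position-specific table of letter frequencies, which is why a fitted model of this kind is easy to inspect and visualize.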

This package is a fairly special-purpose tool designed for fitting a categorical mixture model to multiple sequence alignments of protein or DNA sequences. (You could possibly use it for other tasks, although we have never investigated that.) The model is very simple, but for precisely this reason it can be quite useful: it is fully human-interpretable, easy to visualize, and can be fit to a few million sequences very quickly. It is also designed to handle datasets too large to fit in memory.
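A minimal sketch of how such a model can be fit with expectation-maximization (EM) while streaming over the data in chunks, so that only summed statistics, not the sequences themselves, are held in memory. This is not the package's implementation or API: the alphabet, the function names (`init_params`, `responsibilities`, `em_step`) and the pseudocount are hypothetical, and we have not confirmed which fitting algorithm categorical_mix actually uses.

```python
import math
import random
from typing import Iterable, List, Sequence, Tuple

# Hypothetical DNA alphabet with a gap character; the package may encode sequences differently.
ALPHABET = "ACGT-"
IDX = {c: i for i, c in enumerate(ALPHABET)}

Weights = List[float]
Probs = List[List[List[float]]]  # probs[k][i][a] = p(letter a at position i | cluster k)


def _normalize(v: List[float]) -> List[float]:
    total = sum(v)
    return [x / total for x in v]


def init_params(n_clusters: int, seq_len: int, seed: int = 0) -> Tuple[Weights, Probs]:
    """Uniform mixture weights and random per-position categorical distributions."""
    rng = random.Random(seed)
    weights = [1.0 / n_clusters] * n_clusters
    probs = [
        [_normalize([rng.random() + 0.1 for _ in ALPHABET]) for _ in range(seq_len)]
        for _ in range(n_clusters)
    ]
    return weights, probs


def responsibilities(seq: str, weights: Weights, probs: Probs) -> List[float]:
    """E-step for one sequence: posterior p(cluster k | seq), computed via log-sum-exp."""
    logs = []
    for k, w in enumerate(weights):
        lp = math.log(w)
        for i, ch in enumerate(seq):
            lp += math.log(probs[k][i][IDX[ch]])
        logs.append(lp)
    m = max(logs)
    exps = [math.exp(v - m) for v in logs]
    z = sum(exps)
    return [e / z for e in exps]


def em_step(chunks: Iterable[Sequence[str]], weights: Weights, probs: Probs,
            pseudocount: float = 1e-3) -> Tuple[Weights, Probs]:
    """One EM iteration: stream over chunks, accumulate expected counts, then normalize."""
    n_k, seq_len = len(weights), len(probs[0])
    resp_sum = [0.0] * n_k
    # Expected letter counts per cluster and position, seeded with a small pseudocount.
    counts = [[[pseudocount] * len(ALPHABET) for _ in range(seq_len)] for _ in range(n_k)]
    n_seqs = 0
    for chunk in chunks:  # each chunk could be read from disk in turn
        for seq in chunk:
            r = responsibilities(seq, weights, probs)
            n_seqs += 1
            for k in range(n_k):
                resp_sum[k] += r[k]
                for i, ch in enumerate(seq):
                    counts[k][i][IDX[ch]] += r[k]
    if n_seqs == 0:
        raise ValueError("no sequences supplied")
    # M-step: new weights and per-position distributions from the accumulated statistics.
    new_weights = [s / n_seqs for s in resp_sum]
    new_probs = [[_normalize(pos) for pos in cluster] for cluster in counts]
    return new_weights, new_probs
```

Because every EM iteration makes one full pass over the data, `chunks` must be re-iterable (for example a list, or an object that re-reads files from disk on each pass); call `em_step` repeatedly until the parameters stop changing. Memory use grows with the number of clusters, the sequence length and the alphabet size, but not with the number of sequences.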

This package is used primarily by AntPack, which uses it to score antibody sequences for human-likeness and for other tasks. If you are interested in using it for some other task, see the docs for installation and usage instructions.
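One natural way to score a sequence against a fitted mixture is its log-likelihood under the model, and a natural way to interpret it is the cluster with the highest posterior probability. The sketch below shows both on a hand-specified toy model. It is an assumption for illustration only: AntPack's human-likeness score may be defined differently, and the functions `log_likelihood` and `best_cluster` are hypothetical, not part of either package.

```python
import math
from typing import List

# Hypothetical two-letter alphabet so the numbers can be checked by hand.
ALPHABET = "AC"
IDX = {c: i for i, c in enumerate(ALPHABET)}


def joint_log_probs(seq: str, weights: List[float], probs) -> List[float]:
    """log(pi_k) + sum_i log(theta[k][i][x_i]) for every cluster k."""
    out = []
    for k, w in enumerate(weights):
        lp = math.log(w)
        for i, ch in enumerate(seq):
            lp += math.log(probs[k][i][IDX[ch]])
        out.append(lp)
    return out


def log_likelihood(seq: str, weights: List[float], probs) -> float:
    """log p(seq) under the mixture, using log-sum-exp for numerical stability."""
    jl = joint_log_probs(seq, weights, probs)
    m = max(jl)
    return m + math.log(sum(math.exp(v - m) for v in jl))


def best_cluster(seq: str, weights: List[float], probs) -> int:
    """Index of the cluster with the highest posterior probability for seq."""
    jl = joint_log_probs(seq, weights, probs)
    return max(range(len(jl)), key=jl.__getitem__)


# Toy model: cluster 0 strongly prefers "A" at both positions, cluster 1 is uniform.
weights = [0.5, 0.5]
probs = [
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.5, 0.5], [0.5, 0.5]],
]
```

With these numbers, p("AA") = 0.5 * 0.81 + 0.5 * 0.25 = 0.53, so "AA" is assigned to cluster 0, while "CC" (p = 0.13) is assigned to the uniform cluster 1. Comparing log-likelihoods only makes sense between sequences of the same aligned length.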



Download files


Source Distribution

categorical_mix-0.1.0.tar.gz (68.7 kB)


File details

Details for the file categorical_mix-0.1.0.tar.gz.

File metadata

  • Download URL: categorical_mix-0.1.0.tar.gz
  • Upload date:
  • Size: 68.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for categorical_mix-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       80523608e8aa5a3178020f1710eee44f43aba07029dc7d0820220b42a64a2d78
MD5          1184ca400cdd9605bc01571b45b529d9
BLAKE2b-256  45c583e0b3059f7291f79c4b062b2b8006391104ea7586d84c0d4cdfc442cc1b
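To check a downloaded copy of the source distribution against the SHA256 digest listed above, a small standard-library sketch like the following works; the function names are illustrative, and the file path passed to `verify_file` is whatever location you saved the tarball to.

```python
import hashlib
from pathlib import Path
from typing import Iterable

# SHA256 digest published for categorical_mix-0.1.0.tar.gz.
EXPECTED_SHA256 = "80523608e8aa5a3178020f1710eee44f43aba07029dc7d0820220b42a64a2d78"


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks without holding the whole file in memory."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, expected: str = EXPECTED_SHA256, chunk_size: int = 1 << 16) -> bool:
    """Return True if the file at `path` matches the expected SHA256 digest."""
    with Path(path).open("rb") as fh:
        digest = sha256_hex(iter(lambda: fh.read(chunk_size), b""))
    return digest == expected.lower()
```

For example, `verify_file("categorical_mix-0.1.0.tar.gz")` should return `True` for an intact download. pip's built-in `pip hash` command computes the same SHA256 digest from the command line.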

