Deterministic vectorization of genomes.

🧬 vectome

vectome is a Python package for deterministic vectorization of genomes.

Installation

The easy way

You can install the precompiled version directly using pip.

$ pip install vectome

From source

Clone the repository, cd into it, then run:

$ pip install -e .

Command-line interface

vectome has a command-line interface.

$ vectome --help

You can generate vector embeddings from species or strain names, or from taxon IDs, given one per line.

$ vectome embed <(printf "Mycobacterium tuberculosis\n83333\nEscherichia coli CFT073")

The resulting vectors are built from sourmash MinHash sketches, which are then folded into a 4096-dimensional vector using the CountSketch method. To get a shorter vector, pass e.g. -n 1024.
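The folding step can be illustrated with a minimal sketch. This is not vectome's actual implementation: the function names, the use of BLAKE2b to derive bucket and sign hashes, and the seed parameter are all assumptions made for illustration. Each 64-bit MinHash value is mapped to one bucket and one sign, and the signed counts are accumulated into a fixed-length vector.

```python
import hashlib


def _hash64(value: int, salt: bytes) -> int:
    """Deterministic 64-bit hash of an unsigned 64-bit integer (hypothetical helper)."""
    digest = hashlib.blake2b(
        value.to_bytes(8, "little"), digest_size=8, key=salt
    ).digest()
    return int.from_bytes(digest, "little")


def count_sketch(hashes, n_dims=4096, seed=0):
    """Fold an iterable of 64-bit hash values into an n_dims-long CountSketch vector."""
    salt = seed.to_bytes(8, "little")
    vec = [0] * n_dims
    for h in hashes:
        # One hash picks the bucket, an independent one picks the sign.
        bucket = _hash64(h, salt + b"b") % n_dims
        sign = 1 if _hash64(h, salt + b"s") & 1 else -1
        vec[bucket] += sign
    return vec


# Made-up hash values standing in for a MinHash sketch.
example_hashes = [11, 42, 2**40 + 7, 2**63 + 1]
folded = count_sketch(example_hashes, n_dims=1024)
```

Because the bucket and sign depend only on the hash value and the seed, the same sketch always folds to the same vector.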

You can also deterministically project into a dense vector.

$ vectome embed <(printf "Mycobacterium tuberculosis H37Rv") --projection 16

Change the seed with e.g. --seed 0.
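As a rough illustration of what a deterministic dense projection can look like, here is a seeded Gaussian random projection. The source does not state which projection vectome uses, so the technique, the function name, and the scaling are assumptions. The point it shows is that a fixed seed always produces the same projection matrix, and therefore the same output vector.

```python
import random


def project(vec, out_dims=16, seed=0):
    """Project vec into out_dims dimensions with a seeded Gaussian matrix (illustrative)."""
    rng = random.Random(seed)  # fixed seed makes the matrix reproducible
    scale = 1.0 / out_dims ** 0.5
    out = [0.0] * out_dims
    for x in vec:
        # Always draw the full row so the matrix does not depend on the input values.
        row = [rng.gauss(0.0, 1.0) for _ in range(out_dims)]
        if x:
            for j, r in enumerate(row):
                out[j] += x * r * scale
    return out


sparse = [0, 1, 0, -1, 2, 0, 0, 1]  # made-up folded vector
dense_a = project(sparse, out_dims=16, seed=0)
dense_b = project(sparse, out_dims=16, seed=0)
dense_c = project(sparse, out_dims=16, seed=1)
```

Here dense_a and dense_b are identical, while dense_c differs because the seed changed.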

If you need a more interpretable vector, you can generate one based on Jaccard distances to landmark species.

$ vectome embed <(printf "Mycobacterium tuberculosis H37Rv") --method landmark
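The idea behind a landmark vector can be sketched as follows, assuming each genome is represented by a set of MinHash values. The function names and the toy hash sets are hypothetical, and vectome's own code may compute this differently (for example through sourmash). Each coordinate is the Jaccard distance from the query to one landmark, which is what makes the vector interpretable.

```python
def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of hash values."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def landmark_vector(query, landmarks):
    """One coordinate per landmark, in the insertion order of the landmarks dict."""
    return [jaccard_distance(query, hashes) for hashes in landmarks.values()]


# Toy hash sets; real sketches hold many more values.
landmarks = {
    "landmark A": {1, 2, 3, 4},
    "landmark B": {3, 4, 5, 6},
    "landmark C": {7, 8},
}
query = {1, 2, 3, 4}
vector = landmark_vector(query, landmarks)
```

A coordinate of 0.0 means the query shares all of its hashes with that landmark, and 1.0 means it shares none.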

Several landmark groups are available. You can select one with e.g. --group 0, and get information about each group with vectome info.

$ vectome info
vectome version 0.0.1:
        group-0: {'landmarks': 113, 'manifest file': '.../vectome/vectome/data/landmarks/group-0/manifest.json', 'built': True}
        group-1: {'landmarks': 4, 'manifest file': '.../vectome/vectome/data/landmarks/group-1/manifest.json', 'built': True}
        group-2: {'landmarks': 1, 'manifest file': '.../vectome/vectome/data/landmarks/group-2/manifest.json', 'built': False}
        meta: {'cache location': '.../vectome/vectome/data/landmarks', 'cache exists': True}

Issues, problems, suggestions

Please add them to the issue tracker.

Documentation

(To come at ReadTheDocs.)
