GeneVecTools

- Readers for a variety of genetic file types
- Vector embedding algorithms
- Byte array encoders
- Clustering and preprocessing steps for compression
- Similarity search tools for FASTA/FASTQ files

Installing

Tester files: https://tinyurl.com/cDNALibraryExampleFiles

.. code-block:: bash

    pip install GeneVecTools

Usage

.. code-block:: python

    >>> from GeneVecTools import simSearch
    >>> from GeneVecTools import reader
    >>> from GeneVecTools import mapper
    >>> from GeneVecTools import encoder

.. code-block:: python

    # `file` is the path to "small_cDNA_Sequences_pbmc_1k_v2_S1_L002_R2_001.fastq",
    # downloaded from https://tinyurl.com/cDNALibraryExampleFiles.
    # If the file is in the current directory, the file name alone is enough.
    >>> file = "small_cDNA_Sequences_pbmc_1k_v2_S1_L002_R2_001.fastq"

.. code-block:: python

    # f: path to the input file
    # length: number of sequences to include in the search scope
    # encoding: one of "one-hot-encoding", "standard", or "no-encoding"
    # bits: one of 2, 4, or 8
    >>> VECSS = simSearch.VecSS(f=file, length=10000, encoding="one-hot-encoding", bits=8)
    >>> sequences = VECSS.readq()
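
For readers unfamiliar with the ``"one-hot-encoding"`` option, the sketch below shows what one-hot encoding a DNA sequence generally means. It is an illustration only and is not taken from GeneVecTools; the helper names ``one_hot_encode`` and ``one_hot_decode`` and the ``ACGT`` ordering are assumptions made for this example.

```python
# Illustrative sketch only: not GeneVecTools' implementation.
ALPHABET = "ACGT"  # assumed base ordering for this example


def one_hot_encode(seq):
    """Map each base to a 4-element 0/1 vector and concatenate the vectors."""
    vec = []
    for base in seq:
        row = [0] * len(ALPHABET)
        row[ALPHABET.index(base)] = 1  # raises ValueError for bases outside ACGT
        vec.extend(row)
    return vec


def one_hot_decode(vec):
    """Invert one_hot_encode by locating the 1 in each group of four values."""
    n = len(ALPHABET)
    return "".join(ALPHABET[vec[i:i + n].index(1)] for i in range(0, len(vec), n))


encoded = one_hot_encode("GATTACA")
print(encoded[:8])  # vectors for the first two bases, G then A
assert one_hot_decode(encoded) == "GATTACA"
```

Real FASTQ reads can contain ambiguous bases such as ``N``; this sketch rejects them rather than guessing how they should be encoded.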

.. code-block:: python

    # The function "embed" produces the vector embedding of the sequence
    >>> embedded = VECSS.embed(VECSS.s)
    >>> print(embedded)

.. code-block:: python

    # Similarity search
    # I: indices of the similar sequences
    # D: how different each similar sequence is from the query sequence
    # time: time taken to perform this similarity search query
    >>> D, I, time = VECSS.run_search()
    >>> print(D, I, time)
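
To make the meaning of ``D`` and ``I`` concrete, here is a minimal brute-force sketch that returns the same kind of pair: differences and indices of the closest entries. It assumes Hamming distance over equal-length strings, which may differ from what ``run_search`` does internally; ``hamming`` and ``brute_force_search`` are hypothetical names for this example.

```python
# Illustrative sketch only: a brute-force search, not GeneVecTools' method.
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def brute_force_search(query, database, k=2):
    """Return (distances, indices) of the k database entries closest to query."""
    scored = sorted((hamming(query, seq), idx) for idx, seq in enumerate(database))
    top = scored[:k]
    distances = [d for d, _ in top]
    indices = [i for _, i in top]
    return distances, indices


database = ["ACGTAC", "ACGTTC", "TTTTTT", "ACGAAC"]
D, I = brute_force_search("ACGTAC", database, k=3)
print(D, I)
```

Brute force compares the query against every entry, so its cost grows linearly with the database size; it is useful mainly as a reference point for checking faster search methods.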

.. code-block:: python

    # Testing the embedding and unembedding round trip
    >>> assert VECSS.unembed(VECSS.embed(VECSS.s)) == VECSS.s

.. code-block:: python

    # Extracting sequences (`file` is the FASTQ path defined above)
    >>> R = reader.Reader()
    >>> mp, count, total_len, quality = R.read_fastq(file)
    >>> sequences_dict_items = mp.values()
    >>> sequences = list(sequences_dict_items)
    >>> print(sequences)
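
For context on what a FASTQ reader has to do, the sketch below parses the common four-lines-per-record FASTQ layout (``@`` header, sequence, ``+`` separator, quality string). It is not ``Reader.read_fastq``; the function name ``parse_fastq`` and the in-memory example data are assumptions for illustration.

```python
# Illustrative sketch only: not GeneVecTools' Reader.
import io


def parse_fastq(handle):
    """Yield (identifier, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        yield header[1:], sequence, quality


# Two made-up records stand in for a real file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nFFFF\n")
records = {name: seq for name, seq, _ in parse_fastq(example)}
print(records)
```

The sketch does not handle FASTQ files whose sequences are wrapped across several lines.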

.. code-block:: python

    # Clustering
    >>> mapObj = mapper.Mapper(sequences, 2, 3)
    >>> groups_of_similar_kmers = mapper.cluster(mapObj.hfs)
    >>> cluster_of_sequences = mapper.groupings(groups_of_similar_kmers, sequences)
    >>> print(cluster_of_sequences)
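
As a rough picture of k-mer based grouping, the sketch below clusters sequences greedily by the Jaccard similarity of their k-mer sets. This is one common approach chosen for illustration and is not claimed to be what ``mapper.cluster`` or ``mapper.groupings`` do; the names ``kmers``, ``jaccard``, ``greedy_cluster`` and the ``threshold`` parameter are hypothetical.

```python
# Illustrative sketch only: not GeneVecTools' Mapper.
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_cluster(sequences, k=3, threshold=0.5):
    """Assign each sequence to the first cluster whose representative is similar enough."""
    clusters = []  # the first member of each cluster acts as its representative
    for seq in sequences:
        profile = kmers(seq, k)
        for cluster in clusters:
            if jaccard(profile, kmers(cluster[0], k)) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])  # no similar cluster found, start a new one
    return clusters


reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
print(greedy_cluster(reads))
```

Grouping similar sequences before compression tends to help because members of one cluster share long runs of identical bases.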

.. code-block:: python

    # Encoding
    # Note: this rebinds the name `encoder` from the module to an Encoder instance
    >>> encoder = encoder.Encoder(4)
    >>> c = encoder.encode_sequences(sequences)
    >>> print(c)
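
To illustrate the idea behind a byte array encoder, the sketch below packs DNA bases into two bits each, four bases per byte. It is an assumption-laden example, not the ``Encoder`` class: the code table, the names ``pack_2bit`` and ``unpack_2bit``, and the choice to pass the sequence length separately are all made up for this illustration.

```python
# Illustrative sketch only: not GeneVecTools' Encoder.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code table
BASES = "ACGT"


def pack_2bit(seq):
    """Pack a DNA string into bytes, four bases per byte, most significant bits first."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_2bit(data, length):
    """Recover the first `length` bases from data produced by pack_2bit."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


packed = pack_2bit("GATTACA")
print(len(packed), packed.hex())
assert unpack_2bit(packed, 7) == "GATTACA"
```

Because the final byte may be padded, the original length has to be stored alongside the packed bytes for the round trip to be exact.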

.. code-block:: python

    # Compress
    >>> encoded_clusters_compressed = encoder.encode_clusters(cluster_of_sequences)
    >>> print(encoded_clusters_compressed)

.. code-block:: python

    # Decompress
    >>> decoded_clusters_compressed = encoder.decode_clusters(encoded_clusters_compressed)
    >>> print(decoded_clusters_compressed)

.. code-block:: python

    # Testing the compressing and decompressing process
    >>> assert cluster_of_sequences == decoded_clusters_compressed

Release history

This version

1.44

May 16, 2022

1.43

May 13, 2022

1.42

May 13, 2022

1.41

May 13, 2022

1.40

May 13, 2022

1.39

May 13, 2022

1.38

May 13, 2022

1.37

May 13, 2022

1.36

May 13, 2022

1.35

May 13, 2022

1.34

May 12, 2022

1.33

May 12, 2022

1.32

May 12, 2022

1.31

May 12, 2022

1.29

May 12, 2022

1.28

May 12, 2022

1.27

May 12, 2022

1.26

May 12, 2022

1.25

May 12, 2022

1.24

May 12, 2022

1.23

May 12, 2022

1.22

May 11, 2022

1.21

Apr 28, 2022

1.19

Apr 28, 2022

1.18

Apr 28, 2022

1.17

Apr 28, 2022

1.16

Apr 28, 2022

1.12

Apr 27, 2022

1.11

Apr 27, 2022

1.10

Apr 27, 2022

1.9

Apr 27, 2022

1.8

Apr 27, 2022

1.7

Apr 27, 2022

1.6

Apr 27, 2022

1.3

May 12, 2022

1.2

Apr 28, 2022

Download files

Source Distribution

GeneVecTools-1.44.tar.gz (11.1 kB)

Uploaded May 16, 2022 Source

File details

Details for the file GeneVecTools-1.44.tar.gz.

File metadata

Download URL: GeneVecTools-1.44.tar.gz
Upload date: May 16, 2022
Size: 11.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.10.2

File hashes

Hashes for GeneVecTools-1.44.tar.gz
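=========== ====================================================================
Algorithm   Hash digest
=========== ====================================================================
SHA256      ``04e332e66cdf5d2ebe59e6a824980f793df86580e4768a35404779fa8d7d2b47``
MD5         ``b5fad1813e1cf6f305371c6ccc38e25e``
BLAKE2b-256 ``8c55251bc030281f273985fe5592e9ee01e13abba5c5d7a40c16e47189a4ba2e``
=========== ====================================================================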
