GeneVecTools
- Reading in a variety of genetic file types
- Vector embedding algorithms
- Byte array encoders
- Clustering and preprocessing steps for compression
- Similarity search tools for FASTA/FASTQ files
Installing
Example files for testing: https://tinyurl.com/cDNALibraryExampleFiles

.. code-block:: bash

    pip install GeneVecTools

Usage
.. code-block:: python

    >>> from GeneVecTools import simSearch
    >>> from GeneVecTools import reader
    >>> from GeneVecTools import mapper
    >>> from GeneVecTools import encoder

.. code-block:: python

    # ``file`` is the path to "small_cDNA_Sequences_pbmc_1k_v2_S1_L002_R2_001.fastq",
    # downloaded from https://tinyurl.com/cDNALibraryExampleFiles.
    # If the file is in the current directory, the file name alone is enough.
    >>> file = "small_cDNA_Sequences_pbmc_1k_v2_S1_L002_R2_001.fastq"

.. code-block:: python

    # f: path to the input file
    # length: number of sequences to include
    # encoding: one of "one-hot-encoding", "standard", or "no-encoding"
    # bits: one of 2, 4, or 8
    >>> VECSS = simSearch.VecSS(f=file, length=10000, encoding="one-hot-encoding", bits=8)
    >>> sequences = VECSS.readq()

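To give a feel for what a one-hot encoding of a sequence looks like, here is a minimal standalone sketch. It is not GeneVecTools' implementation: the four-letter alphabet and the names ``one_hot_encode`` and ``one_hot_decode`` are assumptions made for illustration, and ambiguous bases such as ``N`` are not handled.

```python
ALPHABET = "ACGT"  # assumed alphabet; other symbols raise KeyError


def one_hot_encode(seq):
    """Return a flat list with four 0/1 entries per base."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    vec = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        row[index[base]] = 1  # mark the column belonging to this base
        vec.extend(row)
    return vec


def one_hot_decode(vec):
    """Invert one_hot_encode by locating the 1 in each group of four."""
    n = len(ALPHABET)
    return "".join(ALPHABET[vec[i:i + n].index(1)] for i in range(0, len(vec), n))


vec = one_hot_encode("GATTACA")
print(vec[:8])  # entries for the first two bases, G then A
print(one_hot_decode(vec) == "GATTACA")
```

Each base costs four vector entries in this scheme, so a sequence of length n becomes a vector of length 4n, and decoding recovers the original string exactly.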
.. code-block:: python

    # The function "embed" produces the vector embedding of the sequence
    >>> embedded = VECSS.embed(VECSS.s)
    >>> print(embedded)

.. code-block:: python

    # Similarity search
    # I: indices of the sequences most similar to the query
    # D: how far each of those sequences is from the query sequence
    # time: how long the similarity search query took
    >>> D, I, time = VECSS.run_search()
    >>> print(D, I, time)

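The meaning of ``D`` and ``I`` can be illustrated with a brute-force nearest-neighbour search over small vectors. This is only a sketch under stated assumptions: it uses squared Euclidean distance on equal-length vectors, and ``brute_force_search`` is a hypothetical helper, not the backend that ``run_search`` uses.

```python
import time


def brute_force_search(query, database, k=3):
    """Return (D, I, elapsed): the k smallest squared L2 distances,
    the indices of the matching database vectors, and seconds taken."""
    start = time.perf_counter()
    scored = []
    for idx, vec in enumerate(database):
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, idx))
    scored.sort()  # closest first; ties broken by index
    top = scored[:k]
    elapsed = time.perf_counter() - start
    D = [d for d, _ in top]
    I = [i for _, i in top]
    return D, I, elapsed


# Toy database of one-hot vectors for the two-base sequences AC, AG, TT
database = [
    [1, 0, 0, 0, 0, 1, 0, 0],  # "AC"
    [1, 0, 0, 0, 0, 0, 1, 0],  # "AG"
    [0, 0, 0, 1, 0, 0, 0, 1],  # "TT"
]
query = [1, 0, 0, 0, 0, 1, 0, 0]  # "AC"

D, I, elapsed = brute_force_search(query, database, k=2)
print(D, I)  # distances [0, 2] at indices [0, 1]
```

A distance of 0 means an exact match, and larger values in ``D`` mean the sequence at the corresponding position in ``I`` differs more from the query.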
.. code-block:: python

    # Testing the embedding and unembedding process
    >>> print(VECSS.unembed(VECSS.embed(VECSS.s)) == VECSS.s)
    True

.. code-block:: python

    # Extracting sequences
    >>> R = reader.Reader()
    >>> mp, count, total_len, quality = R.read_fastq(file)
    >>> sequences_dict_items = mp.values()
    >>> sequences = list(sequences_dict_items)
    >>> print(sequences)

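For readers unfamiliar with the format, a FASTQ record is four lines: a header starting with ``@``, the sequence, a separator starting with ``+``, and a quality string of the same length as the sequence. The sketch below parses such records with the standard library only. ``parse_fastq`` is a hypothetical function; its return values mirror the names used above but are not guaranteed to match what ``Reader.read_fastq`` returns, and it assumes sequences are not wrapped across lines.

```python
import io


def parse_fastq(handle):
    """Parse four-line FASTQ records from a text handle.

    Returns (sequences_by_id, record_count, total_bases, qualities_by_id).
    Parsing stops at end of file or at the first blank header line.
    """
    seqs, quals = {}, {}
    total_len = 0
    while True:
        header = handle.readline().strip()
        if not header:
            break
        seq = handle.readline().strip()
        plus = handle.readline().strip()
        qual = handle.readline().strip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        read_id = header[1:].split()[0]  # drop '@' and any description
        seqs[read_id] = seq
        quals[read_id] = qual
        total_len += len(seq)
    return seqs, len(seqs), total_len, quals


demo = io.StringIO("@read1 desc\nACGT\n+\nIIII\n@read2\nGGCA\n+\nFFFF\n")
mp, count, total_len, quality = parse_fastq(demo)
print(list(mp.values()), count, total_len)
```

To read from disk instead of the in-memory demo, pass an open text file, for example ``with open(path) as fh: parse_fastq(fh)``.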
.. code-block:: python

    # Clustering
    >>> mapObj = mapper.Mapper(sequences, 2, 3)
    >>> groups_of_similar_kmers = mapper.cluster(mapObj.hfs)
    >>> cluster_of_sequences = mapper.groupings(groups_of_similar_kmers, sequences)
    >>> print(cluster_of_sequences)

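One common way to group similar sequences is to compare their k-mer sets. The sketch below is a generic illustration of that idea, not the algorithm behind ``mapper.cluster``: it uses Jaccard similarity with a greedy single pass, and the names ``kmers``, ``jaccard``, and ``greedy_cluster`` as well as the default ``k`` and ``threshold`` values are all assumptions.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(sequences, k=3, threshold=0.5):
    """Assign each sequence to the first cluster whose representative
    k-mer set is similar enough, otherwise start a new cluster."""
    clusters = []  # list of (representative k-mer set, member list)
    for seq in sequences:
        profile = kmers(seq, k)
        for rep, members in clusters:
            if jaccard(profile, rep) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((profile, [seq]))
    return [members for _, members in clusters]


reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
print(greedy_cluster(reads))
# [['ACGTACGT', 'ACGTACGA'], ['TTTTGGGG', 'TTTTGGGC']]
```

The result of a greedy pass depends on input order, since the first sequence in each cluster serves as its representative.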
.. code-block:: python

    # Encoding
    # Use a separate name for the instance so the ``encoder`` module is not shadowed
    >>> enc = encoder.Encoder(4)
    >>> c = enc.encode_sequences(sequences)
    >>> print(c)

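As a rough picture of what a byte array encoder for DNA can do, the sketch below packs each base into 2 bits, so four bases fit in one byte. This is an illustrative assumption, not a description of how ``Encoder`` works: the bit codes, the padding rule, and the names ``pack_2bit`` and ``unpack_2bit`` are hypothetical.

```python
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # assumed 2-bit codes
BASES = "ACGT"


def pack_2bit(seq):
    """Pack a DNA string into bytes, four bases per byte,
    with the first base in the highest bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_2bit(data, length):
    """Recover the first `length` bases; the length must be stored
    separately because the last byte may contain padding."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


packed = pack_2bit("GATTACA")
print(packed.hex(), len(packed))  # 8f10 2
print(unpack_2bit(packed, 7))     # GATTACA
```

Seven bases occupy two bytes here instead of seven, which is the kind of saving a compact encoding aims for before any further compression.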