Simple Python package to lazily one-hot encode FASTA files using multiple processes.
Project description
A simple Python package that lazily one-hot encodes FASTA files using multiple processes, treating each sequence either as single bases or as k-mers of arbitrary length.
Installation
Simply run:
pip install fasta_one_hot_encoder
Examples
Bases
One-hot encode a FASTA file base by base.
from fasta_one_hot_encoder import FastaOneHotEncoder

# Configure the encoder for the four lowercase nucleotides.
encoder = FastaOneHotEncoder(
    nucleotides="acgt",
    lower=True,
    sparse=False,
    handle_unknown="ignore"
)

# Encode the FASTA file into a DataFrame and save it as CSV.
path = "test_data/my_test_fasta.fa"
encoder.transform_to_df(path, verbose=True).to_csv(
    "my_result.csv"
)
The results should look like this:
| | a | c | g | t |
|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 0 |
| 2 | 0 | 1 | 0 | 0 |
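To make the layout of the table above concrete, here is a minimal pure-Python sketch of base-level one-hot encoding. It is not the package's implementation: the helper `one_hot_bases` is hypothetical, the input `"gcc"` is chosen only because it matches the three rows shown, and treating unknown symbols as all-zero rows is an assumption about what `handle_unknown="ignore"` means.

```python
def one_hot_bases(sequence, nucleotides="acgt"):
    """Return one row per base, with a 1 in the column of the matching nucleotide.

    Hypothetical helper for illustration only; not part of fasta_one_hot_encoder.
    """
    column_of = {nucleotide: i for i, nucleotide in enumerate(nucleotides)}
    rows = []
    for base in sequence.lower():
        row = [0] * len(nucleotides)
        # Assumption: symbols outside the alphabet (e.g. "n") stay all-zero.
        if base in column_of:
            row[column_of[base]] = 1
        rows.append(row)
    return rows


rows = one_hot_bases("gcc")
for index, row in enumerate(rows):
    print(index, row)
```

Each row has exactly one non-zero entry, in the column of the base at that position.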
Kmers
One-hot encode a FASTA file as k-mers of a given length.
from fasta_one_hot_encoder import FastaOneHotEncoder

# Configure the encoder for k-mers of length 2 over the four nucleotides.
encoder = FastaOneHotEncoder(
    nucleotides="acgt",
    kmers_length=2,
    lower=True,
    sparse=False,
    handle_unknown="ignore"
)

# Encode the FASTA file into a DataFrame and save it as CSV.
path = "test_data/my_test_fasta.fa"
encoder.transform_to_df(path, verbose=True).to_csv(
    "my_result.csv"
)
The results should look like this:
| | aa | ac | ag | at | ca | cc | cg | ct | ga | gc | gg | gt | ta | tc | tg | tt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
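The k-mer table above can be read the same way as the base table, with one column per possible k-mer. The sketch below is a pure-Python illustration, not the package's implementation: `one_hot_kmers` is a hypothetical helper, and it assumes overlapping windows that advance one base at a time and columns in lexicographic order, which is consistent with the rows shown for an input such as `"gcc"`.

```python
from itertools import product


def one_hot_kmers(sequence, kmers_length=2, nucleotides="acgt"):
    """Return (columns, rows): one column per k-mer, one row per window.

    Hypothetical helper for illustration only; not part of fasta_one_hot_encoder.
    """
    # All k-mers over the alphabet, in lexicographic order: aa, ac, ag, ...
    columns = ["".join(kmer) for kmer in product(nucleotides, repeat=kmers_length)]
    column_of = {kmer: i for i, kmer in enumerate(columns)}
    sequence = sequence.lower()
    rows = []
    # Assumption: windows overlap and slide forward by one base.
    for start in range(len(sequence) - kmers_length + 1):
        kmer = sequence[start:start + kmers_length]
        row = [0] * len(columns)
        # Assumption: k-mers containing unknown symbols stay all-zero.
        if kmer in column_of:
            row[column_of[kmer]] = 1
        rows.append(row)
    return columns, rows


columns, rows = one_hot_kmers("gcc")
for row in rows:
    print(columns[row.index(1)])
```

With `kmers_length=2` and four nucleotides there are 4² = 16 columns, and a sequence of length n yields n − 1 rows under the sliding-window assumption.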
Source Distribution
Hashes for fasta_one_hot_encoder-1.2.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 23ef48ab39d93a597a0a3f39efd31e326c87ba1fa896bc221f6249c55d9a6ceb
MD5 | f71b4e11282ca792727d82b4e0c28c13
BLAKE2b-256 | 77c6e79e6857e8c106dd52257b0136ce1672dd2a622a02b704c7b024ed5a5a07