Simple Python package to lazily one-hot encode FASTA files using multiple processes.
Project description
A simple Python package that lazily one-hot encodes FASTA files using multiple processes, working either on single bases or on k-mers of arbitrary length.
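To illustrate what reading a FASTA file lazily can mean, here is a minimal standard-library sketch. It is not the package's implementation: `iter_fasta` is a hypothetical helper that yields one record at a time, so the whole file never has to sit in memory.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time (hypothetical helper)."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the last record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


# Made-up two-record input; the second sequence spans two lines.
fasta_text = ">seq1\nGCC\n>seq2\nAC\nGT\n"
records = list(iter_fasta(io.StringIO(fasta_text)))
print(records)
```

Because records are produced on demand, each one could in principle be handed to a separate worker process for encoding; how the package actually distributes the work is not described here.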
Installation
Simply run:
pip install fasta_one_hot_encoder
Examples
Bases
One-hot encode to bases.
from fasta_one_hot_encoder import FastaOneHotEncoder

encoder = FastaOneHotEncoder(
    nucleotides="acgt",
    lower=True,
    sparse=False,
    handle_unknown="ignore"
)
path = "test_data/my_test_fasta.fa"
encoder.transform_to_df(path, verbose=True).to_csv(
    "my_result.csv"
)
Obtained results should look like:
| | a | c | g | t |
|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 0 |
| 2 | 0 | 1 | 0 | 0 |
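To make the encoding itself concrete, the following sketch one-hot encodes single bases using only the standard library. It is an illustration, not the package's code: `one_hot_bases` is a hypothetical function, the input `GCC` is chosen only because it reproduces the rows shown above, and mapping unknown symbols to all-zero rows is an assumption modelled on how scikit-learn's `OneHotEncoder` behaves with `handle_unknown="ignore"`.

```python
from typing import List


def one_hot_bases(sequence: str, nucleotides: str = "acgt") -> List[List[int]]:
    """Return one row per base, with a single 1 in the matching column."""
    column = {base: i for i, base in enumerate(nucleotides)}
    rows = []
    for base in sequence.lower():  # lower-casing plays the role of lower=True
        row = [0] * len(nucleotides)
        if base in column:
            row[column[base]] = 1
        # Assumption: symbols outside the alphabet stay all-zero.
        rows.append(row)
    return rows


rows = one_hot_bases("GCC")
for index, row in enumerate(rows):
    print(index, row)

unknown = one_hot_bases("n")
print(unknown)
```

Each row corresponds to one position in the sequence, and the columns follow the order of the `nucleotides` string.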
Kmers
One-hot encode to kmers of given length.
from fasta_one_hot_encoder import FastaOneHotEncoder

encoder = FastaOneHotEncoder(
    nucleotides="acgt",
    kmers_length=2,
    lower=True,
    sparse=False,
    handle_unknown="ignore"
)
path = "test_data/my_test_fasta.fa"
encoder.transform_to_df(path, verbose=True).to_csv(
    "my_result.csv"
)
Obtained results should look like:
| | aa | ac | ag | at | ca | cc | cg | ct | ga | gc | gg | gt | ta | tc | tg | tt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
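The k-mer case can be sketched in the same way. This is again a hypothetical illustration rather than the package's implementation: `one_hot_kmers` assumes overlapping windows that advance one base at a time and columns ordered as the Cartesian product of the alphabet, both of which are consistent with the table above but not confirmed by it. The input `GCC` is chosen only to reproduce those two rows.

```python
from itertools import product
from typing import List, Tuple


def one_hot_kmers(
    sequence: str, nucleotides: str = "acgt", kmers_length: int = 2
) -> Tuple[List[str], List[List[int]]]:
    """Return the k-mer column names and one one-hot row per window."""
    # Column order: aa, ac, ag, at, ca, ... for the default alphabet.
    kmers = ["".join(p) for p in product(nucleotides, repeat=kmers_length)]
    column = {kmer: i for i, kmer in enumerate(kmers)}
    sequence = sequence.lower()
    rows = []
    # Assumption: windows overlap and move forward one base at a time.
    for start in range(len(sequence) - kmers_length + 1):
        row = [0] * len(kmers)
        kmer = sequence[start:start + kmers_length]
        if kmer in column:
            row[column[kmer]] = 1
        rows.append(row)
    return kmers, rows


kmers, rows = one_hot_kmers("GCC")
for row in rows:
    print(kmers[row.index(1)], row)
```

With `kmers_length=2` and a four-letter alphabet there are 4² = 16 columns, and a sequence of length n yields n - 1 rows under the sliding-window assumption.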
Hashes for fasta_one_hot_encoder-1.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 546bd1c5347d1c1e969665c6d2191c9f874e0a88d9d8fbf5147ea27197923d53 |
| MD5 | 7d8bda57f375fa1305c195821f71d94a |
| BLAKE2b-256 | a0eff1066ade9f807f5d83db3fc4e7a5dba959209f2718e373f64c45d37d813a |