Detect gibberish in text using a Markov chain.
Project description
pygibberish
pygibberish is a Python application that analyzes a string and identifies gibberish. It is based on Markov chains, mathematical systems that transition from one state to another within a state space, and uses them to calculate both additive and multiplicative probabilities. pygibberish allows users to build their own model from a custom .txt file.
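To make the idea concrete, here is a minimal sketch of a character-level Markov chain scorer. It is not pygibberish's implementation: the function names `build_transitions` and `score` are hypothetical, and it assumes that the "additive" score is the mean of the transition probabilities and the "multiplicative" score is their product. The library's exact aggregation may differ.

```python
from collections import defaultdict


def build_transitions(corpus: str) -> dict:
    """Estimate P(next_char | current_char) from character bigram counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(corpus, corpus[1:]):
        counts[current][nxt] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Normalize counts so each row of the transition table sums to 1.
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs


def score(text: str, probs: dict) -> tuple:
    """Return (mean transition probability, product of transition probabilities).

    Assumption: these stand in for the additive and multiplicative scores.
    """
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return 0.0, 0.0
    additive = 0.0
    multiplicative = 1.0
    for current, nxt in pairs:
        # Transitions never seen in the corpus get probability 0.
        p = probs.get(current, {}).get(nxt, 0.0)
        additive += p
        multiplicative *= p
    return additive / len(pairs), multiplicative


probs = build_transitions("the cat sat on the mat and the hat")
print(score("the cat", probs))  # both values are positive
print(score("xqzvk", probs))    # no known transitions
```

In this sketch a single unseen transition drives the product to zero while the mean only shrinks, which is one reason to look at both numbers together.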
Usage Examples
Build Model
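from pygibberish.scanner import GibberishScanner

if __name__ == "__main__":
    scanner = GibberishScanner()
    # Build a bigram (n_gram_size=2) transition model from a plain-text corpus.
    scanner.build_model(corpus_path="path/to/corpus.txt", n_gram_size=2)
    # Save the model so it can be reloaded for scanning later.
    scanner.save_model("transition_matrix_2d.tm", encoding="utf-8")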
Scan Gibberish
from pygibberish.scanner import GibberishScanner

if __name__ == "__main__":
    scanner = GibberishScanner()
    scanner.load_model(path="transition_matrix_2d.tm")
    additive_cum_proba, multiplicative_cum_proba = scanner.scan("ldfjgnkdfjnd")
    print(additive_cum_proba)        # 0.00022810218978102192
    print(multiplicative_cum_proba)  # 0.0
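The two scores still have to be turned into a decision. The sketch below shows one possible rule, not something pygibberish provides: the helper `looks_like_gibberish` and the cutoff `additive_threshold=0.01` are hypothetical, and it assumes that lower scores mean less plausible text. Any real cutoff would need tuning against scores from known-good text under your own model.

```python
def looks_like_gibberish(additive: float, multiplicative: float,
                         additive_threshold: float = 0.01) -> bool:
    """Flag text whose multiplicative score collapsed to zero and whose
    additive score is below a hypothetical, corpus-dependent threshold."""
    return multiplicative == 0.0 and additive < additive_threshold


# Scores reported above for "ldfjgnkdfjnd".
print(looks_like_gibberish(0.00022810218978102192, 0.0))  # True
```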
Download files
Source Distribution
gibberishpy-1.0.0.tar.gz (5.0 kB)
Built Distribution
Hashes for gibberishpy-1.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d7a5619f8046ee06c528072c7ca5d6d11d23d82d3773693a184a2dfb79f23f12
MD5 | 2737bb88316eab1c8c540c1224ade7c5
BLAKE2b-256 | 45b8932304b9ea4a6cbeb48ed76a4d6f684eadf09ed7b3a1ecef777a4f0e79e2