Generate human-like password candidates using Markov chains
Markov PassGen
Generate human-like password candidates using Markov chain analysis. Build probabilistic models from text corpora to create realistic password patterns for security research, penetration testing, and password strength analysis.
🎯 Features
- Markov Chain Generation: Create passwords using n-gram probabilistic models (n=2-5)
- Multi-Corpus Support: Train models on multiple text corpora with configurable weights
- Advanced Filtering: Filter passwords by length, character sets, entropy, and custom patterns
- Text Processing: Clean and transform source text with case handling and character normalization
- Password Transformations: Apply leetspeak, case variations, and character substitutions
- Entropy Analysis: Calculate Shannon entropy and estimate password strength
- Visualization: Generate statistical plots and analyze password distributions
- Command-Line Interface: Generate, filter, transform, and visualize passwords from the terminal
- High Performance: Efficient n-gram building and generation with progress tracking
📦 Installation
From PyPI
pip install markov-passgen
From Source
git clone https://github.com/markmysler/markov-passgen.git
cd markov-passgen
pip install -e .
Development Installation
git clone https://github.com/markmysler/markov-passgen.git
cd markov-passgen
pip install -e ".[dev]"
🚀 Quick Start
Basic Password Generation
from markov_passgen.core import CorpusLoader, NGramBuilder, PasswordGenerator
# Load a text corpus
loader = CorpusLoader("path/to/corpus.txt")
text = loader.load()
# Build a Markov model
builder = NGramBuilder(ngram_size=3)
model = builder.build_model(text)
# Generate passwords
generator = PasswordGenerator(model, min_length=8, max_length=16)
passwords = generator.generate(count=10)
for password in passwords:
    print(password)
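To show what the pipeline above does conceptually, here is a minimal, self-contained sketch of a character-level Markov model using only the standard library. It is not the library's implementation: `build_model` and `generate` are hypothetical stand-ins, the toy corpus is invented, and the dead-end handling is one possible choice among several.

```python
import random
from collections import Counter, defaultdict


def build_model(text, ngram_size=3):
    """Count, for every (ngram_size - 1)-character context, which character follows it.

    Assumes ngram_size >= 2 and a text longer than the context.
    """
    order = ngram_size - 1
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return dict(model)


def generate(model, length, rng):
    """Sample one candidate of exactly `length` characters from the model."""
    contexts = sorted(model)
    order = len(contexts[0])
    out = rng.choice(contexts)  # start from a random observed context
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            # Dead end (context never seen with a follower): jump to a new context.
            out += rng.choice(contexts)
            continue
        chars = list(followers)
        # Pick the next character in proportion to how often it followed this context.
        out += rng.choices(chars, weights=[followers[c] for c in chars], k=1)[0]
    return out[:length]


corpus = "password123 passphrase sunshine dragon monkey letmein"  # toy corpus
model = build_model(corpus, ngram_size=3)
rng = random.Random(42)  # seeded only to make the demo repeatable
for _ in range(5):
    print(generate(model, 12, rng))
```

A larger n-gram size makes candidates resemble the corpus more closely, at the cost of less variety.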
Using the CLI
Generate 20 passwords from a corpus:
markov-passgen generate --corpus passwords.txt --count 20 --min-length 10 --max-length 16
Apply filters and transformations:
markov-passgen generate \
  --corpus passwords.txt \
  --count 50 \
  --min-length 12 \
  --require-digit \
  --require-special \
  --min-entropy 40 \
  --transform leetspeak \
  --output wordlist.txt
Use multiple corpora with weights:
markov-passgen generate \
  --corpus-list common_passwords.txt \
  --corpus-list english_words.txt \
  --corpus-list usernames.txt \
  --corpus-weights "0.5,0.3,0.2" \
  --count 100
Visualize password characteristics:
markov-passgen visualize-passwords \
  --wordlist generated.txt \
  --entropy-dist entropy.png \
  --length-dist length.png
📖 Usage Examples
Advanced Filtering
from markov_passgen.filters import (
    LengthFilter,
    CharacterSetFilter,
    EntropyFilter,
    FilterChain,
)
from markov_passgen.core import EntropyCalculator, PasswordGenerator
# Create filter chain
filters = FilterChain([
    LengthFilter(min_length=12, max_length=20),
    CharacterSetFilter(require_digit=True, require_special=True),
    EntropyFilter(min_entropy=50.0, entropy_calculator=EntropyCalculator()),
])
# Apply filters during generation (`model` is the Markov model built in Quick Start)
generator = PasswordGenerator(model, min_length=12, max_length=20)
passwords = generator.generate_filtered(count=100, filter_chain=filters)
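As a rough illustration of what such a filter chain checks, the sketch below composes plain predicates over candidate strings. Everything here is hypothetical and independent of the library. In particular, `shannon_entropy_bits` uses one common definition (per-character Shannon entropy of the password's own character distribution, multiplied by its length), and the library's `EntropyCalculator` may define entropy differently.

```python
import math
from collections import Counter


def shannon_entropy_bits(password):
    """Per-character Shannon entropy times length, in bits (assumed definition)."""
    if not password:
        return 0.0
    n = len(password)
    per_char = -sum((c / n) * math.log2(c / n) for c in Counter(password).values())
    return per_char * n


def length_between(lo, hi):
    """Predicate: password length lies in [lo, hi]."""
    return lambda pw: lo <= len(pw) <= hi


def has_digit_and_special(pw):
    """Predicate: at least one digit and at least one non-alphanumeric character."""
    return any(ch.isdigit() for ch in pw) and any(not ch.isalnum() for ch in pw)


def min_entropy(threshold):
    """Predicate: estimated entropy is at least `threshold` bits."""
    return lambda pw: shannon_entropy_bits(pw) >= threshold


def apply_filters(candidates, predicates):
    """Keep only the candidates that satisfy every predicate."""
    return [pw for pw in candidates if all(p(pw) for p in predicates)]


candidates = ["short1!", "longenough1!x", "nodigitshere!!"]
kept = apply_filters(candidates, [length_between(12, 20), has_digit_and_special])
print(kept)
print(round(shannon_entropy_bits("abcd"), 2))
```

Filtering after generation discards candidates, so a strict chain may need many more generated candidates than the requested count.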
Text Processing
from markov_passgen.transformers import TextCleaner, CaseTransformer, CharacterTransformer
# Clean and normalize text
cleaner = TextCleaner(
    remove_punctuation=True,
    remove_digits=False,
    remove_whitespace=True,
    lowercase=True,
)
cleaned_text = cleaner.clean(raw_text)
# Transform case patterns
case_transformer = CaseTransformer()
titled = case_transformer.transform(text, style="title") # Title Case
camel = case_transformer.transform(text, style="camel") # camelCase
snake = case_transformer.transform(text, style="snake") # snake_case
# Character substitutions
char_transformer = CharacterTransformer()
char_transformer.add_rule("a", "@")
char_transformer.add_rule("e", "3")
transformed = char_transformer.transform("password") # p@ssword ("password" contains no "e", so only the a -> @ rule applies)
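For single-character rules like the two above, plain Python gives a comparable effect with `str.translate`. This is a sketch of the idea rather than the `CharacterTransformer` implementation, and it assumes each rule replaces every occurrence of its source character. The helper name `substitute` is hypothetical.

```python
def substitute(word, rules):
    """Replace every occurrence of each key character with its mapped value."""
    return word.translate(str.maketrans(rules))


rules = {"a": "@", "e": "3"}
print(substitute("password", rules))  # p@ssword
print(substitute("letmein", rules))   # l3tm3in
```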
Password Transformations
from markov_passgen.transformers import (
    LeetSpeakTransformer,
    CaseVariationTransformer,
    SubstitutionTransformer,
    TransformerChain,
)
# Leetspeak transformation
leet = LeetSpeakTransformer(probability=0.5)
leet_password = leet.transform("password") # e.g. p@ssw0rd (output varies because substitutions are probabilistic)
# Case variations
case_var = CaseVariationTransformer()
varied = case_var.transform("password") # e.g. PaSsWoRd
# Custom substitutions
sub = SubstitutionTransformer()
sub.add_rule("a", ["@", "4"])
sub.add_rule("o", ["0"])
substituted = sub.transform("password")
# Chain multiple transformers
chain = TransformerChain([leet, case_var, sub])
result = chain.transform("password")
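The effect of a substitution probability can be sketched without the library. In the hypothetical helper below, each eligible character is replaced independently with the given probability. The mapping `LEET_MAP` is illustrative and is not the table used by `LeetSpeakTransformer`.

```python
import random

# Illustrative mapping only; the library's table may differ.
LEET_MAP = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}


def leetspeak(word, probability, rng):
    """Replace each mappable character independently with the given probability."""
    return "".join(
        LEET_MAP[ch] if ch in LEET_MAP and rng.random() < probability else ch
        for ch in word
    )


rng = random.Random(7)  # seeded only to make the demo repeatable
print(leetspeak("password", 1.0, rng))  # every mappable character replaced
print(leetspeak("password", 0.0, rng))  # nothing replaced
print(leetspeak("password", 0.5, rng))  # a random subset replaced
```

Passing the random generator in explicitly keeps runs reproducible, which helps when comparing wordlists across experiments.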
Multi-Corpus Analysis
from markov_passgen.core import MultiCorpusManager, PasswordGenerator
# Create manager with multiple corpora
manager = MultiCorpusManager(ngram_size=3)
manager.add_corpus("common_passwords.txt", weight=0.5)
manager.add_corpus("english_words.txt", weight=0.3)
manager.add_corpus("usernames.txt", weight=0.2)
# Build merged model
merged_model = manager.build_merged_model()
# Generate passwords from merged model
generator = PasswordGenerator(merged_model, min_length=10, max_length=16)
passwords = generator.generate(count=50)
# Get corpus statistics
stats = manager.get_corpus_stats()
for stat in stats:
    print(f"{stat['name']}: {stat['char_count']} chars, weight={stat['weight']}")
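One plausible way to merge weighted models is sketched below. It is an assumption about the general technique, not a description of how `MultiCorpusManager` works: each corpus's follower counts are first normalized to probabilities per context, then scaled by the corpus weight and summed. The helper names `count_ngrams` and `merge_weighted` are hypothetical.

```python
from collections import Counter, defaultdict


def count_ngrams(text, ngram_size=3):
    """Map each (ngram_size - 1)-character context to counts of the next character."""
    order = ngram_size - 1
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model


def merge_weighted(models_and_weights):
    """Combine models as a weighted sum of per-context transition probabilities."""
    merged = defaultdict(lambda: defaultdict(float))
    for model, weight in models_and_weights:
        for context, followers in model.items():
            total = sum(followers.values())
            for ch, count in followers.items():
                # Normalizing first keeps a large corpus from dominating by size alone.
                merged[context][ch] += weight * count / total
    return {ctx: dict(f) for ctx, f in merged.items()}


a = count_ngrams("abab")
b = count_ngrams("abcabc")
merged = merge_weighted([(a, 0.75), (b, 0.25)])
print(merged["ab"])
```

Contexts that occur in only one corpus end up with weights that do not sum to 1, which is harmless for sampling functions such as `random.choices` that accept unnormalized weights.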
Visualization and Analysis
from markov_passgen.visualization import NGramVisualizer
from markov_passgen.core import EntropyCalculator
# Create visualizer
viz = NGramVisualizer(style="seaborn-v0_8-darkgrid")
# Plot n-gram frequencies
viz.plot_ngram_frequencies(model, top_n=20, output_path="ngram_freq.png")
# Plot password entropy distribution
entropy_calc = EntropyCalculator()
viz.plot_entropy_distribution(
    passwords,
    entropy_calc,
    bins=30,
    output_path="entropy_dist.png",
)
# Plot length distribution
viz.plot_length_distribution(passwords, bins=20, output_path="length_dist.png")
# Plot character frequency
viz.plot_character_distribution(text, top_n=30, output_path="char_freq.png")
# Compare multiple corpora (corpus_stats is assumed to be the list returned by manager.get_corpus_stats())
viz.plot_corpus_comparison(corpus_stats, output_path="corpus_comparison.png")
# Cleanup
viz.close_all()
🛠️ CLI Reference
Generate Command
markov-passgen generate [OPTIONS]
Options:
- --corpus PATH: Path to text corpus file
- --multi-corpus: Enable multi-corpus mode
- --corpus-list PATH [PATH ...]: Paths to multiple corpus files (multi-corpus mode)
- --corpus-weights FLOAT [FLOAT ...]: Weights for each corpus (multi-corpus mode)
- --ngram-size INT: N-gram size (2-5, default: 3)
- --count INT: Number of passwords to generate (default: 10)
- --min-length INT: Minimum password length (default: 8)
- --max-length INT: Maximum password length (default: 16)
- --require-digit: Require at least one digit
- --require-lowercase: Require at least one lowercase letter
- --require-uppercase: Require at least one uppercase letter
- --require-special: Require at least one special character
- --min-entropy FLOAT: Minimum entropy threshold
- --transform CHOICE: Apply transformation (leetspeak, case_variation, substitution)
- --output PATH: Write passwords to file
- --show-entropy: Display entropy for each password
Visualize Corpus Command
markov-passgen visualize-corpus [OPTIONS]
Options:
- --corpus PATH: Path to corpus file (required)
- --ngram-size INT: N-gram size (default: 3)
- --ngram-freq PATH: Output path for n-gram frequency plot
- --char-dist PATH: Output path for character distribution plot
- --top-n INT: Number of top items to display (default: 20)
Visualize Passwords Command
markov-passgen visualize-passwords [OPTIONS]
Options:
- --wordlist PATH: Path to password wordlist file (required)
- --entropy-dist PATH: Output path for entropy distribution plot
- --length-dist PATH: Output path for length distribution plot
- --bins INT: Number of bins for histograms (default: 30)
Visualize Multi-Corpus Command
markov-passgen visualize-multi-corpus [OPTIONS]
Options:
- --corpus-list PATH [PATH ...]: Paths to corpus files (required)
- --corpus-weights FLOAT [FLOAT ...]: Weights for each corpus
- --output PATH: Output path for comparison plot (required)
📊 Output Formats
Visualization plots support multiple formats:
- PNG: Raster graphics (default, 300 DPI)
- JPG: Compressed raster graphics (300 DPI)
- SVG: Vector graphics (scalable)
- PDF: Print-ready vector graphics
🧪 Development
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=src/markov_passgen --cov-report=html
# Run specific test category
pytest tests/unit/
pytest tests/integration/
pytest tests/e2e/
Code Quality
# Format code
black src/ tests/
# Lint code
flake8 src/ tests/
# Type checking
mypy src/
Building Distribution
# Build wheel and source distribution
python -m build
# Upload to PyPI (when ready)
twine upload dist/*
📚 Documentation
- Development Plan: Detailed project roadmap and phases
- API Reference: Complete API documentation
- Tutorials: Step-by-step guides and examples
- Architecture: System design and component overview
🔐 Security Considerations
Important: This tool is designed for security research and testing purposes only. Generated passwords should be used in controlled environments for:
- Password strength analysis
- Penetration testing with proper authorization
- Security research and academic studies
- Training machine learning models for password security
Do NOT use generated passwords for:
- Actual user accounts or systems
- Production environments
- Unauthorized access attempts
Generated passwords may contain patterns from source corpora and should not be considered cryptographically secure.
🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Inspired by password cracking research and Markov chain text generation
- Built with Click for CLI
- Visualization powered by matplotlib and seaborn
📮 Contact
- GitHub: markmysler/markov-passgen
- Issues: GitHub Issues
Disclaimer: This tool is for educational and authorized security testing purposes only. Users are responsible for compliance with applicable laws and regulations.