Generate human-like password candidates using Markov chains

Markov PassGen

Generate human-like password candidates using Markov chain analysis. Build probabilistic models from text corpora to create realistic password patterns for security research, penetration testing, and password strength analysis.

🎯 Features

  • Markov Chain Generation: Create passwords using n-gram probabilistic models (n=2-5)
  • Multi-Corpus Support: Train models on multiple text corpora with configurable weights
  • Advanced Filtering: Filter passwords by length, character sets, entropy, and custom patterns
  • Text Processing: Clean and transform source text with case handling and character normalization
  • Password Transformations: Apply leetspeak, case variations, and character substitutions
  • Entropy Analysis: Calculate Shannon entropy and estimate password strength
  • Visualization: Generate statistical plots and analyze password distributions
  • CLI: Command-line tool with options for generation, filtering, transformation, and visualization
  • High Performance: Efficient n-gram building and generation with progress tracking

📦 Installation

From PyPI

pip install markov-passgen

From Source

git clone https://github.com/markmysler/markov-passgen.git
cd markov-passgen
pip install -e .

Development Installation

git clone https://github.com/markmysler/markov-passgen.git
cd markov-passgen
pip install -e ".[dev]"

🚀 Quick Start

Basic Password Generation

from markov_passgen.core import CorpusLoader, NGramBuilder, PasswordGenerator

# Load a text corpus
loader = CorpusLoader("path/to/corpus.txt")
text = loader.load()

# Build a Markov model
builder = NGramBuilder(ngram_size=3)
model = builder.build_model(text)

# Generate passwords
generator = PasswordGenerator(model, min_length=8, max_length=16)
passwords = generator.generate(count=10)

for password in passwords:
    print(password)
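
For intuition about what an n-gram Markov model does, here is a minimal, self-contained sketch of a character-level chain. It is not the library's implementation: the `build_model` and `generate` functions, the toy corpus, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def build_model(text, ngram_size=3):
    """Map each (ngram_size - 1)-character context to counts of the next character."""
    order = ngram_size - 1
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model


def generate(model, length, rng=random):
    """Sample one candidate of up to `length` characters from the model."""
    out = rng.choice(sorted(model))  # start from a random known context
    order = len(out)
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:  # dead end: this context has no recorded successor
            break
        chars, weights = zip(*sorted(followers.items()))
        out += rng.choices(chars, weights=weights)[0]
    return out


# Toy corpus (hypothetical); a real corpus would be far larger.
corpus = "password passport passenger sunshine sunflower dragon monkey"
model = build_model(corpus, ngram_size=3)
rng = random.Random(0)  # fixed seed so runs are repeatable
candidates = [generate(model, 10, rng) for _ in range(5)]
for candidate in candidates:
    print(candidate)
```

Every three-character window of a generated candidate occurs somewhere in the corpus, which is why larger n-gram sizes tend to produce output that looks more like the source text.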

Using the CLI

Generate 20 passwords from a corpus:

markov-passgen generate --corpus passwords.txt --count 20 --min-length 10 --max-length 16

Apply filters and transformations:

markov-passgen generate \
    --corpus passwords.txt \
    --count 50 \
    --min-length 12 \
    --require-digit \
    --require-special \
    --min-entropy 40 \
    --transform leetspeak \
    --output wordlist.txt

Use multiple corpora with weights:

markov-passgen generate \
    --corpus-list common_passwords.txt \
    --corpus-list english_words.txt \
    --corpus-list usernames.txt \
    --corpus-weights "0.5,0.3,0.2" \
    --count 100

Visualize password characteristics:

markov-passgen visualize-passwords \
    --wordlist generated.txt \
    --entropy-dist entropy.png \
    --length-dist length.png

📖 Usage Examples

Advanced Filtering

from markov_passgen.filters import (
    LengthFilter,
    CharacterSetFilter,
    EntropyFilter,
    FilterChain
)
from markov_passgen.core import EntropyCalculator, PasswordGenerator

# Create filter chain
filters = FilterChain([
    LengthFilter(min_length=12, max_length=20),
    CharacterSetFilter(require_digit=True, require_special=True),
    EntropyFilter(min_entropy=50.0, entropy_calculator=EntropyCalculator())
])

# Apply filters during generation
generator = PasswordGenerator(model, min_length=12, max_length=20)
passwords = generator.generate_filtered(count=100, filter_chain=filters)
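
This README does not say how `EntropyCalculator` defines its entropy value, so the following is only a sketch of one common definition: per-character Shannon entropy multiplied by password length, in bits. The function names and the threshold used here are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy_bits(password):
    """Per-character Shannon entropy times length, in bits (one possible definition)."""
    if not password:
        return 0.0
    counts = Counter(password)
    n = len(password)
    per_char = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return per_char * n


def keep(password, min_entropy):
    """Filter predicate: keep passwords at or above the entropy threshold."""
    return shannon_entropy_bits(password) >= min_entropy


samples = ["aaaaaaaa", "abcdabcd", "Tr0ub4dor&3x"]
for sample in samples:
    print(sample, round(shannon_entropy_bits(sample), 2), keep(sample, 20.0))
```

Under this definition a password made of one repeated character scores zero, and a 12-character password cannot exceed 12 × log2(12) ≈ 43 bits, so a threshold should be chosen with the length range in mind.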

Text Processing

from markov_passgen.transformers import TextCleaner, CaseTransformer, CharacterTransformer

# Clean and normalize text
cleaner = TextCleaner(
    remove_punctuation=True,
    remove_digits=False,
    remove_whitespace=True,
    lowercase=True
)
cleaned_text = cleaner.clean(raw_text)

# Transform case patterns
case_transformer = CaseTransformer()
titled = case_transformer.transform(text, style="title")  # Title Case
camel = case_transformer.transform(text, style="camel")   # camelCase
snake = case_transformer.transform(text, style="snake")   # snake_case

# Character substitutions
char_transformer = CharacterTransformer()
char_transformer.add_rule("a", "@")
char_transformer.add_rule("e", "3")
transformed = char_transformer.transform("password")  # p@ssword (only "a" matches; "password" contains no "e")
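
As a rough picture of what the cleaning options could mean, the sketch below mirrors the `TextCleaner` flags with the standard library. It is an assumption about the behaviour, not the library's code; `clean_text` is a hypothetical helper.

```python
import string


def clean_text(raw_text, remove_punctuation=True, remove_digits=False,
               remove_whitespace=True, lowercase=True):
    """Drop the selected character classes, then optionally lowercase the result."""
    drop = ""
    if remove_punctuation:
        drop += string.punctuation
    if remove_digits:
        drop += string.digits
    if remove_whitespace:
        drop += string.whitespace
    cleaned = raw_text.translate(str.maketrans("", "", drop))
    return cleaned.lower() if lowercase else cleaned


print(clean_text("Hello, World! 2024"))  # helloworld2024
```

Note that this sketch only handles ASCII punctuation and whitespace; a real corpus may need Unicode-aware normalization.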

Password Transformations

from markov_passgen.transformers import (
    LeetSpeakTransformer,
    CaseVariationTransformer,
    SubstitutionTransformer,
    TransformerChain
)

# Leetspeak transformation
leet = LeetSpeakTransformer(probability=0.5)
leet_password = leet.transform("password")  # e.g. p@ssw0rd (output varies with probability=0.5)

# Case variations
case_var = CaseVariationTransformer()
varied = case_var.transform("password")  # e.g. PaSsWoRd

# Custom substitutions
sub = SubstitutionTransformer()
sub.add_rule("a", ["@", "4"])
sub.add_rule("o", ["0"])
substituted = sub.transform("password")

# Chain multiple transformers
chain = TransformerChain([leet, case_var, sub])
result = chain.transform("password")
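
One plausible reading of the `probability` parameter is that each substitutable character is replaced independently with that probability. The sketch below illustrates that reading only; the mapping table and the `leetspeak` function are hypothetical and may differ from what `LeetSpeakTransformer` does.

```python
import random

# Illustrative mapping; the library's actual table is not documented here.
LEET_MAP = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}


def leetspeak(word, probability=0.5, rng=random):
    """Replace each mappable character independently with the given probability."""
    return "".join(
        LEET_MAP[ch] if ch in LEET_MAP and rng.random() < probability else ch
        for ch in word
    )


print(leetspeak("password", probability=1.0))  # p@$$w0rd
print(leetspeak("password", probability=0.0))  # password
print(leetspeak("password", probability=0.5, rng=random.Random(1)))
```

With probabilities strictly between 0 and 1, a single base word can yield many variants, which is useful when expanding a wordlist.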

Multi-Corpus Analysis

from markov_passgen.core import MultiCorpusManager, PasswordGenerator

# Create manager with multiple corpora
manager = MultiCorpusManager(ngram_size=3)
manager.add_corpus("common_passwords.txt", weight=0.5)
manager.add_corpus("english_words.txt", weight=0.3)
manager.add_corpus("usernames.txt", weight=0.2)

# Build merged model
merged_model = manager.build_merged_model()

# Generate passwords from merged model
generator = PasswordGenerator(merged_model, min_length=10, max_length=16)
passwords = generator.generate(count=50)

# Get corpus statistics
stats = manager.get_corpus_stats()
for stat in stats:
    print(f"{stat['name']}: {stat['char_count']} chars, weight={stat['weight']}")
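
How the weights are applied when models are merged is not described here. A minimal sketch of one reasonable scheme follows: each corpus's transition counts are normalized by that corpus's total, then scaled by its weight, so a large corpus does not dominate purely by size. The helper names are hypothetical.

```python
from collections import Counter, defaultdict


def count_transitions(text, ngram_size=3):
    """Count next-character occurrences for each (ngram_size - 1)-character context."""
    order = ngram_size - 1
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        counts[text[i:i + order]][text[i + order]] += 1
    return counts


def merge_weighted(corpora, ngram_size=3):
    """corpora: list of (text, weight). Returns context -> {next_char: weighted mass}."""
    merged = defaultdict(lambda: defaultdict(float))
    for text, weight in corpora:
        counts = count_transitions(text, ngram_size)
        total = sum(sum(c.values()) for c in counts.values())
        if total == 0:
            continue  # corpus too short to contain any transition
        for context, followers in counts.items():
            for ch, n in followers.items():
                merged[context][ch] += weight * n / total
    return merged


merged = merge_weighted([("abab", 0.5), ("abac", 0.5)], ngram_size=2)
for context in sorted(merged):
    print(context, dict(merged[context]))
```

When the weights sum to 1, the merged masses also sum to 1, so they can be used directly as sampling weights.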

Visualization and Analysis

from markov_passgen.visualization import NGramVisualizer
from markov_passgen.core import EntropyCalculator

# Create visualizer
viz = NGramVisualizer(style="seaborn-v0_8-darkgrid")

# Plot n-gram frequencies
viz.plot_ngram_frequencies(model, top_n=20, output_path="ngram_freq.png")

# Plot password entropy distribution
entropy_calc = EntropyCalculator()
viz.plot_entropy_distribution(
    passwords,
    entropy_calc,
    bins=30,
    output_path="entropy_dist.png"
)

# Plot length distribution
viz.plot_length_distribution(passwords, bins=20, output_path="length_dist.png")

# Plot character frequency
viz.plot_character_distribution(text, top_n=30, output_path="char_freq.png")

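# Compare multiple corpora (corpus_stats is assumed to be the list
# returned by manager.get_corpus_stats() in the multi-corpus example)
viz.plot_corpus_comparison(corpus_stats, output_path="corpus_comparison.png")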

# Cleanup
viz.close_all()

🛠️ CLI Reference

Generate Command

markov-passgen generate [OPTIONS]

Options:

  • --corpus PATH: Path to text corpus file
  • --multi-corpus: Enable multi-corpus mode
  • --corpus-list PATH [PATH ...]: Paths to multiple corpus files (multi-corpus mode)
  • --corpus-weights FLOAT [FLOAT ...]: Weights for each corpus (multi-corpus mode)
  • --ngram-size INT: N-gram size (2-5, default: 3)
  • --count INT: Number of passwords to generate (default: 10)
  • --min-length INT: Minimum password length (default: 8)
  • --max-length INT: Maximum password length (default: 16)
  • --require-digit: Require at least one digit
  • --require-lowercase: Require at least one lowercase letter
  • --require-uppercase: Require at least one uppercase letter
  • --require-special: Require at least one special character
  • --min-entropy FLOAT: Minimum entropy threshold
  • --transform CHOICE: Apply transformation (leetspeak, case_variation, substitution)
  • --output PATH: Write passwords to file
  • --show-entropy: Display entropy for each password
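
The `--require-*` options can be thought of as character-class checks applied to each candidate. The following sketch shows that idea with a hypothetical `meets_requirements` helper; treating "special" as ASCII punctuation is an assumption, and the CLI may define it differently.

```python
import string


def meets_requirements(password, require_digit=False, require_lowercase=False,
                       require_uppercase=False, require_special=False):
    """Return True if the password contains a character from every required class."""
    checks = [
        (require_digit, str.isdigit),
        (require_lowercase, str.islower),
        (require_uppercase, str.isupper),
        (require_special, lambda ch: ch in string.punctuation),
    ]
    return all(
        any(test(ch) for ch in password)
        for required, test in checks
        if required
    )


candidates = ["sunshine", "Sunshine1", "Sunsh1ne!"]
strict = [
    p for p in candidates
    if meets_requirements(p, require_digit=True, require_uppercase=True,
                          require_special=True)
]
print(strict)  # ['Sunsh1ne!']
```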

Visualize Corpus Command

markov-passgen visualize-corpus [OPTIONS]

Options:

  • --corpus PATH: Path to corpus file (required)
  • --ngram-size INT: N-gram size (default: 3)
  • --ngram-freq PATH: Output path for n-gram frequency plot
  • --char-dist PATH: Output path for character distribution plot
  • --top-n INT: Number of top items to display (default: 20)

Visualize Passwords Command

markov-passgen visualize-passwords [OPTIONS]

Options:

  • --wordlist PATH: Path to password wordlist file (required)
  • --entropy-dist PATH: Output path for entropy distribution plot
  • --length-dist PATH: Output path for length distribution plot
  • --bins INT: Number of bins for histograms (default: 30)

Visualize Multi-Corpus Command

markov-passgen visualize-multi-corpus [OPTIONS]

Options:

  • --corpus-list PATH [PATH ...]: Paths to corpus files (required)
  • --corpus-weights FLOAT [FLOAT ...]: Weights for each corpus
  • --output PATH: Output path for comparison plot (required)

📊 Output Formats

Visualization plots support multiple formats:

  • PNG: Raster graphics (default, 300 DPI)
  • JPG: Compressed raster graphics (300 DPI)
  • SVG: Vector graphics (scalable)
  • PDF: Print-ready vector graphics

🧪 Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src/markov_passgen --cov-report=html

# Run specific test category
pytest tests/unit/
pytest tests/integration/
pytest tests/e2e/

Code Quality

# Format code
black src/ tests/

# Lint code
flake8 src/ tests/

# Type checking
mypy src/

Building Distribution

# Build wheel and source distribution
python -m build

# Upload to PyPI (when ready)
twine upload dist/*

🔐 Security Considerations

Important: This tool is designed for security research and testing purposes only. Generated passwords should be used in controlled environments for:

  • Password strength analysis
  • Penetration testing with proper authorization
  • Security research and academic studies
  • Training machine learning models for password security

Do NOT use generated passwords for:

  • Actual user accounts or systems
  • Production environments
  • Unauthorized access attempts

Generated passwords may contain patterns from source corpora and should not be considered cryptographically secure.

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Inspired by password cracking research and Markov chain text generation
  • Built with Click for CLI
  • Visualization powered by matplotlib and seaborn

Disclaimer: This tool is for educational and authorized security testing purposes only. Users are responsible for compliance with applicable laws and regulations.

Download files

Source Distribution

markov_passgen-1.0.0.tar.gz (27.4 kB view details)

Uploaded Source

Built Distribution

markov_passgen-1.0.0-py3-none-any.whl (30.7 kB view details)

Uploaded Python 3

File details

Details for the file markov_passgen-1.0.0.tar.gz.

File metadata

  • Download URL: markov_passgen-1.0.0.tar.gz
  • Upload date:
  • Size: 27.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for markov_passgen-1.0.0.tar.gz
Algorithm Hash digest
SHA256 53c14364cd34b6ef9f215394e7775159c24697a7cd27e2d18b953c89bf6ce4a5
MD5 1801f2c853961c974719bb77dd7d1956
BLAKE2b-256 e1b650333d46afa2f9273ec0a298dc725b9b01bfe0b4f3c8f76e04f99a5328d0
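
A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using only the standard library; the local file path is an assumption, and the check is skipped if the file is not present.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "53c14364cd34b6ef9f215394e7775159c24697a7cd27e2d18b953c89bf6ce4a5"
ARCHIVE = "markov_passgen-1.0.0.tar.gz"  # assumed to be in the current directory

if os.path.exists(ARCHIVE):
    print("OK" if sha256_of(ARCHIVE) == EXPECTED else "MISMATCH")
else:
    print(f"{ARCHIVE} not found; nothing to verify")
```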

Provenance

The following attestation bundles were made for markov_passgen-1.0.0.tar.gz:

Publisher: python-publish.yml on markmysler/markov-passgen

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file markov_passgen-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: markov_passgen-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 30.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for markov_passgen-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 607f2148a0c884e4fe0d4509414e95874b450ebefcb9307a824e0d3028dd9290
MD5 de869713163e88d80fcf506e7706ca44
BLAKE2b-256 16e9b851847985b2227120c1aee439d3e209c97581d094d34e90a6c8a256c7da

Provenance

The following attestation bundles were made for markov_passgen-1.0.0-py3-none-any.whl:

Publisher: python-publish.yml on markmysler/markov-passgen

