Generate human-like password candidates using Markov chains

Markov PassGen

Generate human-like password candidates using Markov chain analysis. Build probabilistic models from text corpora to create realistic password patterns for security research, penetration testing, and password strength analysis.

🎯 Features

Markov Chain Generation: Create passwords using n-gram probabilistic models (n=2-5)
Multi-Corpus Support: Train models on multiple text corpora with configurable weights
Advanced Filtering: Filter passwords by length, character sets, entropy, and custom patterns
Text Processing: Clean and transform source text with case handling and character normalization
Password Transformations: Apply leetspeak, case variations, and character substitutions
Entropy Analysis: Calculate Shannon entropy and estimate password strength
Visualization: Generate statistical plots and analyze password distributions
CLI Interface: Powerful command-line tool with extensive options
High Performance: Efficient n-gram building and generation with progress tracking
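
To make the n-gram idea concrete, here is a minimal, self-contained sketch of a character-level Markov model. It is an illustration under stated assumptions, not the library's implementation: the `build_model` and `generate` functions and the `START`/`END` markers are hypothetical names chosen for this example.

```python
import random
from collections import Counter, defaultdict

START = "\x02"  # hypothetical start-of-word marker (not part of the library)
END = "\x03"    # hypothetical end-of-word marker (not part of the library)


def build_model(words, n=3):
    """Count which character follows each (n-1)-character context."""
    model = defaultdict(Counter)
    for word in words:
        padded = START * (n - 1) + word + END
        for i in range(len(padded) - n + 1):
            context = padded[i:i + n - 1]
            next_char = padded[i + n - 1]
            model[context][next_char] += 1
    return model


def generate(model, n=3, max_length=16, rng=None):
    """Sample one candidate by walking the chain from the start context."""
    rng = rng or random.Random()
    context = START * (n - 1)
    out = []
    while len(out) < max_length:
        counts = model.get(context)
        if not counts:  # unseen context (e.g. empty model): stop early
            break
        chars, weights = zip(*counts.items())
        next_char = rng.choices(chars, weights=weights)[0]
        if next_char == END:
            break
        out.append(next_char)
        context = (context + next_char)[1:]  # slide the context window
    return "".join(out)


model = build_model(["password", "passion", "pass1234"], n=3)
print(generate(model, n=3, max_length=16, rng=random.Random(0)))
```

A larger `n` copies longer fragments of the corpus verbatim, while a smaller `n` produces more varied but less word-like output.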

📦 Installation

From PyPI

pip install markov-passgen

From Source

git clone https://github.com/markmysler/markov-passgen.git
cd markov-passgen
pip install -e .

Development Installation

git clone https://github.com/markmysler/markov-passgen.git
cd markov-passgen
pip install -e ".[dev]"

🚀 Quick Start

Basic Password Generation

from markov_passgen.core import CorpusLoader, NGramBuilder, PasswordGenerator

# Load a text corpus
loader = CorpusLoader("path/to/corpus.txt")
text = loader.load()

# Build a Markov model
builder = NGramBuilder(ngram_size=3)
model = builder.build_model(text)

# Generate passwords
generator = PasswordGenerator(model, min_length=8, max_length=16)
passwords = generator.generate(count=10)

for password in passwords:
    print(password)

Using the CLI

Generate 20 passwords from a corpus:

markov-passgen generate --corpus passwords.txt --count 20 --min-length 10 --max-length 16

Apply filters and transformations:

markov-passgen generate \
    --corpus passwords.txt \
    --count 50 \
    --min-length 12 \
    --require-digit \
    --require-special \
    --min-entropy 40 \
    --transform leetspeak \
    --output wordlist.txt

Use multiple corpora with weights:

markov-passgen generate \
    --corpus-list common_passwords.txt \
    --corpus-list english_words.txt \
    --corpus-list usernames.txt \
    --corpus-weights "0.5,0.3,0.2" \
    --count 100

Visualize password characteristics:

markov-passgen visualize-passwords \
    --wordlist generated.txt \
    --entropy-dist entropy.png \
    --length-dist length.png

📖 Usage Examples

Advanced Filtering

from markov_passgen.filters import (
    LengthFilter,
    CharacterSetFilter,
    EntropyFilter,
    FilterChain
)
from markov_passgen.core import EntropyCalculator, PasswordGenerator

# Create filter chain
filters = FilterChain([
    LengthFilter(min_length=12, max_length=20),
    CharacterSetFilter(require_digit=True, require_special=True),
    EntropyFilter(min_entropy=50.0, entropy_calculator=EntropyCalculator())
])

# Apply filters during generation (reuses the model built in Quick Start)
generator = PasswordGenerator(model, min_length=12, max_length=20)
passwords = generator.generate_filtered(count=100, filter_chain=filters)
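
The filtering idea can also be sketched without the library. The snippet below is a rough illustration under assumptions: it treats a filter as a plain predicate and measures entropy as per-character Shannon entropy multiplied by length, which may differ from what `EntropyCalculator` computes. All function names here are hypothetical.

```python
import math
import string
from collections import Counter


def shannon_entropy_bits(password):
    """Per-character Shannon entropy times length (an assumed definition)."""
    if not password:
        return 0.0
    total = len(password)
    counts = Counter(password)
    per_char = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return per_char * total


def length_between(min_length, max_length):
    return lambda pw: min_length <= len(pw) <= max_length


def has_digit(pw):
    return any(ch.isdigit() for ch in pw)


def has_special(pw):
    return any(ch in string.punctuation for ch in pw)


def entropy_at_least(threshold):
    return lambda pw: shannon_entropy_bits(pw) >= threshold


def apply_filters(candidates, predicates):
    """Keep only the candidates that satisfy every predicate."""
    return [pw for pw in candidates if all(check(pw) for check in predicates)]


checks = [length_between(12, 20), has_digit, has_special, entropy_at_least(30.0)]
print(apply_filters(["short", "Tr0ub4dor&3xyz", "aaaaaaaaaaaa1!"], checks))
```

Under this definition a string of repeated characters scores close to zero, so an entropy threshold rejects candidates that pass the length and character-set checks but are highly repetitive.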

Text Processing

from markov_passgen.transformers import TextCleaner, CaseTransformer, CharacterTransformer

# Clean and normalize text
cleaner = TextCleaner(
    remove_punctuation=True,
    remove_digits=False,
    remove_whitespace=True,
    lowercase=True
)
cleaned_text = cleaner.clean(raw_text)

# Transform case patterns
case_transformer = CaseTransformer()
titled = case_transformer.transform(text, style="title")  # Title Case
camel = case_transformer.transform(text, style="camel")   # camelCase
snake = case_transformer.transform(text, style="snake")   # snake_case

# Character substitutions
char_transformer = CharacterTransformer()
char_transformer.add_rule("a", "@")
char_transformer.add_rule("e", "3")
transformed = char_transformer.transform("password")  # p@ssword ("password" contains no "e")
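
For readers who want to see what this kind of preprocessing involves, here is a small standard-library sketch. It mirrors the option names used above but is only an approximation; the `clean_text`, `to_camel`, and `to_snake` helpers are hypothetical and are not the library's `TextCleaner` or `CaseTransformer`.

```python
import string


def clean_text(text, remove_punctuation=True, remove_digits=False,
               remove_whitespace=True, lowercase=True):
    """Drop the selected character classes, then optionally lowercase."""
    drop = ""
    if remove_punctuation:
        drop += string.punctuation
    if remove_digits:
        drop += string.digits
    if remove_whitespace:
        drop += string.whitespace
    cleaned = text.translate(str.maketrans("", "", drop))
    return cleaned.lower() if lowercase else cleaned


def to_camel(text):
    """Join whitespace-separated words as camelCase."""
    words = text.split()
    if not words:
        return ""
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def to_snake(text):
    """Join whitespace-separated words as snake_case."""
    return "_".join(w.lower() for w in text.split())


print(clean_text("Hello, World 42!"))
print(to_camel("correct horse battery"))
print(to_snake("Correct Horse"))
```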

Password Transformations

from markov_passgen.transformers import (
    LeetSpeakTransformer,
    CaseVariationTransformer,
    SubstitutionTransformer,
    TransformerChain
)

# Leetspeak transformation
leet = LeetSpeakTransformer(probability=0.5)
leet_password = leet.transform("password")  # e.g. p@ssw0rd (substitutions are applied with probability 0.5)

# Case variations
case_var = CaseVariationTransformer()
varied = case_var.transform("password")  # e.g. PaSsWoRd

# Custom substitutions
sub = SubstitutionTransformer()
sub.add_rule("a", ["@", "4"])
sub.add_rule("o", ["0"])
substituted = sub.transform("password")

# Chain multiple transformers
chain = TransformerChain([leet, case_var, sub])
result = chain.transform("password")
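
A probabilistic leetspeak substitution can be sketched in a few lines. This is an illustrative version only: the `LEET_MAP` table and the `leetspeak` function are assumptions made for the example, and the library's `LeetSpeakTransformer` may use a different mapping or sampling strategy.

```python
import random

# Assumed substitution table; the library's actual mapping may differ.
LEET_MAP = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}


def leetspeak(word, probability=0.5, rng=None):
    """Replace each mappable character independently with the given probability."""
    rng = rng or random.Random()
    return "".join(
        LEET_MAP[ch] if ch in LEET_MAP and rng.random() < probability else ch
        for ch in word
    )


print(leetspeak("password", probability=1.0))  # every mappable character replaced
print(leetspeak("password", probability=0.5, rng=random.Random(1)))
```

Passing a seeded `random.Random` makes the output reproducible, which is convenient when comparing wordlists between runs.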

Multi-Corpus Analysis

from markov_passgen.core import MultiCorpusManager, PasswordGenerator

# Create manager with multiple corpora
manager = MultiCorpusManager(ngram_size=3)
manager.add_corpus("common_passwords.txt", weight=0.5)
manager.add_corpus("english_words.txt", weight=0.3)
manager.add_corpus("usernames.txt", weight=0.2)

# Build merged model
merged_model = manager.build_merged_model()

# Generate passwords from merged model
generator = PasswordGenerator(merged_model, min_length=10, max_length=16)
passwords = generator.generate(count=50)

# Get corpus statistics
stats = manager.get_corpus_stats()
for stat in stats:
    print(f"{stat['name']}: {stat['char_count']} chars, weight={stat['weight']}")
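
One plausible way to merge weighted corpora is to normalise each corpus's n-gram counts and then combine them in proportion to the weights. The sketch below shows that approach as an assumption; it is not confirmed to be what `MultiCorpusManager.build_merged_model()` does, and `count_ngrams` and `merge_models` are hypothetical helpers.

```python
from collections import Counter, defaultdict


def count_ngrams(text, n=3):
    """Map each (n-1)-character context to counts of the following character."""
    model = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        model[text[i:i + n - 1]][text[i + n - 1]] += 1
    return model


def merge_models(models_with_weights):
    """Combine models so each corpus contributes in proportion to its weight."""
    merged = defaultdict(lambda: defaultdict(float))
    for model, weight in models_with_weights:
        total = sum(sum(counts.values()) for counts in model.values())
        if total == 0:
            continue  # an empty corpus contributes nothing
        for context, counts in model.items():
            for char, count in counts.items():
                # Normalise by corpus size so large corpora do not dominate.
                merged[context][char] += weight * count / total
    return merged


first = count_ngrams("abab", n=2)
second = count_ngrams("abac", n=2)
merged = merge_models([(first, 0.5), (second, 0.5)])
print({context: dict(chars) for context, chars in merged.items()})
```

Normalising before weighting means a weight of 0.5 gives a corpus half of the total probability mass regardless of how many characters it contains.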

Visualization and Analysis

from markov_passgen.visualization import NGramVisualizer
from markov_passgen.core import EntropyCalculator

# Create visualizer
viz = NGramVisualizer(style="seaborn-v0_8-darkgrid")

# Plot n-gram frequencies
viz.plot_ngram_frequencies(model, top_n=20, output_path="ngram_freq.png")

# Plot password entropy distribution
entropy_calc = EntropyCalculator()
viz.plot_entropy_distribution(
    passwords,
    entropy_calc,
    bins=30,
    output_path="entropy_dist.png"
)

# Plot length distribution
viz.plot_length_distribution(passwords, bins=20, output_path="length_dist.png")

# Plot character frequency
viz.plot_character_distribution(text, top_n=30, output_path="char_freq.png")

# Compare multiple corpora (corpus_stats is assumed to come from manager.get_corpus_stats())
viz.plot_corpus_comparison(corpus_stats, output_path="corpus_comparison.png")

# Cleanup
viz.close_all()

🛠️ CLI Reference

Generate Command

markov-passgen generate [OPTIONS]

Options:

--corpus PATH: Path to text corpus file
--multi-corpus: Enable multi-corpus mode
--corpus-list PATH [PATH ...]: Paths to multiple corpus files (multi-corpus mode)
--corpus-weights FLOAT [FLOAT ...]: Weights for each corpus (multi-corpus mode)
--ngram-size INT: N-gram size (2-5, default: 3)
--count INT: Number of passwords to generate (default: 10)
--min-length INT: Minimum password length (default: 8)
--max-length INT: Maximum password length (default: 16)
--require-digit: Require at least one digit
--require-lowercase: Require at least one lowercase letter
--require-uppercase: Require at least one uppercase letter
--require-special: Require at least one special character
--min-entropy FLOAT: Minimum entropy threshold
--transform CHOICE: Apply transformation (leetspeak, case_variation, substitution)
--output PATH: Write passwords to file
--show-entropy: Display entropy for each password

Visualize Corpus Command

markov-passgen visualize-corpus [OPTIONS]

Options:

--corpus PATH: Path to corpus file (required)
--ngram-size INT: N-gram size (default: 3)
--ngram-freq PATH: Output path for n-gram frequency plot
--char-dist PATH: Output path for character distribution plot
--top-n INT: Number of top items to display (default: 20)

Visualize Passwords Command

markov-passgen visualize-passwords [OPTIONS]

Options:

--wordlist PATH: Path to password wordlist file (required)
--entropy-dist PATH: Output path for entropy distribution plot
--length-dist PATH: Output path for length distribution plot
--bins INT: Number of bins for histograms (default: 30)

Visualize Multi-Corpus Command

markov-passgen visualize-multi-corpus [OPTIONS]

Options:

--corpus-list PATH [PATH ...]: Paths to corpus files (required)
--corpus-weights FLOAT [FLOAT ...]: Weights for each corpus
--output PATH: Output path for comparison plot (required)

📊 Output Formats

Visualization plots support multiple formats:

PNG: Raster graphics (default, 300 DPI)
JPG: Compressed raster graphics (300 DPI)
SVG: Vector graphics (scalable)
PDF: Print-ready vector graphics

🧪 Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src/markov_passgen --cov-report=html

# Run specific test category
pytest tests/unit/
pytest tests/integration/
pytest tests/e2e/

Code Quality

# Format code
black src/ tests/

# Lint code
flake8 src/ tests/

# Type checking
mypy src/

Building Distribution

# Build wheel and source distribution
python -m build

# Upload to PyPI (when ready)
twine upload dist/*

📚 Documentation

Development Plan: Detailed project roadmap and phases
API Reference: Complete API documentation
Tutorials: Step-by-step guides and examples
Architecture: System design and component overview

🔐 Security Considerations

Important: This tool is designed for security research and testing purposes only. Generated passwords should be used in controlled environments for:

Password strength analysis
Penetration testing with proper authorization
Security research and academic studies
Training machine learning models for password security

Do NOT use generated passwords for:

Actual user accounts or systems
Production environments
Unauthorized access attempts

Generated passwords may contain patterns from source corpora and should not be considered cryptographically secure.

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Inspired by password cracking research and Markov chain text generation
Built with Click for CLI
Visualization powered by matplotlib and seaborn

📮 Contact

GitHub: markmysler/markov-passgen
Issues: GitHub Issues

Disclaimer: This tool is for educational and authorized security testing purposes only. Users are responsible for compliance with applicable laws and regulations.

Release history

1.0.0 (this version), released Nov 30, 2025

Download files

Source Distribution

markov_passgen-1.0.0.tar.gz (27.4 kB), uploaded Nov 30, 2025, Source

Built Distribution

markov_passgen-1.0.0-py3-none-any.whl (30.7 kB), uploaded Nov 30, 2025, Python 3

File details

Details for the file markov_passgen-1.0.0.tar.gz.

File metadata

Download URL: markov_passgen-1.0.0.tar.gz
Upload date: Nov 30, 2025
Size: 27.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for markov_passgen-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`53c14364cd34b6ef9f215394e7775159c24697a7cd27e2d18b953c89bf6ce4a5`
MD5	`1801f2c853961c974719bb77dd7d1956`
BLAKE2b-256	`e1b650333d46afa2f9273ec0a298dc725b9b01bfe0b4f3c8f76e04f99a5328d0`
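
A downloaded archive can be checked against the published SHA256 digest with Python's standard `hashlib` module. The following is a minimal sketch; the helper names and the local file path in the usage note are assumptions for illustration.

```python
import hashlib

# SHA256 digest published above for markov_passgen-1.0.0.tar.gz
EXPECTED_SHA256 = "53c14364cd34b6ef9f215394e7775159c24697a7cd27e2d18b953c89bf6ce4a5"


def sha256_of_chunks(chunks):
    """Hash an iterable of byte chunks without loading everything at once."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path, chunk_size=65536):
    """Yield the file's contents in fixed-size binary chunks."""
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_chunks(read_chunks(path)) == expected.lower()
```

For example, `matches_published_hash("markov_passgen-1.0.0.tar.gz")` should return `True` for an intact download saved under that (assumed) local file name.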

Provenance

The following attestation bundles were made for markov_passgen-1.0.0.tar.gz:

Publisher: python-publish.yml on markmysler/markov-passgen

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: markov_passgen-1.0.0.tar.gz
- Subject digest: 53c14364cd34b6ef9f215394e7775159c24697a7cd27e2d18b953c89bf6ce4a5
- Sigstore transparency entry: 731820972
- Sigstore integration time: Nov 30, 2025
Source repository:
- Permalink: markmysler/markov-passgen@25d6a2e5c334fa229f354eb74a0ad688c3b1cfaf
- Branch / Tag: refs/tags/1.0.0
- Owner: https://github.com/markmysler
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@25d6a2e5c334fa229f354eb74a0ad688c3b1cfaf
- Trigger Event: release

File details

Details for the file markov_passgen-1.0.0-py3-none-any.whl.

File metadata

Download URL: markov_passgen-1.0.0-py3-none-any.whl
Upload date: Nov 30, 2025
Size: 30.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for markov_passgen-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`607f2148a0c884e4fe0d4509414e95874b450ebefcb9307a824e0d3028dd9290`
MD5	`de869713163e88d80fcf506e7706ca44`
BLAKE2b-256	`16e9b851847985b2227120c1aee439d3e209c97581d094d34e90a6c8a256c7da`

Provenance

The following attestation bundles were made for markov_passgen-1.0.0-py3-none-any.whl:

Publisher: python-publish.yml on markmysler/markov-passgen

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: markov_passgen-1.0.0-py3-none-any.whl
- Subject digest: 607f2148a0c884e4fe0d4509414e95874b450ebefcb9307a824e0d3028dd9290
- Sigstore transparency entry: 731820974
- Sigstore integration time: Nov 30, 2025
Source repository:
- Permalink: markmysler/markov-passgen@25d6a2e5c334fa229f354eb74a0ad688c3b1cfaf
- Branch / Tag: refs/tags/1.0.0
- Owner: https://github.com/markmysler
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@25d6a2e5c334fa229f354eb74a0ad688c3b1cfaf
- Trigger Event: release
