mailgen

High-performance email generator using Markov chains and Bloom filters.

Features

  • 🚀 High Performance - Generate 250K+ emails per second (Fast Mode)
  • 🎯 Realistic Names - Markov chain-based name generation
  • Uniqueness Guaranteed - Bloom filter for efficient duplicate detection
  • 📝 Custom Wordlists - Support for custom name and domain lists
  • 🔧 Configurable - Multiple email patterns and generation options
  • 💾 Memory Efficient - ~1.2 MB for 1 million unique emails

Installation

Binary Installation (Recommended)

Quickly install the latest binary for your system (Linux, macOS, or Windows):

# Linux/macOS (using install script)
curl -fsSL https://raw.githubusercontent.com/akin01/emailgen/main/install.sh | sudo bash

# Windows PowerShell (one-liner, no file download needed)
powershell -ExecutionPolicy Bypass -Command "iwr -useb https://raw.githubusercontent.com/akin01/emailgen/main/install.ps1 | iex"

# Alternative PowerShell syntax
Invoke-WebRequest -Uri https://raw.githubusercontent.com/akin01/emailgen/main/install.ps1 -UseBasicParsing | Invoke-Expression

Build from Source

Quick Start

Generate Emails

# Generate 1000 emails to stdout
./target/release/mailgen --count 1000

# Generate 1 million emails to file (Fast Mode)
./target/release/mailgen --count 1000000 --output emails.txt --fast

# Use custom wordlists
./target/release/mailgen --count 10000 \
    --names data/example_names.txt \
    --domains data/example_domains.txt \
    --output emails.txt

As a Library

Add to your Cargo.toml:

[dependencies]
emailgen = { git = "https://github.com/akin01/emailgen" }

use mailgen::EmailGenerator;

fn main() {
    // Basic usage
    let mut generator = EmailGenerator::new();
    let email = generator.generate();
    println!("Generated: {}", email);

    // Generate many emails
    let emails = generator.generate_many(1000);

    // With custom wordlists
    let names = vec!["John Doe".to_string(), "Jane Smith".to_string()];
    let domains = vec!["example.com".to_string()];
    let mut generator = EmailGenerator::with_names_and_domains(names, domains);
    let emails = generator.generate_many(10000);
}

Performance

Generation Speed (Actual Benchmarks)

Mode                10K     100K    1M
Fast Mode (--fast)  0.04s   0.38s   7.5s
Default Mode        3.9s    39s     ~6.5 min

💡 Tip: Use --fast mode for bulk generation (>10K emails) for best performance.

Memory Usage

  • ~1.2 MB for 1 million unique emails (Bloom filter)
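
The ~1.2 MB figure matches the standard Bloom filter sizing formulas: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below evaluates them for the default --capacity and --fpr values. The function names are illustrative assumptions, and mailgen's own sizing code may round differently.

```rust
/// Optimal Bloom filter size in bits for `n` items at false positive rate `p`:
/// m = -n * ln(p) / (ln 2)^2
fn optimal_bits(n: u64, p: f64) -> u64 {
    let ln2 = std::f64::consts::LN_2;
    (-(n as f64) * p.ln() / (ln2 * ln2)).ceil() as u64
}

/// Optimal number of hash functions: k = (m / n) * ln 2, rounded to the nearest integer.
fn optimal_hashes(m: u64, n: u64) -> u32 {
    ((m as f64 / n as f64) * std::f64::consts::LN_2).round().max(1.0) as u32
}

fn main() {
    // Defaults from the option list: --capacity 1000000, --fpr 0.01
    let n = 1_000_000;
    let m = optimal_bits(n, 0.01);
    let k = optimal_hashes(m, n);
    // Roughly 9.6 million bits, i.e. about 1.2 MB, with 7 hash functions.
    println!("bits: {m}, bytes: {}, hashes: {k}", m / 8);
}
```

Lowering --fpr increases both the bit count and the number of hash functions, which is the memory-for-accuracy trade-off.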

Usage

# Fast mode for bulk generation (~250K emails/sec)
./target/release/mailgen --count 1000000 --output emails.txt --fast

# Default mode with 30% Markov for variety (~2.6K emails/sec)
./target/release/mailgen --count 100000 --output emails.txt

# Generate to stdout
./target/release/mailgen --count 1000 --fast

See PERFORMANCE.md for detailed benchmarks.

Usage

Direct Command Line

After installing via the script, uv, or npm, the mailgen command is available directly in your terminal:

# Basic usage
mailgen --count 1000

# Fast mode
mailgen -c 1000000 --fast

Command Line Options

USAGE:
    mailgen [OPTIONS]

OPTIONS:
    -c, --count <COUNT>            Number of emails to generate [default: 1000]
    -o, --output <OUTPUT>          Output file path (stdout if not specified)
    -n, --names <NAMES>            Path to names wordlist file
    -d, --domains <DOMAINS>        Path to domains file
        --min-length <MIN>         Minimum username length [default: 5]
        --max-length <MAX>         Maximum username length [default: 30]
        --capacity <CAP>           Bloom filter capacity [default: 1000000]
        --fpr <FPR>                Bloom filter false positive rate [default: 0.01]
        --fast                     Fast mode (100% wordlist/cached, no Markov)
        --wordlist-percent <PCT>   Wordlist name percentage (0-100, default: auto)
        --cache-percent <PCT>      Cached name percentage (0-100, default: auto)
        --markov-percent <PCT>     Markov generation percentage (0-100, default: 30)
        --stats                    Show statistics after generation
    -q, --quiet                    Quiet mode (no output except errors)
    -h, --help                     Print help
    -V, --version                  Print version

Features:

  • TUI Progress Bar: Animated text-based progress bar with spinner, percentage, speed, and ETA
  • Parallel Generation: Multi-threaded generation (always enabled)
  • Async I/O: Asynchronous file writing (always enabled)

Note: The TUI progress bar animation works best in interactive terminals. When output is redirected, you'll see the final progress state.

Name Source Ratios

Control the balance between speed and variety:

# Specify all three (must add up to 100)
./target/release/mailgen --count 100000 --wordlist-percent 35 --cache-percent 35 --markov-percent 30

# Specify only one - others auto-calculated
./target/release/mailgen --count 100000 --markov-percent 20
# Auto-calculates: 40% wordlist, 40% cached, 20% Markov

./target/release/mailgen --count 100000 --wordlist-percent 80
# Auto-calculates: 80% wordlist, 15% cached, 5% Markov

./target/release/mailgen --count 100000 --cache-percent 70
# Auto-calculates: 25% wordlist, 70% cached, 5% Markov

# Specify two - third auto-calculated
./target/release/mailgen --count 100000 --wordlist-percent 50 --markov-percent 10
# Auto-calculates: 50% wordlist, 40% cached, 10% Markov

# Fast mode shortcut (50% wordlist, 50% cached, 0% Markov)
./target/release/mailgen --count 100000 --fast

Ratio (wordlist/cache/markov)  Speed      Variety    Use Case
100/0/0                        ~260K/sec  Low        Bulk test data
50/50/0 (--fast)               ~260K/sec  Medium     Fast generation
35/35/30 (default)             ~2.6K/sec  High       General use with variety
25/25/50                       ~1.5K/sec  Very High  Maximum variety
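
The "must add up to 100" and "third auto-calculated" rules can be sketched as a small function. This is an illustration only: `resolve` is a hypothetical name, inputs are assumed to be in the 0-100 range, and the case where only one percentage is given is left out because the exact auto-split rule mailgen applies there is not spelled out here.

```rust
/// Resolves (wordlist, cache, markov) percentages when at least two are given.
/// Returns None if the values cannot sum to 100, or if fewer than two are given.
fn resolve(w: Option<u32>, c: Option<u32>, m: Option<u32>) -> Option<(u32, u32, u32)> {
    match (w, c, m) {
        // All three given: they must add up to exactly 100.
        (Some(w), Some(c), Some(m)) => (w + c + m == 100).then_some((w, c, m)),
        // Two given: the third is whatever remains, if anything remains.
        (Some(w), Some(c), None) => 100u32.checked_sub(w + c).map(|m| (w, c, m)),
        (Some(w), None, Some(m)) => 100u32.checked_sub(w + m).map(|c| (w, c, m)),
        (None, Some(c), Some(m)) => 100u32.checked_sub(c + m).map(|w| (w, c, m)),
        // One or zero values: not covered by this sketch.
        _ => None,
    }
}

fn main() {
    // Mirrors the "--wordlist-percent 50 --markov-percent 10" example.
    println!("{:?}", resolve(Some(50), None, Some(10)));
    // The default split, given explicitly.
    println!("{:?}", resolve(Some(35), Some(35), Some(30)));
    // Two values that already exceed 100 are rejected.
    println!("{:?}", resolve(Some(60), Some(60), None));
}
```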

Examples

# Generate 10K emails with stats
./target/release/mailgen -c 10000 --stats

# Generate with custom wordlists
./target/release/mailgen -c 100000 \
    -n names.txt \
    -d domains.txt \
    -o output.txt

# Generate with specific constraints
./target/release/mailgen -c 50000 \
    --min-length 6 \
    --max-length 20 \
    --capacity 100000 \
    --fpr 0.001

Architecture

Markov Chain Name Generation

The email generator uses character-level Markov chains to generate realistic names:

  1. Training: Names from the wordlist are converted to character sequences, from which character-to-character transitions are learned
  2. Generation: New names are generated by walking the Markov chain
  3. Patterns: Multiple email patterns create variety (first.last, firstlast, etc.)
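
The training and generation steps can be illustrated with a minimal order-1 character chain. This is a sketch under assumptions, not mailgen's implementation: `NameChain` and `XorShift` are hypothetical names, the chain order is assumed to be 1, and a tiny built-in PRNG stands in for a real random number generator so the example needs no external crates.

```rust
use std::collections::HashMap;

const START: char = '^'; // marker for "beginning of name"
const END: char = '$'; // marker for "end of name"

/// Order-1 character-level Markov chain (illustrative sketch).
struct NameChain {
    transitions: HashMap<char, Vec<char>>,
}

impl NameChain {
    /// Training: record every observed character-to-character transition.
    /// Repeated entries in the Vec encode how frequent a transition is.
    fn train(names: &[&str]) -> Self {
        let mut transitions: HashMap<char, Vec<char>> = HashMap::new();
        for name in names {
            let mut prev = START;
            for c in name.to_lowercase().chars().chain(std::iter::once(END)) {
                transitions.entry(prev).or_default().push(c);
                prev = c;
            }
        }
        NameChain { transitions }
    }

    /// Generation: walk the chain from START until END or `max_len` characters.
    fn generate(&self, rng: &mut XorShift, max_len: usize) -> String {
        let mut out = String::new();
        let mut state = START;
        while out.chars().count() < max_len {
            let Some(next) = self.transitions.get(&state) else { break };
            let c = next[rng.below(next.len())];
            if c == END {
                break;
            }
            out.push(c);
            state = c;
        }
        out
    }
}

/// Tiny xorshift PRNG; the seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    /// Returns a pseudo-random index in 0..n (n must be greater than 0).
    fn below(&mut self, n: usize) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % n as u64) as usize
    }
}

fn main() {
    let chain = NameChain::train(&["John", "Jane", "Joan", "Anna", "Hanna"]);
    let mut rng = XorShift(0x2545F4914F6CDD1D);
    for _ in 0..5 {
        println!("{}", chain.generate(&mut rng, 12));
    }
}
```

A higher-order chain (conditioning on the previous two or three characters) would typically produce more name-like output at the cost of a larger transition table.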

Bloom Filter Uniqueness

Bloom filters provide space-efficient uniqueness checking:

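  • Space Efficient: ~1.14 MiB (about 1.2 MB) for 1M elements at 1% false positive rate
  • Fast Operations: O(k) where k is number of hash functions
  • No False Negatives: If it says "not seen", it's definitely unique; a false positive can only cause a unique email to be skipped, never a duplicate to be emitted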
  • Configurable FPR: Trade memory for accuracy
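
To make these properties concrete, here is a minimal Bloom filter sketch. It is not mailgen's implementation: the `Bloom` type is hypothetical, and deriving the k probe positions from two `DefaultHasher` hashes (double hashing) is an assumption chosen to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A minimal Bloom filter (illustrative only).
struct Bloom {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl Bloom {
    fn new(num_bits: u64, num_hashes: u32) -> Self {
        assert!(num_bits > 0, "filter needs at least one bit");
        let words = ((num_bits + 63) / 64) as usize;
        Bloom { bits: vec![0; words], num_bits, num_hashes }
    }

    /// Two base hashes; probe i uses h1 + i * h2 (double hashing).
    fn base_hashes(item: &str) -> (u64, u64) {
        let mut a = DefaultHasher::new();
        item.hash(&mut a);
        let h1 = a.finish();
        let mut b = DefaultHasher::new();
        (item, 0x9e3779b97f4a7c15u64).hash(&mut b);
        let h2 = b.finish() | 1; // force a non-zero stride
        (h1, h2)
    }

    /// Inserts `item` in O(k). Returns true if at least one of its bits was
    /// unset, meaning the item was definitely not seen before.
    fn insert(&mut self, item: &str) -> bool {
        let (h1, h2) = Self::base_hashes(item);
        let mut is_new = false;
        for i in 0..self.num_hashes as u64 {
            let bit = h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits;
            let (word, mask) = ((bit / 64) as usize, 1u64 << (bit % 64));
            if self.bits[word] & mask == 0 {
                is_new = true;
                self.bits[word] |= mask;
            }
        }
        is_new
    }
}

fn main() {
    // About 9.6M bits and 7 hashes, roughly what 1M items at 1% FPR calls for.
    let mut seen = Bloom::new(9_585_059, 7);
    for email in ["a@example.com", "b@example.com", "a@example.com"] {
        if seen.insert(email) {
            println!("keep {email}");
        } else {
            println!("skip {email} (possible duplicate)");
        }
    }
}
```

Because a repeated item always finds all of its bits already set, a duplicate is never reported as new; the only error mode is occasionally skipping an email that was in fact unique.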

Wordlist Format

Names File

One name per line (first + last):

John Smith
Jane Doe
Bob Johnson

Domains File

One domain per line:

gmail.com
yahoo.com
example.com
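
As a rough illustration of how these two files could feed the email patterns mentioned under Architecture (first.last and firstlast), the sketch below parses name lines and builds candidate addresses. The function names are hypothetical, and lowercasing the names and skipping lines without two words are assumptions, not documented mailgen behavior.

```rust
/// Splits a "First Last" line into lowercase (first, last).
/// Returns None for lines that do not contain at least two words.
fn parse_name(line: &str) -> Option<(String, String)> {
    let mut parts = line.split_whitespace();
    let first = parts.next()?.to_lowercase();
    let last = parts.next()?.to_lowercase();
    Some((first, last))
}

/// Builds candidate addresses for two patterns: first.last and firstlast.
fn candidates(first: &str, last: &str, domain: &str) -> Vec<String> {
    vec![
        format!("{first}.{last}@{domain}"),
        format!("{first}{last}@{domain}"),
    ]
}

fn main() {
    // Inline stand-ins for the contents of a names file and a domains file.
    let names = "John Smith\nJane Doe\nBob Johnson\n";
    let domains = "gmail.com\nyahoo.com\nexample.com\n";
    for (line, domain) in names.lines().zip(domains.lines()) {
        if let Some((first, last)) = parse_name(line) {
            println!("{:?}", candidates(&first, &last, domain.trim()));
        }
    }
}
```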

License

MIT License - see LICENSE for details.

Acknowledgments
