Skip to main content

High-performance email generator using Markov chains and Bloom filters

Project description

mailgen

High-performance email generator using Markov chains and Bloom filters.

Build Status License: MIT

Features

  • 🚀 High Performance - Generate 250K+ emails per second (Fast Mode)
  • 🎯 Realistic Names - Markov chain-based name generation
  • Uniqueness Guaranteed - Bloom filter for efficient duplicate detection
  • 📝 Custom Wordlists - Support for custom name and domain lists
  • 🔧 Configurable - Multiple email patterns and generation options
  • 💾 Memory Efficient - ~1.2 MB for 1 million unique emails

Installation

Binary Installation (Recommended)

Quickly install the latest binary for your system (Linux, macOS, or Windows):

# Linux/macOS (using install script)
curl -fsSL https://raw.githubusercontent.com/akin01/emailgen/main/install.sh | sudo bash

# Windows PowerShell (one-liner, no file download needed)
powershell -ExecutionPolicy Bypass -Command "iwr -useb https://raw.githubusercontent.com/akin01/emailgen/main/install.ps1 | iex"

# Alternative PowerShell syntax
Invoke-WebRequest -Uri https://raw.githubusercontent.com/akin01/emailgen/main/install.ps1 -UseBasicParsing | Invoke-Expression

Build from Source

Quick Start

Generate Emails

# Generate 1000 emails to stdout
./target/release/mailgen --count 1000

# Generate 1 million emails to file (Fast Mode)
./target/release/mailgen --count 1000000 --output emails.txt --fast

# Use custom wordlists
./target/release/mailgen --count 10000 \
    --names data/example_names.txt \
    --domains data/example_domains.txt \
    --output emails.txt

As a Library

Add to your Cargo.toml:

[dependencies]
emailgen = { git = "https://github.com/akin01/emailgen" }
use mailgen::EmailGenerator;

fn main() {
    // Basic usage
    let mut generator = EmailGenerator::new();
    let email = generator.generate();
    println!("Generated: {}", email);

    // Generate many emails
    let emails = generator.generate_many(1000);

    // With custom wordlists
    let names = vec!["John Doe".to_string(), "Jane Smith".to_string()];
    let domains = vec!["example.com".to_string()];
    let mut generator = EmailGenerator::with_names_and_domains(names, domains);
    let emails = generator.generate_many(10000);
}

Performance

Generation Speed (Actual Benchmarks)

Mode 10K 100K 1M
Fast Mode (--fast) 0.04s 0.38s 7.5s
Default Mode 3.9s 39s ~6.5 min

💡 Tip: Use --fast mode for bulk generation (>10K emails) for best performance.

Memory Usage

  • ~1.2 MB for 1 million unique emails (Bloom filter)

Usage

# Fast mode for bulk generation (~250K emails/sec)
./target/release/mailgen --count 1000000 --output emails.txt --fast

# Default mode with 30% Markov for variety (~2.6K emails/sec)
./target/release/mailgen --count 100000 --output emails.txt

# Generate to stdout
./target/release/mailgen --count 1000 --fast

See PERFORMANCE.md for detailed benchmarks.

Usage

Direct Command Line

After installing via the script, uv, or npm, the mailgen command is available directly in your terminal:

# Basic usage
mailgen --count 1000

# Fast mode
mailgen -c 1000000 --fast

Command Line Options

USAGE:
    emailgen [OPTIONS]

OPTIONS:
    -c, --count <COUNT>            Number of emails to generate [default: 1000]
    -o, --output <OUTPUT>          Output file path (stdout if not specified)
    -n, --names <NAMES>            Path to names wordlist file
    -d, --domains <DOMAINS>        Path to domains file
        --min-length <MIN>         Minimum username length [default: 5]
        --max-length <MAX>         Maximum username length [default: 30]
        --capacity <CAP>           Bloom filter capacity [default: 1000000]
        --fpr <FPR>                Bloom filter false positive rate [default: 0.01]
        --fast                     Fast mode (100% wordlist/cached, no Markov)
        --wordlist-percent <PCT>   Wordlist name percentage (0-100, default: auto)
        --cache-percent <PCT>      Cached name percentage (0-100, default: auto)
        --markov-percent <PCT>     Markov generation percentage (0-100, default: 30)
        --stats                    Show statistics after generation
    -q, --quiet                    Quiet mode (no output except errors)
    -h, --help                     Print help
    -V, --version                  Print version

**Features:**
- **TUI Progress Bar**: Animated text-based progress bar with spinner, percentage, speed, and ETA
- **Parallel Generation**: Multi-threaded generation (always enabled)
- **Async I/O**: Asynchronous file writing (always enabled)

**Note:** The TUI progress bar animation works best in interactive terminals. When output is redirected, you'll see the final progress state.

Name Source Ratios

Control the balance between speed and variety:

# Specify all three (must add up to 100)
./target/release/mailgen --count 100000 --wordlist-percent 35 --cache-percent 35 --markov-percent 30

# Specify only one - others auto-calculated
./target/release/mailgen --count 100000 --markov-percent 20
# Auto-calculates: 40% wordlist, 40% cached, 20% Markov

./target/release/mailgen --count 100000 --wordlist-percent 80
# Auto-calculates: 80% wordlist, 15% cached, 5% Markov

./target/release/mailgen --count 100000 --cache-percent 70
# Auto-calculates: 25% wordlist, 70% cached, 5% Markov

# Specify two - third auto-calculated
./target/release/mailgen --count 100000 --wordlist-percent 50 --markov-percent 10
# Auto-calculates: 50% wordlist, 40% cached, 10% Markov

# Fast mode shortcut (50% wordlist, 50% cached, 0% Markov)
./target/release/mailgen --count 100000 --fast
Ratio (wordlist/cache/markov) Speed Variety Use Case
100/0/0 ~260K/sec Low Bulk test data
50/50/0 (--fast) ~260K/sec Medium Fast generation
35/35/30 (default) ~2.6K/sec High General use with variety
25/25/50 ~1.5K/sec Very High Maximum variety

Examples

# Generate 10K emails with stats
./target/release/mailgen -c 10000 --stats

# Generate with custom wordlists
./target/release/mailgen -c 100000 \
    -n names.txt \
    -d domains.txt \
    -o output.txt

# Generate with specific constraints
./target/release/mailgen -c 50000 \
    --min-length 6 \
    --max-length 20 \
    --capacity 100000 \
    --fpr 0.001

Architecture

Markov Chain Name Generation

The email generator uses character-level Markov chains to generate realistic names:

  1. Training: Names from wordlist are converted to character sequences
  2. Generation: New names are generated by walking the Markov chain
  3. Patterns: Multiple email patterns create variety (first.last, firstlast, etc.)

Bloom Filter Uniqueness

Bloom filters provide space-efficient uniqueness checking:

  • Space Efficient: ~1.14 MB for 1M elements at 1% false positive rate
  • Fast Operations: O(k) where k is number of hash functions
  • No False Negatives: If it says "not seen", it's definitely unique
  • Configurable FPR: Trade memory for accuracy

Wordlist Format

Names File

One name per line (first + last):

John Smith
Jane Doe
Bob Johnson

Domains File

One domain per line:

gmail.com
yahoo.com
example.com

License

MIT License - see LICENSE for details.

Acknowledgments

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

mailgen_rs-0.1.3-py3-none-win_amd64.whl (1.1 MB view details)

Uploaded Python 3Windows x86-64

mailgen_rs-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ x86-64

mailgen_rs-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.2 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARM64

mailgen_rs-0.1.3-py3-none-macosx_11_0_arm64.whl (1.2 MB view details)

Uploaded Python 3macOS 11.0+ ARM64

mailgen_rs-0.1.3-py3-none-macosx_10_12_x86_64.whl (1.2 MB view details)

Uploaded Python 3macOS 10.12+ x86-64

File details

Details for the file mailgen_rs-0.1.3-py3-none-win_amd64.whl.

File metadata

File hashes

Hashes for mailgen_rs-0.1.3-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 9688afef6833bc7ebe8bcfe50cd71ed34abd48bc4c04f21af07bc6df3583aee9
MD5 3862f39961c90edba0acb9857bb97fe0
BLAKE2b-256 c6f84dae7437b41efb9b3ca3460d87a0b9a429ed9d82bdc0c16a2b2813a2ea5f

See more details on using hashes here.

File details

Details for the file mailgen_rs-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for mailgen_rs-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ec8f96ffe8495b762775606d0c0887e22ed8268583df631de876fe7ccde854bd
MD5 63838f10728d04f95cde35d705eed082
BLAKE2b-256 bde04dd573384756eead8e0626f27fa50a80c9d9fe5a1d211ad871bfa52685ca

See more details on using hashes here.

File details

Details for the file mailgen_rs-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for mailgen_rs-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ad166f77645fcd32d2ef77aaac8c8ed18ac47f436c50e9e41dc030f7fe6b32fa
MD5 1276d2a5a4e45e10a58356d80019d80d
BLAKE2b-256 be6ff59fe401f5a0ebdac7b1e3ece048083e1557c62db480085e070e175e5b9d

See more details on using hashes here.

File details

Details for the file mailgen_rs-0.1.3-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for mailgen_rs-0.1.3-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 51dd6249a249fc9bdcabe3f6d70d882eed4f09799a6260260d3f7851762add9f
MD5 89fd20a0e70160d4de9fcf2f175fe064
BLAKE2b-256 f6c94124d697960e19a6bee3e4f0e283638b4329d72323efcffd443618deb3ce

See more details on using hashes here.

File details

Details for the file mailgen_rs-0.1.3-py3-none-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for mailgen_rs-0.1.3-py3-none-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 1e73a9d0fc6116f076f35331dfe49bb5802e9af2701fd6dd89063ab8080d443f
MD5 6077a9ce09b7d02aff56293e06e32697
BLAKE2b-256 ed8c1dd4d97a6ed5a1fce24bc9156cf89ec1bdec06fc83d8cdb4b1db998d1579

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page