mailgen

High-performance email generator using Markov chains and Bloom filters.

Features

  • 🚀 High Performance - Generate 250K+ emails per second (Fast Mode)
  • 🎯 Realistic Names - Markov chain-based name generation
  • Uniqueness Guaranteed - Bloom filter for efficient duplicate detection
  • 📝 Custom Wordlists - Support for custom name and domain lists
  • 🔧 Configurable - Multiple email patterns and generation options
  • 💾 Memory Efficient - ~1.2 MB for 1 million unique emails

Installation

Binary Installation (Recommended)

Quickly install the latest binary for your system (Linux, macOS, or Windows):

# Linux/macOS (using install script)
curl -fsSL https://raw.githubusercontent.com/akin01/emailgen/main/install.sh | sudo bash

# Windows PowerShell (one-liner, no file download needed)
powershell -ExecutionPolicy Bypass -Command "iwr -useb https://raw.githubusercontent.com/akin01/emailgen/main/install.ps1 | iex"

# Alternative PowerShell syntax
Invoke-WebRequest -Uri https://raw.githubusercontent.com/akin01/emailgen/main/install.ps1 -UseBasicParsing | Invoke-Expression

Build from Source

Quick Start

Generate Emails

# Generate 1000 emails to stdout
./target/release/mailgen --count 1000

# Generate 1 million emails to file (Fast Mode)
./target/release/mailgen --count 1000000 --output emails.txt --fast

# Use custom wordlists
./target/release/mailgen --count 10000 \
    --names data/example_names.txt \
    --domains data/example_domains.txt \
    --output emails.txt

As a Library

Add to your Cargo.toml:

[dependencies]
emailgen = { git = "https://github.com/akin01/emailgen" }

use mailgen::EmailGenerator;

fn main() {
    // Basic usage
    let mut generator = EmailGenerator::new();
    let email = generator.generate();
    println!("Generated: {}", email);

    // Generate many emails
    let emails = generator.generate_many(1000);

    // With custom wordlists
    let names = vec!["John Doe".to_string(), "Jane Smith".to_string()];
    let domains = vec!["example.com".to_string()];
    let mut generator = EmailGenerator::with_names_and_domains(names, domains);
    let emails = generator.generate_many(10000);
}

Performance

Generation Speed (Actual Benchmarks)

Mode                  10K     100K    1M
Fast Mode (--fast)    0.04s   0.38s   7.5s
Default Mode          3.9s    39s     ~6.5 min

💡 Tip: Use --fast mode for bulk generation (>10K emails) for best performance.

Memory Usage

  • ~1.2 MB for 1 million unique emails (Bloom filter)
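
The ~1.2 MB figure matches the standard optimal Bloom filter sizing formulas. The derivation below assumes the filter is sized that way (the source does not say so) and uses the CLI defaults of capacity n = 1,000,000 and false positive rate p = 0.01:

```latex
\begin{align*}
m &= -\frac{n \ln p}{(\ln 2)^2}
   = -\frac{10^{6} \cdot \ln(0.01)}{(\ln 2)^2}
   \approx \frac{10^{6} \cdot 4.605}{0.4805}
   \approx 9.585 \times 10^{6}\ \text{bits} \\
  &\approx 1.198 \times 10^{6}\ \text{bytes}
   \approx 1.2\ \text{MB} \quad (\approx 1.14\ \text{MiB}) \\
k &= \frac{m}{n} \ln 2 \approx 9.585 \cdot 0.693 \approx 6.64
   \;\Rightarrow\; 7\ \text{hash functions}
\end{align*}
```

Lowering the false positive rate raises m in proportion to ln(1/p), so going from 1% to 0.1% costs roughly 50% more memory at the same capacity.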

Usage

# Fast mode for bulk generation (~250K emails/sec)
./target/release/mailgen --count 1000000 --output emails.txt --fast

# Default mode with 30% Markov for variety (~2.6K emails/sec)
./target/release/mailgen --count 100000 --output emails.txt

# Generate to stdout
./target/release/mailgen --count 1000 --fast

See PERFORMANCE.md for detailed benchmarks.

Usage

Direct Command Line

After installing via the script, uv, or npm, the mailgen command is available directly in your terminal:

# Basic usage
mailgen --count 1000

# Fast mode
mailgen -c 1000000 --fast

Command Line Options

USAGE:
    mailgen [OPTIONS]

OPTIONS:
    -c, --count <COUNT>            Number of emails to generate [default: 1000]
    -o, --output <OUTPUT>          Output file path (stdout if not specified)
    -n, --names <NAMES>            Path to names wordlist file
    -d, --domains <DOMAINS>        Path to domains file
        --min-length <MIN>         Minimum username length [default: 5]
        --max-length <MAX>         Maximum username length [default: 30]
        --capacity <CAP>           Bloom filter capacity [default: 1000000]
        --fpr <FPR>                Bloom filter false positive rate [default: 0.01]
        --fast                     Fast mode (100% wordlist/cached, no Markov)
        --wordlist-percent <PCT>   Wordlist name percentage (0-100, default: auto)
        --cache-percent <PCT>      Cached name percentage (0-100, default: auto)
        --markov-percent <PCT>     Markov generation percentage (0-100, default: 30)
        --stats                    Show statistics after generation
    -q, --quiet                    Quiet mode (no output except errors)
    -h, --help                     Print help
    -V, --version                  Print version

Features:

  • TUI Progress Bar: Animated text-based progress bar with spinner, percentage, speed, and ETA
  • Parallel Generation: Multi-threaded generation (always enabled)
  • Async I/O: Asynchronous file writing (always enabled)

Note: The TUI progress bar animation works best in interactive terminals. When output is redirected, you'll see the final progress state.

Name Source Ratios

Control the balance between speed and variety:

# Specify all three (must add up to 100)
./target/release/mailgen --count 100000 --wordlist-percent 35 --cache-percent 35 --markov-percent 30

# Specify only one - others auto-calculated
./target/release/mailgen --count 100000 --markov-percent 20
# Auto-calculates: 40% wordlist, 40% cached, 20% Markov

./target/release/mailgen --count 100000 --wordlist-percent 80
# Auto-calculates: 80% wordlist, 15% cached, 5% Markov

./target/release/mailgen --count 100000 --cache-percent 70
# Auto-calculates: 25% wordlist, 70% cached, 5% Markov

# Specify two - third auto-calculated
./target/release/mailgen --count 100000 --wordlist-percent 50 --markov-percent 10
# Auto-calculates: 50% wordlist, 40% cached, 10% Markov

# Fast mode shortcut (50% wordlist, 50% cached, 0% Markov)
./target/release/mailgen --count 100000 --fast

Ratio (wordlist/cache/markov)   Speed       Variety     Use Case
100/0/0                         ~260K/sec   Low         Bulk test data
50/50/0 (--fast)                ~260K/sec   Medium      Fast generation
35/35/30 (default)              ~2.6K/sec   High        General use with variety
25/25/50                        ~1.5K/sec   Very High   Maximum variety
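
The ratio arithmetic above can be sketched in a few lines of Rust. This is an illustration, not mailgen's actual code: `resolve_ratios` and `remainder` are hypothetical names, and only the cases the examples fully determine are modelled (all three given, any two given, Markov only, nothing given). The wordlist-only and cache-only examples (80/15/5 and 25/70/5) do not pin down a general rule, so the sketch refuses those rather than guessing.

```rust
/// Resolve (wordlist, cache, markov) percentages from optional inputs.
/// Hypothetical helper: it models only the cases the README examples spell out.
fn resolve_ratios(
    wordlist: Option<u32>,
    cache: Option<u32>,
    markov: Option<u32>,
) -> Result<(u32, u32, u32), String> {
    // Reject any single value above 100 up front.
    for v in [wordlist, cache, markov].iter().flatten() {
        if *v > 100 {
            return Err(format!("percentage {} is out of range 0-100", v));
        }
    }
    match (wordlist, cache, markov) {
        // All three given: they must add up to exactly 100.
        (Some(w), Some(c), Some(m)) => {
            if w + c + m == 100 {
                Ok((w, c, m))
            } else {
                Err(format!("ratios add up to {}, expected 100", w + c + m))
            }
        }
        // Two given: the third is whatever is left.
        (Some(w), Some(c), None) => remainder(w + c).map(|m| (w, c, m)),
        (Some(w), None, Some(m)) => remainder(w + m).map(|c| (w, c, m)),
        (None, Some(c), Some(m)) => remainder(c + m).map(|w| (w, c, m)),
        // Only Markov given: split the rest evenly. Assumption: an odd
        // remainder gives the extra point to the cache share.
        (None, None, Some(m)) => {
            let rest = 100 - m;
            Ok((rest / 2, rest - rest / 2, m))
        }
        // Nothing given: the documented default ratio.
        (None, None, None) => Ok((35, 35, 30)),
        // Wordlist-only or cache-only: not derivable from the examples.
        _ => Err("split rule for this case is not modelled here".to_string()),
    }
}

/// 100 minus the percentage already used, or an error if that overshoots.
fn remainder(used: u32) -> Result<u32, String> {
    100u32
        .checked_sub(used)
        .ok_or_else(|| format!("ratios add up to {}, more than 100", used))
}

fn main() {
    // --wordlist-percent 50 --markov-percent 10
    println!("{:?}", resolve_ratios(Some(50), None, Some(10)));
    // --markov-percent 20
    println!("{:?}", resolve_ratios(None, None, Some(20)));
}
```

Returning a `Result` keeps the "must add up to 100" rule in one place, so a caller can surface the message as a CLI error.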

Examples

# Generate 10K emails with stats
./target/release/mailgen -c 10000 --stats

# Generate with custom wordlists
./target/release/mailgen -c 100000 \
    -n names.txt \
    -d domains.txt \
    -o output.txt

# Generate with specific constraints
./target/release/mailgen -c 50000 \
    --min-length 6 \
    --max-length 20 \
    --capacity 100000 \
    --fpr 0.001

Architecture

Markov Chain Name Generation

The email generator uses character-level Markov chains to generate realistic names:

  1. Training: Names from the wordlist are converted to character sequences
  2. Generation: New names are generated by walking the Markov chain
  3. Patterns: Multiple email patterns create variety (first.last, firstlast, etc.)
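
The three steps above can be sketched as follows. This is a minimal first-order character chain written for illustration, not mailgen's implementation: `NameChain`, `XorShift`, `apply_pattern`, and the `^`/`$` boundary markers are all assumptions, and the chain order the real generator uses is not stated in the source. A tiny xorshift generator stands in for a proper RNG so the example needs no external crates.

```rust
use std::collections::HashMap;

/// Tiny xorshift PRNG (illustrative only). The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

const START: char = '^'; // marks the beginning of a name
const END: char = '$'; // marks the end of a name

/// First-order character-level Markov chain.
struct NameChain {
    // For each character, every character observed right after it.
    // Repeats are kept, so uniform sampling follows observed frequencies.
    next: HashMap<char, Vec<char>>,
}

impl NameChain {
    /// Step 1 (training): record character-to-character transitions.
    fn train(names: &[&str]) -> Self {
        let mut next: HashMap<char, Vec<char>> = HashMap::new();
        for name in names {
            let mut prev = START;
            for c in name.to_lowercase().chars().chain(std::iter::once(END)) {
                next.entry(prev).or_default().push(c);
                prev = c;
            }
        }
        NameChain { next }
    }

    /// Step 2 (generation): walk the chain until END or `max_len` characters.
    fn generate(&self, rng: &mut XorShift, max_len: usize) -> String {
        let mut out = String::new();
        let mut cur = START;
        while out.chars().count() < max_len {
            let choices = match self.next.get(&cur) {
                Some(c) => c,
                None => break,
            };
            let c = choices[(rng.next() % choices.len() as u64) as usize];
            if c == END {
                break;
            }
            out.push(c);
            cur = c;
        }
        out
    }
}

/// Step 3 (patterns): two of the patterns the README names.
fn apply_pattern(first: &str, last: &str, domain: &str, pattern: usize) -> String {
    match pattern % 2 {
        0 => format!("{}.{}@{}", first, last, domain), // first.last
        _ => format!("{}{}@{}", first, last, domain),  // firstlast
    }
}

fn main() {
    let chain = NameChain::train(&["john", "jane", "bob", "johnson", "smith"]);
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    let first = chain.generate(&mut rng, 12);
    let last = chain.generate(&mut rng, 12);
    println!("{}", apply_pattern(&first, &last, "example.com", 0));
}
```

Keeping repeated successors in the `Vec` trades memory for simplicity; a production chain would more likely store counts and sample by weight.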

Bloom Filter Uniqueness

Bloom filters provide space-efficient uniqueness checking:

  • Space Efficient: ~1.14 MiB (about 1.2 MB) for 1M elements at 1% false positive rate
  • Fast Operations: O(k) where k is number of hash functions
  • No False Negatives: If it says "not seen", it's definitely unique
  • Configurable FPR: Trade memory for accuracy
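
To make the properties above concrete, here is a small Bloom filter sketch using only the Rust standard library. It is an illustration under assumptions, not mailgen's implementation: the `BloomFilter` type, the `insert_if_new` method, the use of `DefaultHasher`, and the double-hashing scheme are all choices made for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct BloomFilter {
    bits: Vec<u64>,  // bit array packed into 64-bit words
    num_bits: u64,   // m: total number of bits
    num_hashes: u32, // k: probes per item
}

impl BloomFilter {
    /// Size the filter for `capacity` items at the target false positive
    /// rate, using the standard optimal formulas for m and k.
    fn new(capacity: usize, fpr: f64) -> Self {
        assert!(capacity > 0 && fpr > 0.0 && fpr < 1.0);
        let ln2 = std::f64::consts::LN_2;
        let num_bits = (-(capacity as f64) * fpr.ln() / (ln2 * ln2)).ceil() as u64;
        let num_hashes = ((num_bits as f64 / capacity as f64) * ln2).round().max(1.0) as u32;
        let words = ((num_bits + 63) / 64) as usize;
        BloomFilter { bits: vec![0; words], num_bits, num_hashes }
    }

    /// Two base hashes; probe i uses h1 + i * h2 (double hashing).
    fn base_hashes(item: &str) -> (u64, u64) {
        let mut a = DefaultHasher::new();
        item.hash(&mut a);
        let h1 = a.finish();
        let mut b = DefaultHasher::new();
        (item, 0x5bd1_e995u32).hash(&mut b); // salted so h2 differs from h1
        let h2 = b.finish() | 1; // force odd so the step is never zero
        (h1, h2)
    }

    /// Sets the item's k bits. Returns true if at least one bit was
    /// previously unset, meaning the item was definitely not seen before.
    /// A false return means "probably seen" (possible false positive).
    fn insert_if_new(&mut self, item: &str) -> bool {
        let (h1, h2) = Self::base_hashes(item);
        let mut is_new = false;
        for i in 0..self.num_hashes as u64 {
            let bit = h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits;
            let (word, mask) = ((bit / 64) as usize, 1u64 << (bit % 64));
            if self.bits[word] & mask == 0 {
                is_new = true;
                self.bits[word] |= mask;
            }
        }
        is_new
    }
}

fn main() {
    let mut seen = BloomFilter::new(1_000_000, 0.01);
    println!("{} bits, {} hash functions", seen.num_bits, seen.num_hashes);
    for email in ["a@example.com", "b@example.com", "a@example.com"] {
        println!("{} new: {}", email, seen.insert_if_new(email));
    }
}
```

In a generator, a "probably seen" answer simply means the candidate is discarded and another is drawn, so a false positive costs a retry but never lets a duplicate through.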

Wordlist Format

Names File

One name per line (first + last):

John Smith
Jane Doe
Bob Johnson
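
Reading this format takes only a few lines. The sketch below is one way to do it, with behaviour the source does not specify labeled as assumptions: `parse_names` is a hypothetical helper, blank lines are skipped, names are lowercased, and anything after the first word is joined into the last name.

```rust
/// Parse a names wordlist: one "First Last" entry per line.
/// Assumptions (not confirmed by the source): blank lines are skipped,
/// names are lowercased, and a single-word line yields an empty last name.
fn parse_names(text: &str) -> Vec<(String, String)> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(|line| {
            let mut parts = line.split_whitespace();
            let first = parts.next().unwrap_or("").to_lowercase();
            // Everything after the first word is treated as the last name.
            let last = parts.collect::<Vec<_>>().join("").to_lowercase();
            (first, last)
        })
        .collect()
}

fn main() {
    let names = parse_names("John Smith\nJane Doe\nBob Johnson\n");
    for (first, last) in &names {
        println!("{} {}", first, last);
    }
}
```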

Domains File

One domain per line:

gmail.com
yahoo.com
example.com

License

MIT License - see LICENSE for details.
