Skip to main content

High-performance email generator using Markov chains and Bloom filters

Project description

mailgen

High-performance email generator using Markov chains and Bloom filters.

Build Status License: MIT

Features

  • 🚀 High Performance - Generate 250K+ emails per second (Fast Mode)
  • 🎯 Realistic Names - Markov chain-based name generation
  • Uniqueness Guaranteed - Bloom filter for efficient duplicate detection
  • 📝 Custom Wordlists - Support for custom name and domain lists
  • 🔧 Configurable - Multiple email patterns and generation options
  • 💾 Memory Efficient - ~1.2 MB for 1 million unique emails

Installation

Binary Installation (Recommended)

Quickly install the latest binary for your system (Linux, macOS, or Windows):

# Linux/macOS (using install script)
curl -fsSL https://raw.githubusercontent.com/akin01/emailgen/main/install.sh | sudo bash

# Windows PowerShell (one-liner, no file download needed)
powershell -ExecutionPolicy Bypass -Command "iwr -useb https://raw.githubusercontent.com/akin01/emailgen/main/install.ps1 | iex"

# Alternative PowerShell syntax
Invoke-WebRequest -Uri https://raw.githubusercontent.com/akin01/emailgen/main/install.ps1 -UseBasicParsing | Invoke-Expression

Package Manager Installation

Install the published CLI globally with either npm or uv:

# npm
npm install -g @akin01/mailgen

# uv
uv tool install mailgen-rs

Then verify the installation:

mailgen --version

Build from Source

Quick Start

Generate Emails

# Generate 1000 emails to stdout
./target/release/mailgen --count 1000

# Generate 1 million emails to file (Fast Mode)
./target/release/mailgen --count 1000000 --output emails.txt --fast

# Use custom wordlists
./target/release/mailgen --count 10000 \
    --names data/example_names.txt \
    --domains data/example_domains.txt \
    --output emails.txt

As a Library

Add to your Cargo.toml:

[dependencies]
emailgen = { git = "https://github.com/akin01/emailgen" }
use mailgen::EmailGenerator;

fn main() {
    // Basic usage
    let mut generator = EmailGenerator::new();
    let email = generator.generate();
    println!("Generated: {}", email);

    // Generate many emails
    let emails = generator.generate_many(1000);

    // With custom wordlists
    let names = vec!["John Doe".to_string(), "Jane Smith".to_string()];
    let domains = vec!["example.com".to_string()];
    let mut generator = EmailGenerator::with_names_and_domains(names, domains);
    let emails = generator.generate_many(10000);
}

Performance

Generation Speed (Actual Benchmarks)

Mode 10K 100K 1M
Fast Mode (--fast) 0.04s 0.38s 7.5s
Default Mode 3.9s 39s ~6.5 min

💡 Tip: Use --fast mode for bulk generation (>10K emails) for best performance.

Memory Usage

  • ~1.2 MB for 1 million unique emails (Bloom filter)

Usage

# Fast mode for bulk generation (~250K emails/sec)
./target/release/mailgen --count 1000000 --output emails.txt --fast

# Default mode with 30% Markov for variety (~2.6K emails/sec)
./target/release/mailgen --count 100000 --output emails.txt

# Generate to stdout
./target/release/mailgen --count 1000 --fast

See PERFORMANCE.md for detailed benchmarks.

Usage

Direct Command Line

After installing via the script or a package manager, the mailgen command is available directly in your terminal:

# Basic usage
mailgen --count 1000

# Fast mode
mailgen -c 1000000 --fast

Command Line Options

USAGE:
    mailgen [OPTIONS]

OPTIONS:
    -c, --count <COUNT>            Number of emails to generate [default: 1000]
    -o, --output <OUTPUT>          Output file path (stdout if not specified)
    -n, --names <NAMES>            Path to names wordlist file
    -d, --domains <DOMAINS>        Path to domains file
        --min-length <MIN>         Minimum username length [default: 5]
        --max-length <MAX>         Maximum username length [default: 30]
        --capacity <CAP>           Bloom filter capacity [default: 1000000]
        --fpr <FPR>                Bloom filter false positive rate [default: 0.01]
        --fast                     Fast mode (100% wordlist/cached, no Markov)
        --wordlist-percent <PCT>   Wordlist name percentage (0-100, default: auto)
        --cache-percent <PCT>      Cached name percentage (0-100, default: auto)
        --markov-percent <PCT>     Markov generation percentage (0-100, default: 30)
        --stats                    Show statistics after generation
    -q, --quiet                    Quiet mode (no output except errors)
    -h, --help                     Print help
    -V, --version                  Print version

**Features:**
- **TUI Progress Bar**: Animated text-based progress bar with spinner, percentage, speed, and ETA
- **Parallel Generation**: Multi-threaded generation (always enabled)
- **Async I/O**: Asynchronous file writing (always enabled)

**Note:** The TUI progress bar animation works best in interactive terminals. When output is redirected, you'll see the final progress state.

Name Source Ratios

Control the balance between speed and variety:

# Specify all three (must add up to 100)
./target/release/mailgen --count 100000 --wordlist-percent 35 --cache-percent 35 --markov-percent 30

# Specify only one - others auto-calculated
./target/release/mailgen --count 100000 --markov-percent 20
# Auto-calculates: 40% wordlist, 40% cached, 20% Markov

./target/release/mailgen --count 100000 --wordlist-percent 80
# Auto-calculates: 80% wordlist, 15% cached, 5% Markov

./target/release/mailgen --count 100000 --cache-percent 70
# Auto-calculates: 25% wordlist, 70% cached, 5% Markov

# Specify two - third auto-calculated
./target/release/mailgen --count 100000 --wordlist-percent 50 --markov-percent 10
# Auto-calculates: 50% wordlist, 40% cached, 10% Markov

# Fast mode shortcut (50% wordlist, 50% cached, 0% Markov)
./target/release/mailgen --count 100000 --fast
Ratio (wordlist/cache/markov) Speed Variety Use Case
100/0/0 ~260K/sec Low Bulk test data
50/50/0 (--fast) ~260K/sec Medium Fast generation
35/35/30 (default) ~2.6K/sec High General use with variety
25/25/50 ~1.5K/sec Very High Maximum variety

Examples

# Generate 10K emails with stats
./target/release/mailgen -c 10000 --stats

# Generate with custom wordlists
./target/release/mailgen -c 100000 \
    -n names.txt \
    -d domains.txt \
    -o output.txt

# Generate with specific constraints
./target/release/mailgen -c 50000 \
    --min-length 6 \
    --max-length 20 \
    --capacity 100000 \
    --fpr 0.001

Architecture

Markov Chain Name Generation

The email generator uses character-level Markov chains to generate realistic names:

  1. Training: Names from wordlist are converted to character sequences
  2. Generation: New names are generated by walking the Markov chain
  3. Patterns: Multiple email patterns create variety (first.last, firstlast, etc.)

Bloom Filter Uniqueness

Bloom filters provide space-efficient uniqueness checking:

  • Space Efficient: ~1.14 MB for 1M elements at 1% false positive rate
  • Fast Operations: O(k) where k is number of hash functions
  • No False Negatives: If it says "not seen", it's definitely unique
  • Configurable FPR: Trade memory for accuracy

Wordlist Format

Names File

One name per line (first + last):

John Smith
Jane Doe
Bob Johnson

Domains File

One domain per line:

gmail.com
yahoo.com
example.com

License

MIT License - see LICENSE for details.

Acknowledgments

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

mailgen_rs-0.1.5-py3-none-win_amd64.whl (1.1 MB view details)

Uploaded Python 3Windows x86-64

mailgen_rs-0.1.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ x86-64

mailgen_rs-0.1.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.2 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARM64

mailgen_rs-0.1.5-py3-none-macosx_11_0_arm64.whl (1.2 MB view details)

Uploaded Python 3macOS 11.0+ ARM64

mailgen_rs-0.1.5-py3-none-macosx_10_12_x86_64.whl (1.2 MB view details)

Uploaded Python 3macOS 10.12+ x86-64

File details

Details for the file mailgen_rs-0.1.5-py3-none-win_amd64.whl.

File metadata

File hashes

Hashes for mailgen_rs-0.1.5-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 62b682471317355ee011b739cba5364300bd11fda0873215c0ef7a2e99805ab7
MD5 b76b580840eac7615596ef4d8c39a327
BLAKE2b-256 f84734243f9ba0710a8e4f7b13d48544e6f057510c092d16178dc853c71f50ac

See more details on using hashes here.

File details

Details for the file mailgen_rs-0.1.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for mailgen_rs-0.1.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 72167c64075f276bf0e27dcf603fb63e46226372c3bde0d5b44ccbac937c348b
MD5 3f67a294ff33ac43a088359c2b3108ae
BLAKE2b-256 abb3f0764a68215b26dbbfde301822d76d00c95a30d15fcbe60b8e5fc05b79f8

See more details on using hashes here.

File details

Details for the file mailgen_rs-0.1.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for mailgen_rs-0.1.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 1d508431557b42aee13c42ed12bf1796debc83b2a360f5c51c84a3b5a2da7b34
MD5 8cacbe7ae4da26585e253d9cfd3ce1c6
BLAKE2b-256 c1732c27cd775352f63dffe619a90c122abcc2cb5e879d8bf4dc93e4582af876

See more details on using hashes here.

File details

Details for the file mailgen_rs-0.1.5-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for mailgen_rs-0.1.5-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2204298604bbaab7c6ee34532ed741e52399e225c0875ccec3b574d4f73dbd45
MD5 e87032c45178344d43097c9dbabbc511
BLAKE2b-256 283873610aa19b3911de366ee964190569eb3ef98a375ead1b3645a5cb3f8017

See more details on using hashes here.

File details

Details for the file mailgen_rs-0.1.5-py3-none-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for mailgen_rs-0.1.5-py3-none-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 b530c3419d390fbf8b3c89adb9f0319273d3ed823c6be136e41ed16e617c396c
MD5 f669a2ee497d04070d19bb8328cc9985
BLAKE2b-256 c4312264c955523e015b6dc71cb9cf8e797e79b1a67f241331dad63668be9ee9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page