mailgen
High-performance email generator using Markov chains and Bloom filters.
Features
- 🚀 High Performance - Generate up to ~250K emails per second in Fast Mode (see Performance for measured timings)
- 🎯 Realistic Names - Markov chain-based name generation
- ✅ Uniqueness Guaranteed - Bloom filter for efficient duplicate detection
- 📝 Custom Wordlists - Support for custom name and domain lists
- 🔧 Configurable - Multiple email patterns and generation options
- 💾 Memory Efficient - ~1.2 MB Bloom filter for 1 million unique emails at the default 1% false positive rate
Installation
Binary Installation (Recommended)
Quickly install the latest binary for your system (Linux, macOS, or Windows):
# Linux/macOS (using install script)
curl -fsSL https://raw.githubusercontent.com/akin01/emailgen/main/install.sh | sudo bash
# Windows PowerShell (one-liner, no file download needed)
powershell -ExecutionPolicy Bypass -Command "iwr -useb https://raw.githubusercontent.com/akin01/emailgen/main/install.ps1 | iex"
# Alternative PowerShell syntax
Invoke-WebRequest -Uri https://raw.githubusercontent.com/akin01/emailgen/main/install.ps1 -UseBasicParsing | Invoke-Expression
Package Manager Installation
Install the published CLI globally with either npm or uv:
# npm
npm install -g @akin01/mailgen
# uv
uv tool install mailgen-rs
Then verify the installation:
mailgen --version
Build from Source
Examples below that run ./target/release/mailgen assume a release build made with Cargo from a checkout of the repository (typically cargo build --release).
Quick Start
Generate Emails
# Generate 1000 emails to stdout
./target/release/mailgen --count 1000
# Generate 1 million emails to file (Fast Mode)
./target/release/mailgen --count 1000000 --output emails.txt --fast
# Use custom wordlists
./target/release/mailgen --count 10000 \
--names data/example_names.txt \
--domains data/example_domains.txt \
--output emails.txt
As a Library
Add to your Cargo.toml:
[dependencies]
emailgen = { git = "https://github.com/akin01/emailgen" }
use mailgen::EmailGenerator;

fn main() {
    // Basic usage
    let mut generator = EmailGenerator::new();
    let email = generator.generate();
    println!("Generated: {}", email);

    // Generate many emails
    let emails = generator.generate_many(1000);

    // With custom wordlists
    let names = vec!["John Doe".to_string(), "Jane Smith".to_string()];
    let domains = vec!["example.com".to_string()];
    let mut generator = EmailGenerator::with_names_and_domains(names, domains);
    let emails = generator.generate_many(10000);
}
Performance
Generation Speed (Actual Benchmarks)
| Mode | 10K | 100K | 1M |
|---|---|---|---|
| Fast Mode (--fast) | 0.04s | 0.38s | 7.5s |
| Default Mode | 3.9s | 39s | ~6.5 min |
💡 Tip: Use --fast mode for bulk generation (>10K emails) for best performance.
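The per-second rates quoted elsewhere in this README follow from the timings above by dividing the email count by the elapsed seconds. A small sketch of that arithmetic, where `throughput`, `RUNS`, and `report` are hypothetical names for illustration and not part of mailgen:

```rust
/// Emails generated per second for a run of `count` emails taking `seconds`.
/// Hypothetical helper for illustration; not part of mailgen.
fn throughput(count: u64, seconds: f64) -> f64 {
    count as f64 / seconds
}

/// Timings copied from the benchmark table: (mode, count, seconds).
const RUNS: [(&str, u64, f64); 6] = [
    ("fast", 10_000, 0.04),
    ("fast", 100_000, 0.38),
    ("fast", 1_000_000, 7.5),
    ("default", 10_000, 3.9),
    ("default", 100_000, 39.0),
    ("default", 1_000_000, 390.0), // ~6.5 min
];

/// One formatted line per benchmark row.
fn report() -> Vec<String> {
    RUNS.iter()
        .map(|(mode, count, secs)| {
            format!("{:>7} {:>9}: {:.0} emails/sec", mode, count, throughput(*count, *secs))
        })
        .collect()
}
```

By this arithmetic the Fast Mode rows work out to roughly 250K, 263K, and 133K emails/sec, and every Default Mode row to roughly 2.6K emails/sec.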
Memory Usage
- ~1.2 MB for 1 million unique emails (Bloom filter)
Usage
# Fast mode for bulk generation (up to ~250K emails/sec)
./target/release/mailgen --count 1000000 --output emails.txt --fast
# Default mode with 30% Markov for variety (~2.6K emails/sec)
./target/release/mailgen --count 100000 --output emails.txt
# Generate to stdout
./target/release/mailgen --count 1000 --fast
See PERFORMANCE.md for detailed benchmarks.
Usage
Direct Command Line
After installing via the script or a package manager, the mailgen command is available directly in your terminal:
# Basic usage
mailgen --count 1000
# Fast mode
mailgen -c 1000000 --fast
Command Line Options
USAGE:
    mailgen [OPTIONS]
OPTIONS:
    -c, --count <COUNT>           Number of emails to generate [default: 1000]
    -o, --output <OUTPUT>         Output file path (stdout if not specified)
    -n, --names <NAMES>           Path to names wordlist file
    -d, --domains <DOMAINS>       Path to domains file
        --min-length <MIN>        Minimum username length [default: 5]
        --max-length <MAX>        Maximum username length [default: 30]
        --capacity <CAP>          Bloom filter capacity [default: 1000000]
        --fpr <FPR>               Bloom filter false positive rate [default: 0.01]
        --fast                    Fast mode (100% wordlist/cached, no Markov)
        --wordlist-percent <PCT>  Wordlist name percentage (0-100, default: auto)
        --cache-percent <PCT>     Cached name percentage (0-100, default: auto)
        --markov-percent <PCT>    Markov generation percentage (0-100, default: 30)
        --stats                   Show statistics after generation
    -q, --quiet                   Quiet mode (no output except errors)
    -h, --help                    Print help
    -V, --version                 Print version
**Features:**
- **TUI Progress Bar**: Animated text-based progress bar with spinner, percentage, speed, and ETA
- **Parallel Generation**: Multi-threaded generation (always enabled)
- **Async I/O**: Asynchronous file writing (always enabled)
**Note:** The TUI progress bar animation works best in interactive terminals. When output is redirected, you'll see the final progress state.
Name Source Ratios
Control the balance between speed and variety:
# Specify all three (must add up to 100)
./target/release/mailgen --count 100000 --wordlist-percent 35 --cache-percent 35 --markov-percent 30
# Specify only one - others auto-calculated
./target/release/mailgen --count 100000 --markov-percent 20
# Auto-calculates: 40% wordlist, 40% cached, 20% Markov
./target/release/mailgen --count 100000 --wordlist-percent 80
# Auto-calculates: 80% wordlist, 15% cached, 5% Markov
./target/release/mailgen --count 100000 --cache-percent 70
# Auto-calculates: 25% wordlist, 70% cached, 5% Markov
# Specify two - third auto-calculated
./target/release/mailgen --count 100000 --wordlist-percent 50 --markov-percent 10
# Auto-calculates: 50% wordlist, 40% cached, 10% Markov
# Fast mode shortcut (50% wordlist, 50% cached, 0% Markov)
./target/release/mailgen --count 100000 --fast
| Ratio (wordlist/cache/markov) | Speed | Variety | Use Case |
|---|---|---|---|
| 100/0/0 | ~260K/sec | Low | Bulk test data |
| 50/50/0 (--fast) | ~260K/sec | Medium | Fast generation |
| 35/35/30 (default) | ~2.6K/sec | High | General use with variety |
| 25/25/50 | ~1.5K/sec | Very High | Maximum variety |
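To make the ratios concrete, the sketch below checks that three percentages add up to 100 and works out how many emails of a run would come from each source. It is only an arithmetic illustration: `split_by_ratio` is a hypothetical helper, and mailgen may well choose a source per email at random rather than pre-computing fixed counts.

```rust
/// Split `count` emails across wordlist / cache / Markov sources.
/// Hypothetical helper for illustration; not mailgen's actual logic.
/// Returns (wordlist, cache, markov) counts, or an error if the
/// percentages do not add up to 100.
fn split_by_ratio(
    count: u64,
    wordlist_pct: u8,
    cache_pct: u8,
    markov_pct: u8,
) -> Result<(u64, u64, u64), String> {
    let total = wordlist_pct as u32 + cache_pct as u32 + markov_pct as u32;
    if total != 100 {
        return Err(format!("percentages must add up to 100, got {}", total));
    }
    let wordlist = count * wordlist_pct as u64 / 100;
    let cache = count * cache_pct as u64 / 100;
    // Give any rounding remainder to the last source so the parts sum to `count`.
    let markov = count - wordlist - cache;
    Ok((wordlist, cache, markov))
}
```

With the default 35/35/30 ratio, `split_by_ratio(100_000, 35, 35, 30)` yields 35,000 wordlist names, 35,000 cached names, and 30,000 Markov names.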
Examples
# Generate 10K emails with stats
./target/release/mailgen -c 10000 --stats
# Generate with custom wordlists
./target/release/mailgen -c 100000 \
-n names.txt \
-d domains.txt \
-o output.txt
# Generate with specific constraints
./target/release/mailgen -c 50000 \
--min-length 6 \
--max-length 20 \
--capacity 100000 \
--fpr 0.001
Architecture
Markov Chain Name Generation
The email generator uses character-level Markov chains to generate realistic names:
- Training: Names from wordlist are converted to character sequences
- Generation: New names are generated by walking the Markov chain
- Patterns: Multiple email patterns create variety (first.last, firstlast, etc.)
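The training and generation steps above can be sketched with a first-order character chain. This is a minimal illustration under stated assumptions, not the markovify-rs API or mailgen's implementation: `CharMarkov`, `XorShift`, the `^`/`$` start and end markers, and training on single lowercase name tokens are all choices made for the example.

```rust
use std::collections::HashMap;

const START: char = '^'; // marks the beginning of a name
const END: char = '$'; // marks the end of a name

/// Tiny deterministic PRNG (xorshift64) so the sketch needs no external crates.
/// The seed must be nonzero.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Pseudo-random index in 0..n (n must be > 0).
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

/// Order-1 character chain: each character maps to the characters seen after it.
/// Repeated entries in the Vec encode how often a transition occurred.
struct CharMarkov {
    transitions: HashMap<char, Vec<char>>,
}

impl CharMarkov {
    /// Training: convert each name to a character sequence and record transitions.
    fn train(names: &[&str]) -> Self {
        let mut transitions: HashMap<char, Vec<char>> = HashMap::new();
        for name in names {
            let mut prev = START;
            for c in name.to_lowercase().chars().filter(|c| c.is_alphabetic()) {
                transitions.entry(prev).or_default().push(c);
                prev = c;
            }
            transitions.entry(prev).or_default().push(END);
        }
        CharMarkov { transitions }
    }

    /// Generation: walk the chain from START until END or `max_len` characters.
    fn generate(&self, rng: &mut XorShift, max_len: usize) -> String {
        let mut out = String::new();
        let mut state = START;
        for _ in 0..max_len {
            let choices = match self.transitions.get(&state) {
                Some(c) => c,
                None => break,
            };
            let next = choices[rng.below(choices.len())];
            if next == END {
                break;
            }
            out.push(next);
            state = next;
        }
        out
    }
}
```

A chain trained with `CharMarkov::train(&["anna", "bob", "lee"])` only ever emits character pairs it saw in training, which is why the output tends to look name-like; a higher-order chain (conditioning on more than one previous character) would trade variety for realism.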
Bloom Filter Uniqueness
Bloom filters provide space-efficient uniqueness checking:
- Space Efficient: ~1.14 MiB (about 1.2 MB) for 1M elements at 1% false positive rate
- Fast Operations: O(k) where k is number of hash functions
- No False Negatives: If it says "not seen", it's definitely unique
- Configurable FPR: Trade memory for accuracy; a lower --fpr uses more memory but wrongly rejects fewer new emails as duplicates
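The memory figure above follows from the standard Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below applies them and implements a minimal filter on top of the standard library hasher. It is an illustration only: `BloomSketch` and its methods are hypothetical names, and the double-hashing scheme is an assumption, not the `bloomfilter` crate's API or internals.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter sketch (illustrative; not the `bloomfilter` crate).
struct BloomSketch {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl BloomSketch {
    /// Size the filter for `capacity` items at false positive rate `fpr`.
    fn new(capacity: u64, fpr: f64) -> Self {
        let ln2 = std::f64::consts::LN_2;
        // m = -n * ln(p) / (ln 2)^2
        let num_bits = (-(capacity as f64) * fpr.ln() / (ln2 * ln2)).ceil() as u64;
        // k = (m / n) * ln 2
        let num_hashes = ((num_bits as f64 / capacity as f64) * ln2).round().max(1.0) as u32;
        let words = ((num_bits + 63) / 64) as usize;
        BloomSketch { bits: vec![0; words], num_bits, num_hashes }
    }

    /// Two base hashes, later combined as h1 + i * h2 (double hashing).
    fn base_hashes(item: &str) -> (u64, u64) {
        let mut a = DefaultHasher::new();
        item.hash(&mut a);
        let mut b = DefaultHasher::new();
        (item, 0x9e37_79b9_u64).hash(&mut b); // salted second hash
        (a.finish(), b.finish() | 1) // odd step so successive probes differ
    }

    /// Records `item` and returns true if it was definitely not seen before.
    /// Returns false for a repeat, or occasionally for a new item (false positive).
    fn insert_if_new(&mut self, item: &str) -> bool {
        let (h1, h2) = Self::base_hashes(item);
        let mut is_new = false;
        for i in 0..self.num_hashes as u64 {
            let idx = h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits;
            let (word, mask) = ((idx / 64) as usize, 1u64 << (idx % 64));
            if self.bits[word] & mask == 0 {
                is_new = true;
                self.bits[word] |= mask;
            }
        }
        is_new
    }

    /// Memory used by the bit array, in bytes.
    fn size_bytes(&self) -> usize {
        self.bits.len() * 8
    }
}
```

For `BloomSketch::new(1_000_000, 0.01)` the formulas give about 9.59 million bits (roughly 1.2 MB, or 1.14 MiB) and 7 hash functions. A generator would emit an email only when `insert_if_new` returns true, so a false positive costs a discarded candidate rather than a duplicate in the output.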
Wordlist Format
Names File
One name per line (first + last):
John Smith
Jane Doe
Bob Johnson
Domains File
One domain per line:
gmail.com
yahoo.com
example.com
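As an illustration of how one line from each file could be combined, the sketch below parses a "First Last" line and applies the two patterns this README names (first.last and firstlast). The helper names `parse_name_line` and `format_patterns` are hypothetical, and lowercasing the name parts is an assumption; mailgen's own parsing and its full pattern set may differ.

```rust
/// Split a "First Last" line into lowercase (first, last) parts.
/// Returns None if the line has fewer than two whitespace-separated words.
/// Hypothetical helper for illustration; not part of mailgen.
fn parse_name_line(line: &str) -> Option<(String, String)> {
    let mut parts = line.split_whitespace();
    let first = parts.next()?.to_lowercase();
    let last = parts.next()?.to_lowercase();
    Some((first, last))
}

/// Build one address per pattern for a given name and domain.
/// Only the two patterns named in this README are shown.
fn format_patterns(first: &str, last: &str, domain: &str) -> Vec<String> {
    vec![
        format!("{}.{}@{}", first, last, domain), // first.last
        format!("{}{}@{}", first, last, domain),  // firstlast
    ]
}
```

For the line `John Smith` and the domain `gmail.com`, this produces `john.smith@gmail.com` and `johnsmith@gmail.com`.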
License
MIT License - see LICENSE for details.
Acknowledgments
- markovify-rs - Markov chain implementation
- bloomfilter - Bloom filter implementation