LogForge 🔨
A high-performance, professional-grade log generator for testing, simulation, and development purposes.
🚀 Features
- High Performance: Generate 250K+ logs per second under optimal conditions (100K-150K in typical usage)
- Multiple Formats: Support for JSON, Apache, Nginx, Syslog, CSV, and more
- Realistic Data: Generate realistic IPs, user agents, timestamps, and messages
- Anomaly Injection: Inject anomalies with temporal patterns to produce training data for ML-based anomaly detection
- Deterministic Generation: Seed-based reproducible log generation
- Configurable: Highly customizable through config files or CLI options
- Memory Efficient: Batch processing and streaming output
- Parallel Processing: Multi-core support for maximum performance
- Rich CLI: Beautiful progress bars and detailed statistics
- Extensible: Easy to add custom formats and data generators
📦 Installation
pip install logforge
🛠️ Quick Start
Command Line Usage
Generate 1 million logs in JSON format:
logforge generate --count 1000000 --format json --output logs.json
Generate Apache access logs with compression:
logforge generate \
--count 500000 \
--format apache_common \
--output access.log \
--compression gzip
Run performance benchmark:
logforge benchmark --count 1000000 --format json
Python API Usage
from logforge import LogGenerator, LogConfig
# Basic usage
config = LogConfig(total_logs=10000, output={"format": "json"})
generator = LogGenerator(config)
generator.generate()
# Advanced configuration
config = LogConfig(
    total_logs=1000000,
    log_levels=["DEBUG", "INFO", "WARNING", "ERROR"],
    level_distribution={
        "levels": {
            "DEBUG": 0.2,
            "INFO": 0.6,
            "WARNING": 0.15,
            "ERROR": 0.05
        }
    },
    output={
        "format": "json",
        "file_path": "app.log",
        "compression": "gzip"
    },
    performance={
        "batch_size": 10000,
        "workers": 4
    },
    custom_fields={
        "app_name": "my-app",
        "version": "1.0.0"
    }
)
generator = LogGenerator(config)
generator.generate()
# Get performance stats
stats = generator.get_performance_stats()
print(f"Generated {stats['total_logs_generated']} logs in {stats['duration_seconds']:.2f} seconds")
print(f"Rate: {stats['logs_per_second']:,.0f} logs/second")
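The `level_distribution` weights above can be read as sampling probabilities. The sketch below shows seeded, weighted level selection using only the standard library. It illustrates the idea and is not LogForge's internal implementation; the helper name `sample_levels` is hypothetical.

```python
import random
from collections import Counter

# Weights copied from the advanced configuration above.
LEVEL_WEIGHTS = {"DEBUG": 0.2, "INFO": 0.6, "WARNING": 0.15, "ERROR": 0.05}


def sample_levels(weights, count, seed):
    """Draw `count` log levels according to `weights` (hypothetical helper)."""
    rng = random.Random(seed)  # a private RNG keeps runs reproducible
    levels = list(weights)
    return rng.choices(levels, weights=[weights[name] for name in levels], k=count)


first = sample_levels(LEVEL_WEIGHTS, 10_000, seed=42)
second = sample_levels(LEVEL_WEIGHTS, 10_000, seed=42)

# The same seed yields the same sequence of levels.
print(first == second)
# Observed counts should roughly follow the configured weights.
print(Counter(first).most_common())
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the sequence independent of any other code that touches the global random state.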
📊 Supported Log Formats
| Format | Description | Example Output |
|---|---|---|
| `standard` | Simple timestamp-level-message | `2023-12-01 10:30:15 - INFO - User login successful` |
| `json` | Structured JSON logs | `{"timestamp": "2023-12-01T10:30:15", "level": "INFO", "message": "User login"}` |
| `apache_common` | Apache Common Log Format | `192.168.1.1 - - [01/Dec/2023:10:30:15 +0000] "GET /index.html HTTP/1.1" 200 1024` |
| `apache_combined` | Apache Combined Log Format | Apache Common + referer + user agent |
| `nginx` | Nginx access log format | Similar to Apache with Nginx-specific fields |
| `syslog` | RFC3164 Syslog format | `<134>Dec 1 10:30:15 hostname app[1234]: Log message` |
| `csv` | Comma-separated values | `"2023-12-01T10:30:15","INFO","User login"` |
| `logfmt` | Key=value format | `timestamp=2023-12-01T10:30:15 level=INFO message="User login"` |
| `gelf` | Graylog Extended Log Format | JSON format for Graylog |
| `cef` | Common Event Format | `CEF:0\|...` |
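To make the differences between a few of these formats concrete, the sketch below renders one record as JSON, logfmt, and CSV with the standard library, and computes the syslog priority value seen in the `<134>` example. All function names are hypothetical and this is not how LogForge formats records internally; it only reproduces the example outputs in the table.

```python
import csv
import io
import json

record = {
    "timestamp": "2023-12-01T10:30:15",
    "level": "INFO",
    "message": "User login",
}


def to_json(rec):
    """One JSON object per line."""
    return json.dumps(rec)


def to_logfmt(rec):
    """key=value pairs; values containing spaces are double-quoted."""
    parts = []
    for key, value in rec.items():
        text = str(value)
        if " " in text:
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def to_csv(rec):
    """A single CSV row with every field quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(list(rec.values()))
    return buffer.getvalue().rstrip("\r\n")


def syslog_priority(facility, severity):
    """RFC 3164 priority: facility * 8 + severity."""
    return facility * 8 + severity


print(to_json(record))
print(to_logfmt(record))
print(to_csv(record))
# Facility 16 (local0) with severity 6 (informational) gives the 134 above.
print(syslog_priority(16, 6))
```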
⚙️ Configuration
Configuration File
Create a configuration file (config.json):
{
  "total_logs": 1000000,
  "log_levels": ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
  "level_distribution": {
    "levels": {
      "DEBUG": 0.25,
      "INFO": 0.60,
      "WARNING": 0.12,
      "ERROR": 0.025,
      "CRITICAL": 0.005
    }
  },
  "time": {
    "duration": "24h",
    "interval": 1.0,
    "jitter": 0.1
  },
  "output": {
    "format": "json",
    "file_path": "app.log",
    "compression": "gzip",
    "buffer_size": 65536
  },
  "performance": {
    "batch_size": 10000,
    "workers": 4,
    "use_numpy": true,
    "precompute_timestamps": true
  },
  "custom_fields": {
    "app_name": "my-application",
    "environment": "production",
    "version": "2.1.0"
  },
  "message_templates": {
    "INFO": [
      "User {user_id} logged in successfully",
      "Processing request for {endpoint}",
      "Database query completed in {query_time}ms"
    ],
    "ERROR": [
      "Failed to connect to database: {error}",
      "Authentication failed for user {user_id}",
      "Service {service_name} is unavailable"
    ]
  }
}
Use with CLI:
logforge generate --config config.json
Or initialize a config file:
logforge init-config --output my-config.json --format json --count 1000000
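Before handing a hand-written config file to the CLI, it can help to sanity-check it. The sketch below checks two things that are easy to get wrong: that the level weights sum to 1.0, and that every level with a weight or a message template is also listed in `log_levels`. The helper `validate_config` is hypothetical. Since LogForge uses Pydantic for configuration validation, it may already reject such configs on its own; this is only an independent pre-check.

```python
import json
import math


def validate_config(config):
    """Return a list of problems found in a config dict (hypothetical helper)."""
    problems = []
    levels = config.get("log_levels", [])
    weights = config.get("level_distribution", {}).get("levels", {})

    # Weights are probabilities, so they should add up to 1.0.
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"level weights sum to {total}, expected 1.0")

    # Every weighted level should be a declared level.
    for level in weights:
        if level not in levels:
            problems.append(f"weight given for unknown level {level!r}")

    # Every template group should belong to a declared level.
    for level in config.get("message_templates", {}):
        if level not in levels:
            problems.append(f"templates given for unknown level {level!r}")
    return problems


config = json.loads("""
{
  "log_levels": ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
  "level_distribution": {
    "levels": {"DEBUG": 0.25, "INFO": 0.60, "WARNING": 0.12,
               "ERROR": 0.025, "CRITICAL": 0.005}
  },
  "message_templates": {"INFO": ["User {user_id} logged in successfully"]}
}
""")
print(validate_config(config))
```

To check a file on disk, replace the inline string with `json.load(open("config.json"))`.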
CLI Options
logforge generate --help
Key options:
- --count: Number of logs to generate
- --format: Log format (see supported formats above)
- --output: Output file path (default: stdout)
- --compression: Compression format (gzip, bz2, lzma)
- --workers: Number of parallel workers
- --batch-size: Batch size for processing
- --benchmark: Run in benchmark mode
- --no-progress: Disable progress bar
🏆 Performance
LogForge is designed for high performance:
- 250K+ logs per second in optimal conditions
- 100K-150K logs per second in typical usage
- Memory-efficient streaming generation
- Parallel processing with automatic CPU detection
- Optimized data structures and algorithms
- Batch writing to minimize I/O overhead
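The combination of streaming generation and batch writing listed above can be sketched with the standard library alone. Records are produced lazily, grouped into fixed-size batches, and each batch is written with a single call to a gzip-compressed stream. This is a generic illustration of the technique, not LogForge's code; `generate_records`, `batched`, and `write_batches` are hypothetical names and the records are placeholders.

```python
import gzip
import io
import json
from itertools import islice


def generate_records(count):
    """Lazily yield placeholder records; stands in for a real log generator."""
    for i in range(count):
        yield {"id": i, "level": "INFO", "message": "User login"}


def batched(iterable, batch_size):
    """Yield lists of up to `batch_size` items without materializing the input."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def write_batches(records, stream, batch_size):
    """Serialize each batch to one string and write it with a single call."""
    written = 0
    for batch in batched(records, batch_size):
        stream.write("".join(json.dumps(rec) + "\n" for rec in batch))
        written += len(batch)
    return written


# An in-memory buffer keeps the example self-contained; a file path works too.
raw = io.BytesIO()
with gzip.open(raw, mode="wt", encoding="utf-8") as stream:
    total = write_batches(generate_records(25_000), stream, batch_size=10_000)
print(total)
```

Only one batch is held in memory at a time, which is why a larger batch size trades memory for fewer write calls.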
Benchmark Results
Performance varies significantly based on configuration. Here are realistic benchmarks on modern 8-core hardware:
Optimal Performance (Large Batch Size)
Configuration: 10M logs, batch_size=1,000,000, workers=5
Duration: 34.69 seconds
Logs per second: 288,296
Peak memory: 1,598.4 MB
Peak CPU: 109.0%
Typical Performance (Balanced Configuration)
Configuration: 10M logs, batch_size=100,000, workers=1
Duration: 85.04 seconds
Logs per second: 117,592
Peak memory: 969.4 MB
Peak CPU: 106.6%
Performance Factors
- Batch Size: Most significant impact on performance (larger = faster, but more memory)
- Workers: Parallel processing can improve throughput with larger datasets
- Format: Record size depends on the format (JSON is ~160 bytes/log), which affects memory usage and I/O
- Output: Writing to /dev/null is fastest for benchmarks
Realistic Expectations
- 1M logs: 3-8 seconds
- 10M logs: 35-85 seconds
- 100M logs: 6-15 minutes
- Memory usage: ~2KB per log in memory during batch processing
Run your own benchmark:
logforge benchmark --count 10000000 --format json --iterations 3
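The expectations above follow directly from the measured throughput. As a quick worked example, the sketch below converts the two benchmark rates into run times and uses the ~160 bytes/log figure to estimate uncompressed JSON output size. The function names are hypothetical, and actual numbers depend on hardware and configuration.

```python
def estimate_seconds(log_count, logs_per_second):
    """Run time implied by a sustained throughput."""
    return log_count / logs_per_second


def estimate_output_bytes(log_count, bytes_per_log=160):
    """Uncompressed output size, using the ~160 bytes/log JSON figure."""
    return log_count * bytes_per_log


# Rates taken from the two benchmark runs reported above.
for label, rate in [("optimal", 288_296), ("typical", 117_592)]:
    seconds = estimate_seconds(10_000_000, rate)
    print(f"{label}: {seconds:.2f} s for 10M logs")

size_gb = estimate_output_bytes(10_000_000) / 1e9
print(f"about {size_gb:.1f} GB of uncompressed JSON for 10M logs")
```

The same arithmetic gives roughly 6 to 14 minutes for 100M logs, in line with the range quoted above.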
🔍 Anomaly Detection
LogForge includes advanced anomaly injection capabilities for ML training data generation:
Generate Logs with Anomalies
# Enable anomaly injection with 20% anomaly rate
logforge generate --count 10000 --anomalies --anomaly-rate 0.2 --seed 42
# Use external anomaly configuration
logforge generate --count 10000 --anomaly-config anomaly_config.json
Anomaly Types
- Security: failed_auth, brute_force, suspicious_access, privilege_escalation
- Performance: high_latency, memory_spike, cpu_spike, slow_query
- System: service_unavailable, database_error, network_error
- Behavioral: unusual_volume, geographic_anomaly, user_behavior
Temporal Patterns
- BURST: Intense anomaly periods (simulating attacks)
- GRADUAL_INCREASE: Slowly developing performance issues
- PERIODIC: Regular anomaly cycles
- SPIKE: Short-duration intense anomalies
Example Anomaly Configuration
{
  "enabled": true,
  "base_rate": 0.1,
  "seed": 42,
  "patterns": [
    {
      "pattern_type": "burst",
      "anomaly_types": ["brute_force", "failed_auth"],
      "base_rate": 0.05,
      "peak_rate": 0.8,
      "duration": "10m",
      "start_time": "5m"
    }
  ]
}
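One plausible reading of the burst pattern above is a step function: the anomaly probability sits at the pattern's `base_rate`, jumps to `peak_rate` at `start_time`, and drops back after `duration`. The sketch below implements that reading. It is an assumption about the semantics, not LogForge's documented behavior: the helpers `parse_duration` and `burst_rate` are hypothetical, `start_time` is assumed to be an offset from the start of the run, and the duration syntax is inferred from values such as "10m" and "24h" in the examples.

```python
import random

# Assumed unit suffixes, inferred from values like "10m", "24h", and "7d".
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert strings such as "10m" or "24h" to seconds (assumed syntax)."""
    return float(text[:-1]) * UNITS[text[-1]]


def burst_rate(pattern, elapsed_seconds):
    """Anomaly probability at a given offset from the start of the run."""
    start = parse_duration(pattern["start_time"])
    end = start + parse_duration(pattern["duration"])
    in_burst = start <= elapsed_seconds < end
    return pattern["peak_rate"] if in_burst else pattern["base_rate"]


pattern = {
    "pattern_type": "burst",
    "base_rate": 0.05,
    "peak_rate": 0.8,
    "duration": "10m",
    "start_time": "5m",
}

rng = random.Random(42)  # seeded so the anomaly decisions are repeatable
for minute in (0, 5, 10, 15, 20):
    rate = burst_rate(pattern, minute * 60)
    is_anomaly = rng.random() < rate
    print(minute, rate, is_anomaly)
```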
📚 Examples
Generate Web Server Logs
# Apache access logs
logforge generate \
--count 1000000 \
--format apache_combined \
--output access.log \
--duration 7d
# Nginx logs with realistic traffic patterns
logforge generate \
--count 5000000 \
--format nginx \
--output nginx.log \
--interval exponential \
--compression gzip
Application Logs
# JSON application logs
logforge generate \
--count 100000 \
--format json \
--custom-fields '{"app": "api", "version": "2.1.0"}' \
--level-dist '{"INFO": 0.7, "WARNING": 0.2, "ERROR": 0.1}' \
--output app.log
System Logs
# Syslog format
logforge generate \
--count 50000 \
--format syslog \
--output system.log \
--interval 0.5
CSV for Analysis
# CSV format for data analysis
logforge generate \
--count 10000 \
--format csv \
--output logs.csv
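Once the CSV exists, the standard library is enough for a first look at it. The sketch below counts records per log level. It assumes the column order shown in the formats table (timestamp, level, message) and no header row; both are assumptions to verify against real output. The sample rows are made up for illustration, and `count_levels` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_levels(lines, level_column=1):
    """Count log levels in CSV rows; column 1 is assumed to hold the level."""
    return Counter(row[level_column] for row in csv.reader(lines) if row)


# Illustrative rows in the same shape as the CSV example in the formats table.
sample = io.StringIO(
    '"2023-12-01T10:30:15","INFO","User login"\n'
    '"2023-12-01T10:30:16","ERROR","Service unavailable"\n'
    '"2023-12-01T10:30:17","INFO","User login"\n'
)
counts = count_levels(sample)
print(counts)
```

For a real file, pass `open("logs.csv", newline="")` instead of the in-memory sample.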
🔧 Development
Setup Development Environment
git clone https://github.com/josesolisrosales/logforge.git
cd logforge
pip install -e .[dev]
Run Tests
pytest
Code Quality
black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/
Building
python -m build
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Built with Click for the CLI
- Rich for beautiful terminal output
- Faker for realistic data generation
- Pydantic for configuration validation
LogForge - Forge logs like a master craftsman! 🔨⚡