LogForge 🔨

A high-performance, professional-grade log generator for testing, simulation, and development purposes.


🚀 Features

  • High Performance: Generate 250K+ logs per second
  • Multiple Formats: Support for JSON, Apache, Nginx, Syslog, CSV, and more
  • Realistic Data: Generate realistic IPs, user agents, timestamps, and messages
  • Anomaly Injection: Inject anomalies with temporal patterns to produce ML training data
  • Deterministic Generation: Seed-based reproducible log generation
  • Configurable: Highly customizable through config files or CLI options
  • Memory Efficient: Batch processing and streaming output
  • Parallel Processing: Multi-core support for maximum performance
  • Rich CLI: Beautiful progress bars and detailed statistics
  • Extensible: Easy to add custom formats and data generators
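
Seed-based reproducibility generally means driving every random choice from a generator seeded with a fixed value. The sketch below illustrates that idea with Python's standard `random` module. The `make_records` helper and its fields are hypothetical and are not taken from LogForge's implementation.

```python
import random

LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR"]


def make_records(count, seed):
    """Return `count` pseudo-random (level, user_id) pairs for a given seed.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # private RNG, independent of the global one
    return [(rng.choice(LEVELS), rng.randint(1000, 9999)) for _ in range(count)]


first = make_records(5, seed=42)
second = make_records(5, seed=42)

print(first == second)  # True: the same seed yields the same records
```

Using a private `random.Random` instance rather than the module-level functions keeps the output stable even if other code draws random numbers in between.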

📦 Installation

pip install logforge

🛠️ Quick Start

Command Line Usage

Generate 1 million logs in JSON format:

logforge generate --count 1000000 --format json --output logs.json

Generate Apache access logs with compression:

logforge generate \
  --count 500000 \
  --format apache_common \
  --output access.log \
  --compression gzip

Run performance benchmark:

logforge benchmark --count 1000000 --format json

Python API Usage

from logforge import LogGenerator, LogConfig

# Basic usage
config = LogConfig(total_logs=10000, output={"format": "json"})
generator = LogGenerator(config)
generator.generate()

# Advanced configuration
config = LogConfig(
    total_logs=1000000,
    log_levels=["DEBUG", "INFO", "WARNING", "ERROR"],
    level_distribution={
        "levels": {
            "DEBUG": 0.2,
            "INFO": 0.6,
            "WARNING": 0.15,
            "ERROR": 0.05
        }
    },
    output={
        "format": "json",
        "file_path": "app.log",
        "compression": "gzip"
    },
    performance={
        "batch_size": 10000,
        "workers": 4
    },
    custom_fields={
        "app_name": "my-app",
        "version": "1.0.0"
    }
)

generator = LogGenerator(config)
generator.generate()

# Get performance stats
stats = generator.get_performance_stats()
print(f"Generated {stats['total_logs_generated']} logs in {stats['duration_seconds']:.2f} seconds")
print(f"Rate: {stats['logs_per_second']:,.0f} logs/second")
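
To make the meaning of the level weights concrete, the following sketch draws levels with `random.choices` from the standard library and compares observed shares with the configured ones. It is an illustration of weighted sampling in general, not a description of how LogForge samples internally; `sample_levels` is a hypothetical helper.

```python
import random
from collections import Counter

# Same weights as the "Advanced configuration" example above.
level_weights = {"DEBUG": 0.2, "INFO": 0.6, "WARNING": 0.15, "ERROR": 0.05}


def sample_levels(weights, count, seed=None):
    """Draw `count` level names according to `weights` (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.choices(list(weights), weights=list(weights.values()), k=count)


total = 100_000
counts = Counter(sample_levels(level_weights, total, seed=1))
for level, weight in level_weights.items():
    print(f"{level:8s} expected {weight:.2%}  observed {counts[level] / total:.2%}")
```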

📊 Supported Log Formats

| Format | Description | Example Output |
|---|---|---|
| standard | Simple timestamp-level-message | 2023-12-01 10:30:15 - INFO - User login successful |
| json | Structured JSON logs | {"timestamp": "2023-12-01T10:30:15", "level": "INFO", "message": "User login"} |
| apache_common | Apache Common Log Format | 192.168.1.1 - - [01/Dec/2023:10:30:15 +0000] "GET /index.html HTTP/1.1" 200 1024 |
| apache_combined | Apache Combined Log Format | Apache Common + referer + user agent |
| nginx | Nginx access log format | Similar to Apache with Nginx-specific fields |
| syslog | RFC3164 Syslog format | <134>Dec 1 10:30:15 hostname app[1234]: Log message |
| csv | Comma-separated values | "2023-12-01T10:30:15","INFO","User login" |
| logfmt | Key=value format | timestamp=2023-12-01T10:30:15 level=INFO message="User login" |
| gelf | Graylog Extended Log Format | JSON format for Graylog |
| cef | Common Event Format | CEF:0… |
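
As a rough illustration of how one record maps onto several of these formats, the sketch below renders the same dictionary as JSON, logfmt, and CSV using only the standard library. The `to_json`, `to_logfmt`, and `to_csv` helpers are hypothetical and only reproduce the example strings shown in the table; LogForge's own formatters may differ.

```python
import csv
import io
import json

record = {"timestamp": "2023-12-01T10:30:15", "level": "INFO", "message": "User login"}


def to_json(rec):
    """Serialize the record as a single JSON object."""
    return json.dumps(rec)


def to_logfmt(rec):
    """Serialize as key=value pairs, quoting values that contain spaces or quotes."""
    parts = []
    for key, value in rec.items():
        text = str(value)
        if " " in text or '"' in text:
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def to_csv(rec):
    """Serialize the values as one fully quoted CSV row (no header)."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(rec.values())
    return buf.getvalue().rstrip("\r\n")


print(to_json(record))
print(to_logfmt(record))
print(to_csv(record))
```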

⚙️ Configuration

Configuration File

Create a configuration file (config.json):

{
  "total_logs": 1000000,
  "log_levels": ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
  "level_distribution": {
    "levels": {
      "DEBUG": 0.25,
      "INFO": 0.60,
      "WARNING": 0.12,
      "ERROR": 0.025,
      "CRITICAL": 0.005
    }
  },
  "time": {
    "duration": "24h",
    "interval": 1.0,
    "jitter": 0.1
  },
  "output": {
    "format": "json",
    "file_path": "app.log",
    "compression": "gzip",
    "buffer_size": 65536
  },
  "performance": {
    "batch_size": 10000,
    "workers": 4,
    "use_numpy": true,
    "precompute_timestamps": true
  },
  "custom_fields": {
    "app_name": "my-application",
    "environment": "production",
    "version": "2.1.0"
  },
  "message_templates": {
    "INFO": [
      "User {user_id} logged in successfully",
      "Processing request for {endpoint}",
      "Database query completed in {query_time}ms"
    ],
    "ERROR": [
      "Failed to connect to database: {error}",
      "Authentication failed for user {user_id}",
      "Service {service_name} is unavailable"
    ]
  }
}
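
Before passing a file like this to the CLI, it can be worth sanity-checking the level settings. The sketch below checks that every weighted level is declared in `log_levels` and that the weights sum to 1.0. Whether LogForge itself enforces either rule is an assumption; `check_level_distribution` is a hypothetical helper, and the embedded JSON is an excerpt of the file above.

```python
import json
import math

# Excerpt of the config.json shown above.
config_text = """
{
  "log_levels": ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
  "level_distribution": {
    "levels": {"DEBUG": 0.25, "INFO": 0.60, "WARNING": 0.12,
               "ERROR": 0.025, "CRITICAL": 0.005}
  }
}
"""


def check_level_distribution(config):
    """Return a list of problems found in the level settings (empty if none)."""
    problems = []
    levels = config["level_distribution"]["levels"]
    unknown = set(levels) - set(config["log_levels"])
    if unknown:
        problems.append(f"weights given for undeclared levels: {sorted(unknown)}")
    total = sum(levels.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"weights sum to {total}, expected 1.0")
    return problems


problems = check_level_distribution(json.loads(config_text))
print(problems or "level distribution looks consistent")
```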

Use with CLI:

logforge generate --config config.json

Or initialize a config file:

logforge init-config --output my-config.json --format json --count 1000000

CLI Options

logforge generate --help

Key options:

  • --count: Number of logs to generate
  • --format: Log format (see supported formats above)
  • --output: Output file path (default: stdout)
  • --compression: Compression format (gzip, bz2, lzma)
  • --workers: Number of parallel workers
  • --batch-size: Batch size for processing
  • --benchmark: Run in benchmark mode
  • --no-progress: Disable progress bar
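
The three compression names accepted by `--compression` match modules in Python's standard library. As a minimal sketch of how an output path could be opened for any of them, the hypothetical `open_output` helper below dispatches on the name; it is not LogForge's actual writer.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Map each compression name to the matching stdlib opener.
OPENERS = {"gzip": gzip.open, "bz2": bz2.open, "lzma": lzma.open}


def open_output(path, compression=None, mode="wt"):
    """Open `path` in text mode, optionally through a compressor (hypothetical helper)."""
    if compression is None:
        return open(path, mode, encoding="utf-8")
    if compression not in OPENERS:
        raise ValueError(f"unsupported compression: {compression!r}")
    return OPENERS[compression](path, mode, encoding="utf-8")


LINE = "2023-12-01 10:30:15 - INFO - User login successful\n"
roundtrip = {}
with tempfile.TemporaryDirectory() as tmp:
    for name in OPENERS:
        path = os.path.join(tmp, f"sample.log.{name}")
        with open_output(path, name) as out:
            out.write(LINE)
        with open_output(path, name, mode="rt") as src:
            roundtrip[name] = src.read()

print(all(text == LINE for text in roundtrip.values()))  # True
```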

🏆 Performance

LogForge is designed for high performance:

  • 250K+ logs per second in optimal conditions
  • 100K-150K logs per second in typical usage
  • Memory-efficient streaming generation
  • Parallel processing with automatic CPU detection
  • Optimized data structures and algorithms
  • Batch writing to minimize I/O overhead

Benchmark Results

Performance varies significantly based on configuration. Here are realistic benchmarks on modern 8-core hardware:

Optimal Performance (Large Batch Size)

Configuration: 10M logs, batch_size=1,000,000, workers=5
Duration: 34.69 seconds
Logs per second: 288,296
Peak memory: 1,598.4 MB
Peak CPU: 109.0%

Typical Performance (Balanced Configuration)

Configuration: 10M logs, batch_size=100,000, workers=1
Duration: 85.04 seconds  
Logs per second: 117,592
Peak memory: 969.4 MB
Peak CPU: 106.5%

Performance Factors

  • Batch Size: Most significant impact on performance (larger = faster, but more memory)
  • Workers: Parallel processing can improve throughput with larger datasets
  • Format: Output size per log (about 160 bytes for JSON) affects memory usage and I/O
  • Output: Writing to /dev/null is fastest for benchmarks
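
The batch-size trade-off can be seen in a small, generic sketch: larger batches mean fewer write calls, at the cost of holding more lines in memory at once. `generate_lines` and `write_in_batches` are hypothetical helpers, not part of LogForge.

```python
import io
from itertools import islice


def generate_lines(count):
    """Yield `count` placeholder log lines lazily."""
    for i in range(count):
        yield f"2023-12-01 10:30:15 - INFO - event {i}\n"


def write_in_batches(lines, out, batch_size):
    """Write `lines` to `out` one batch at a time; return the number of write calls."""
    writes = 0
    lines = iter(lines)
    while True:
        batch = list(islice(lines, batch_size))  # at most batch_size lines in memory
        if not batch:
            break
        out.write("".join(batch))
        writes += 1
    return writes


sink = io.StringIO()
small = write_in_batches(generate_lines(10_000), sink, batch_size=100)
large = write_in_batches(generate_lines(10_000), io.StringIO(), batch_size=5_000)
print(small, large)  # 100 2
```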

Realistic Expectations

  • 1M logs: 3-8 seconds
  • 10M logs: 35-85 seconds
  • 100M logs: 6-15 minutes
  • Memory usage: ~2KB per log in memory during batch processing
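
These ranges follow from dividing the log count by the two measured rates in the benchmark results above. A short sketch of that arithmetic, with `estimate_seconds` as a hypothetical helper:

```python
OPTIMAL_RATE = 288_296  # logs/second, from the "Optimal Performance" run
TYPICAL_RATE = 117_592  # logs/second, from the "Typical Performance" run


def estimate_seconds(total_logs, logs_per_second):
    """Estimate wall-clock time, assuming the rate stays constant."""
    return total_logs / logs_per_second


for total in (1_000_000, 10_000_000, 100_000_000):
    fast = estimate_seconds(total, OPTIMAL_RATE)
    slow = estimate_seconds(total, TYPICAL_RATE)
    print(f"{total:>11,} logs: {fast:7.1f} s to {slow:7.1f} s")
```

For 100M logs this gives roughly 347 to 850 seconds, which is where the 6-15 minute figure comes from. Actual rates depend on hardware and configuration, so treat the result as an order-of-magnitude guide.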

Run your own benchmark:

logforge benchmark --count 10000000 --format json --iterations 3

🔍 Anomaly Injection

LogForge includes advanced anomaly injection capabilities for ML training data generation:

Generate Logs with Anomalies

# Enable anomaly injection with 20% anomaly rate
logforge generate --count 10000 --anomalies --anomaly-rate 0.2 --seed 42

# Use external anomaly configuration
logforge generate --count 10000 --anomaly-config anomaly_config.json

Anomaly Types

  • Security: failed_auth, brute_force, suspicious_access, privilege_escalation
  • Performance: high_latency, memory_spike, cpu_spike, slow_query
  • System: service_unavailable, database_error, network_error
  • Behavioral: unusual_volume, geographic_anomaly, user_behavior

Temporal Patterns

  • BURST: Intense anomaly periods (simulating attacks)
  • GRADUAL_INCREASE: Slowly developing performance issues
  • PERIODIC: Regular anomaly cycles
  • SPIKE: Short-duration intense anomalies

Example Anomaly Configuration

{
  "enabled": true,
  "base_rate": 0.1,
  "seed": 42,
  "patterns": [
    {
      "pattern_type": "burst",
      "anomaly_types": ["brute_force", "failed_auth"],
      "base_rate": 0.05,
      "peak_rate": 0.8,
      "duration": "10m",
      "start_time": "5m"
    }
  ]
}
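
One plausible reading of this burst pattern is that the anomaly rate is `base_rate` outside the burst window and `peak_rate` inside it, with `start_time` measured from the beginning of the generated time range. The sketch below encodes that interpretation. Both the semantics and the `parse_duration`, `burst_rate`, and `is_anomaly` helpers are assumptions for illustration, not LogForge's documented behavior.

```python
import random

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert strings such as '10m' or '24h' to seconds."""
    return float(text[:-1]) * UNIT_SECONDS[text[-1]]


pattern = {"base_rate": 0.05, "peak_rate": 0.8, "duration": "10m", "start_time": "5m"}


def burst_rate(pattern, elapsed_seconds):
    """Anomaly probability at a given offset (assumed semantics, see lead-in)."""
    start = parse_duration(pattern["start_time"])
    end = start + parse_duration(pattern["duration"])
    if start <= elapsed_seconds < end:
        return pattern["peak_rate"]
    return pattern["base_rate"]


def is_anomaly(pattern, elapsed_seconds, rng):
    """Decide whether a log at this offset should be an anomaly."""
    return rng.random() < burst_rate(pattern, elapsed_seconds)


rng = random.Random(42)  # seeded so repeated runs make the same decisions
for minute in (0, 5, 10, 15, 20):
    print(minute, burst_rate(pattern, minute * 60))
```

Under this reading, minutes 5 through 14 run at the peak rate of 0.8 and everything else at 0.05.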

📚 Examples

Generate Web Server Logs

# Apache access logs
logforge generate \
  --count 1000000 \
  --format apache_combined \
  --output access.log \
  --duration 7d

# Nginx logs with realistic traffic patterns
logforge generate \
  --count 5000000 \
  --format nginx \
  --output nginx.log \
  --interval exponential \
  --compression gzip
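
If `--interval exponential` means exponentially distributed gaps between events (an assumption; the option is not described further here), the resulting timestamps resemble Poisson-style traffic. A minimal sketch of that technique with the standard library, using the hypothetical helper `exponential_timestamps`:

```python
import random
from datetime import datetime, timedelta, timezone


def exponential_timestamps(start, count, mean_interval, seed=None):
    """Yield `count` timestamps whose gaps are exponentially distributed.

    `mean_interval` is the average gap in seconds.
    """
    rng = random.Random(seed)
    current = start
    for _ in range(count):
        yield current
        # expovariate takes the rate (1 / mean), not the mean itself
        current += timedelta(seconds=rng.expovariate(1.0 / mean_interval))


start = datetime(2023, 12, 1, 10, 30, 15, tzinfo=timezone.utc)
stamps = list(exponential_timestamps(start, 5, mean_interval=1.0, seed=42))
for ts in stamps:
    # Apache-style timestamp, as in the apache_common example
    print(ts.strftime("%d/%b/%Y:%H:%M:%S %z"))
```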

Application Logs

# JSON application logs
logforge generate \
  --count 100000 \
  --format json \
  --custom-fields '{"app": "api", "version": "2.1.0"}' \
  --level-dist '{"INFO": 0.7, "WARNING": 0.2, "ERROR": 0.1}' \
  --output app.log

System Logs

# Syslog format
logforge generate \
  --count 50000 \
  --format syslog \
  --output system.log \
  --interval 0.5
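
The `<134>` prefix in the syslog example from the formats table is the RFC 3164 priority value, computed as facility × 8 + severity. The sketch below decodes it; the helper names are illustrative.

```python
def encode_pri(facility, severity):
    """Combine a syslog facility and severity into a PRI value."""
    return facility * 8 + severity


def decode_pri(pri):
    """Split a PRI value back into (facility, severity)."""
    return divmod(pri, 8)


facility, severity = decode_pri(134)
print(facility, severity)  # 16 6
```

Facility 16 is local0 and severity 6 is Informational, so `<134>` marks an informational message from a locally defined facility.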

CSV for Analysis

# CSV format for data analysis
logforge generate \
  --count 10000 \
  --format csv \
  --output logs.csv
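
Once generated, the CSV can be read with Python's `csv` module. The sketch below counts log levels, assuming the column order shown in the formats table (timestamp, level, message) and no header row. The sample rows after the first are made up for illustration, and `count_levels` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# In practice this would be open("logs.csv", newline=""); an in-memory
# sample keeps the sketch self-contained.
sample = (
    '"2023-12-01T10:30:15","INFO","User login"\n'
    '"2023-12-01T10:30:16","ERROR","Database timeout"\n'
    '"2023-12-01T10:30:17","INFO","User logout"\n'
)


def count_levels(handle, level_column=1):
    """Count occurrences of each level in a headerless CSV log."""
    return Counter(row[level_column] for row in csv.reader(handle) if row)


counts = count_levels(io.StringIO(sample))
print(counts.most_common())  # [('INFO', 2), ('ERROR', 1)]
```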

🔧 Development

Setup Development Environment

git clone https://github.com/josesolisrosales/logforge.git
cd logforge
pip install -e .[dev]

Run Tests

pytest

Code Quality

black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/

Building

python -m build

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with Click for the CLI
  • Rich for beautiful terminal output
  • Faker for realistic data generation
  • Pydantic for configuration validation

LogForge - Forge logs like a master craftsman! 🔨⚡
