Skip to main content

Rosetta is a Python package that can be used to fake security logs and alerts for testing different detection and response use cases.

Project description

snyk codeql slack-community

Rosetta

Rosetta is a Python library for generating realistic security telemetry and alerts at scale. It can:

  • Generate observables/indicators (IPs, URLs, hashes, CVEs, MITRE ATT&CK techniques)
  • Emit synthetic logs in multiple formats (SYSLOG, CEF, LEEF, JSON, Windows Event XML)
  • Produce incident bundles composed of multiple event types
  • Convert one log format to another (e.g., CEF to JSON/LEEF)
  • Send synthetic logs to TCP/UDP/HTTP/HTTPS endpoints
  • Validate fields against a schema and generate missing values heuristically
  • Simulate database queries including SQL injection patterns
  • Generate Kubernetes and cloud-native telemetry

Installation

  • Install from PyPI:
pip install rosetta-ce
  • Install from source:
git clone https://github.com/ayman-m/rosetta.git
cd rosetta
python setup.py install

Quick start

from rosetta import Events, Observables, ObservableType, ObservableKnown

# Generate observables
bad_ips = Observables.generator(count=3, observable_type=ObservableType.IP, known=ObservableKnown.BAD)

# Inject custom observables and extra fields
observables = Observables(
    src_host=["web-01"],
    user=["alex"],
    url=["https://example.org"],
    custom_field=["custom_value"],
)

# Create events in different formats
syslog_events = Events.syslog(count=2, observables=observables)
cef_events = Events.cef(count=2, observables=observables)
leef_events = Events.leef(count=2, observables=observables)
json_events = Events.json(count=2, observables=observables)
win_events = Events.winevent(count=2, observables=observables)

Observables

Observable types

Type Description Known Values
IP IPv4 addresses BAD (malicious), GOOD (benign)
URL Web URLs BAD (malicious), GOOD (benign)
SHA256 File hashes BAD (malicious), GOOD (benign)
CVE CVE identifiers N/A
TERMS MITRE ATT&CK techniques (280+ IDs) N/A

Fetch or generate indicators

from rosetta import Observables, ObservableType, ObservableKnown

bad_urls = Observables.generator(count=2, observable_type=ObservableType.URL, known=ObservableKnown.BAD)
good_hashes = Observables.generator(count=2, observable_type=ObservableType.SHA256, known=ObservableKnown.GOOD)
cves = Observables.generator(count=2, observable_type=ObservableType.CVE)
terms = Observables.generator(count=2, observable_type=ObservableType.TERMS)

Provide your own observables

Observables accepts known fields and arbitrary extra fields via **kwargs.

from rosetta import Observables

observables = Observables(
    local_ip=["192.168.10.10"],
    remote_ip=["1.1.1.1"],
    src_host=["abc"],
    dst_host=["xyz"],
    user=["ayman"],
    file_name=["test.zip"],
    custom_field=["custom_value"],
)

Built-in observable fields

Rosetta supports 270+ observable fields covering traditional and modern infrastructure.

Category Fields
Network (IPv4/IPv6) local_ip, remote_ip, local_ip_v6, remote_ip_v6, local_port, remote_port, protocol
Network Extended source_ip, destination_ip, source_port, destination_port, client_ip, server_ip, client_port, server_port, public_ip, private_ip, nat_source_ip, nat_destination_ip, client_mac, server_hostname, client_hostname, destination_hostname, source_hostname
Hosts & Domains src_host, dst_host, src_domain, dst_domain, url, hostname, host, domain
HTTP/API http_method, http_uri, http_status_code, http_user_agent, http_host, http_referer, api_endpoint, api_key, api_name, request_id, response_time_ms, content_type
DNS/DHCP dns_query, dns_response, dns_server, query_time_ms, lease_duration
Kubernetes/Containers container_id, container_name, container_image, pod_name, pod_uid, namespace, cluster, node_name, service_account, labels, annotations
Cloud Infrastructure cloud_provider, region, availability_zone, account_id, account_name, tenant_id, instance_id, instance_name, instance_type, ami_id, image_id, image_name, vpc_id, subnet_id, security_groups, iam_role, bucket_name, bucket_arn, resource_id, resource_name, resource_type, resource_arn, resource_attributes
SSL/TLS ssl_cipher, ssl_version, tls_version, certificate_cn, certificate_issuer, ja3_hash, ja3s_hash
Threat Detection mitre_tactic, mitre_technique, threat_score, threat_level, threat_name, threat_type, signature_id, signature_name, cve_id, cvss_score, ioc_type, ioc_value
Users & Email user, sender_email, recipient_email, email_subject, email_body, sender, recipient, subject, message_id, attachment_name, attachment_hash, spf_result, dkim_result, dmarc_result
Authentication authentication_method, authentication_result, mfa_method, mfa_result, logon_type, session_id, username, account_name
Files file_name, file_hash, file_path, file_size, file_type, file_hash_sha256, file_hash_md5, file_hash_sha1, file_owner
Processes win_process, win_child_process, unix_process, unix_child_process, win_cmd, unix_cmd, parent_process_name, command_line, executable_path, working_directory, process_name, process_guid, ppid
Firewall/IDS firewall_name, rule_name, rule_action, zone_source, zone_destination, tcp_flags, packets, bytes_sent, bytes_received
Virtual Machines vm_id, vm_name, hypervisor_type, cpu_usage, memory_usage
Database query_type, database_name, query, query_text, execution_time_ms, transaction_id, affected_rows, schema_name
Vulnerability/Compliance vulnerability_id, vulnerability_name, scan_result, scan_type, compliance_status
Incident Response incident_id, incident_severity, incident_status, playbook_id, alert_id
Security severity, action, event_id, error_code, technique, cve, terms
Alerts & Incidents alert_types, alert_name, incident_types, analysts, action_status
Common Fields status, result, message, description, timestamp, risk_score, priority, category, tags, malware_name, malware_type, direction, geo_location, country
Other app, os, sensor, entry_type, inbound_bytes, outbound_bytes

Industry-standard field naming: Rosetta supports both traditional naming (local_ip, remote_ip) and industry-standard naming (source_ip, destination_ip, client_ip, server_ip) for better compatibility with modern SIEM platforms.

Events

Rosetta supports generating events in multiple industry-standard log formats:

Format Description Use Case
SYSLOG RFC 5424 syslog format Unix/Linux system logs, network devices
CEF Common Event Format SIEM integration (ArcSight, Splunk)
LEEF Log Event Extended Format IBM QRadar integration
JSON Structured JSON format Modern SIEM, Elasticsearch, cloud platforms
Windows Event XML Windows Event Log format Windows security monitoring, Sysmon
Incidents Bundled multi-format events Incident response testing, SOC training

SYSLOG

from rosetta import Events

Events.syslog(count=1)
Events.syslog(count=1, observables=observables)

CEF

Events.cef(count=1, observables=observables)
Events.cef(count=1, observables=observables, required_fields="local_ip,local_port,remote_ip,remote_port,protocol,rule_id,action")

LEEF

Events.leef(count=1, observables=observables)

Windows Event Log (XML)

Events.winevent(count=1, observables=observables)

JSON

Events.json(count=1, observables=observables)

Incidents (bundled events)

Events.incidents(count=1, fields="id,type,duration,analyst,description,events", observables=observables)

Supported incident types

Rosetta includes 11 predefined incident categories:

  • Malware
  • Phishing
  • Access Violation
  • Lateral Movement
  • Port Scan
  • SQL Injection
  • Brute Force
  • Control Avoidance
  • Rogue Device
  • Denial of Service
  • Account Compromised

Required fields and presets

Rosetta can require specific fields per event. You can pass required_fields directly, or rely on presets.

  • Preset file: rosetta/schema/required_presets.json
  • Keys: syslog, cef, leef, json, winevent
# Explicit override
Events.syslog(count=1, required_fields="timestamp,hostname,username")

# Use presets (default behavior)
Events.syslog(count=1)

If the preset file is missing or empty, Rosetta falls back to built-in defaults.

Field control and determinism

  • If you supply values in Observables, those values are used verbatim for matching fields (deterministic control).
  • If you do not supply Observables, values are generated by built-in generators and heuristics (random but type-aware).
  • You can still control structure without observables using required_fields, plus datetime_iso and vendor/product/version parameters on CEF/LEEF/JSON.
from rosetta import Events, Observables

# Deterministic values via Observables
obs = Observables(
    source_ip=["203.0.113.10"],
    destination_ip=["10.0.5.20"],
    user=["alice"],
    http_method=["POST"],
)
Events.json(count=2, observables=obs)

# Control structure without observables
Events.cef(count=1, required_fields="local_ip,local_port,remote_ip,remote_port,protocol,rule_id,action")

Schema validation

Rosetta checks required fields and observables against a supported-fields list and emits warnings for unknown fields.

  • Schema file: rosetta/schema/supported_fields.json
  • Required field presets: rosetta/schema/required_presets.json
  • Behavior: non-blocking warnings only
from rosetta import Events, Observables

Events.syslog(count=1, observables=Observables(), required_fields="unknown_field")
# Warning: Field 'unknown_field' is not in schema/supported_fields.json

Supported schema fields (1000+ fields)

Representative fields by category (all are supported; full list in rosetta/schema/supported_fields.json).

Identity & Authentication

username, user, user_id, user_sid, user_dn, user_ou, user_type, user_role, user_group, actor_username, actor_sid, actor_id, actor_uid, actor_arn, actor_ip, target_username, target_user_sid, target_user_id, target_uid, admin_username, admin_ip, analyst_username, creator_username, creator_ip, display_name, full_name, email, department, title, manager

Authentication & Sessions

authentication_method, authentication_result, authentication_package, authentication_status, authorization_status, session_id, session_type, session_start, session_end, session_duration, session_timeout, token_id, token_expiry, token_elevation_type, mfa_method, mfa_result, logon_type, logon_process, logon_guid, logon_id, logon_time, logoff_time, login_type, login_time, last_login, last_logon, last_password_change

Network & Connectivity

client_ip, client_port, client_hostname, client_mac, server_ip, server_port, server_hostname, source_ip, source_port, source_mac, source_hostname, destination_ip, destination_port, destination_mac, destination_hostname, local_ip, local_port, remote_ip, remote_port, remote_host, assigned_ip, public_ip, private_ip, nat_source_ip, nat_destination_ip, scanner_ip, target_ip, target_port, target_hostname

DNS & DHCP

dns_server, dns_servers, dns_query, dns_response, dns_flags, dns_name, dnssec_validated, query_name, query_class, query_time_ms, query_count, response_data, response_ip, response_count, response_ttl, authoritative, recursion_desired, recursion_available, lease_duration, lease_start, lease_expiry, lease_state, scope_name, scope_id

HTTP & Web

http_method, http_uri, http_host, http_status_code, http_protocol, http_referer, http_user_agent, http_query_string, request_id, request_size, request_body_sample, request_headers, response_code, response_size, response_time_ms, response_body_sample, response_headers, content_type, content_length, user_agent, referer, cookie, cookies, url, url_category, url_categories

API Gateway

gateway_name, api_key, api_name, api_endpoint, api_operation, api_version, api_parameters, api_call, oauth_client_id, oauth_scope, rate_limit_policy, rate_limit_remaining, quota_policy, quota_remaining, backend_server, backend_response_time_ms, backend_status_code, cache_status, cache_hit

Files & Storage

file_name, file_path, file_type, file_size, file_hash, file_hash_md5, file_hash_sha1, file_hash_sha256, file_hash_imphash, file_owner, file_group, file_permissions, file_attributes, file_version, original_filename, creation_time, modification_time, deletion_time, access_time, old_hash, new_hash, old_size, new_size, old_permissions, new_permissions

Processes & Execution

process_id, process_name, process_guid, parent_process_name, parent_process_guid, parent_command_line, parent_image, pid, ppid, executable_path, command_line, command, arguments, args, working_directory, cwd, image, image_path, image_loaded, start_time, stop_time, exit_code, cpu_time, thread_count, handle_count

Windows Events

event_id, event_type, event_record_id, event_category, logon_id, linked_logon_id, virtual_account, elevated_token, mandatory_label, integrity_level, terminal_session_id, current_directory, source_pid, source_process_name, source_image, source_user, target_pid, target_process_name, target_image, granted_access, call_trace

Registry

registry_key, registry_value_name, registry_value_type, registry_value_data, old_value_type, old_value_data, new_value_type, new_value_data, target_object, details, new_name

Services & Scheduled Tasks

service_name, service_type, service_state, service_path, service_file_name, service_start_type, service_unit, service_account, task_name, task_content, task_id, task_status, task_result, trigger_type, trigger_value, run_level, enabled, schedule, last_run_time, next_run_time

Modules & Drivers

module_name, module_path, module_base_address, module_size, module_version, module_parameters, module_hash, driver_name, signature_status, signature_level, signed, signed_by, signer, load_reason, load_result, load_address, load_time, is_kernel_mode

PowerShell & Scripts

script_block_text, script_path, script_content, script_hash, script_content_hash, script_block_id, script_engine, host_application, engine_version, runspace_id, pipeline_id, interpreter, obfuscation_score

Containers & Kubernetes

container_id, container_name, container_image, namespace, pod_name, pod_uid, node_name, cluster, labels, annotations, resource_limits, security_context, service_account, restart_count, exit_code_previous, environment_variables, cgroup, namespace_pid, capabilities

Cloud & Infrastructure

cloud_provider, region, instance_id, instance_name, instance_type, ami_id, vpc_id, subnet_id, security_groups, iam_role, resource_type, resource_id, resource_name, resource_arn, bucket_name, bucket_arn, volume_id, volume_name, volume_type, volume_size, snapshot_id, snapshot_name, tags

Virtual Machines

hypervisor_type, vm_id, vm_name, vm_uuid, cpu_usage, memory_usage, cpu_count, memory_mb, disk_size_gb, network_adapters, template_name, resource_pool, datastore, target_vm, target_host, boot_time_ms, uptime_seconds, previous_state

Database

database_name, database_role, query_type, query_text, query, command_type, command_text, object_name, schema_name, execution_status, execution_time_ms, affected_rows, transaction_id, privilege, error_code, error_message

Email & Messaging

sender, recipient, sender_email, recipient_email, sender_domain, recipient_domain, subject, message_id, message_size, message_count, attachment_name, attachment_type, attachment_size, attachment_hash, attachment_count, attachment_names, attachment_types, attachment_hashes, spam_score, phishing_score, spf_result, dkim_result, dmarc_result

Firewall & Network Security

firewall_name, rule_id, rule_name, rule_type, rule_number, rule_action, acl_name, acl_type, action, action_taken, zone_source, zone_destination, interface_in, interface_out, input_interface, output_interface, source_network, destination_network, port_range, tcp_flags, packets, bytes, bytes_sent, bytes_received

IDS/IPS & Threat Detection

signature_id, signature_name, signature_category, attack_type, attack_vector, attack_category, attack_severity, threat_type, threat_name, threat_category, threat_score, threat_level, threat_severity, threat_indicator, threat_detected, detection_name, detection_type, mitre_tactic, mitre_technique, cve_id, cvss_score, cvss_vector

Endpoint Detection

agent_id, agent_version, scan_id, scan_type, scan_result, scan_status, scan_start, scan_end, scan_duration, finding_id, vulnerability_id, vulnerability_name, vulnerability_description, remediation, quarantine_id, quarantine_status, quarantine_path, quarantined, blocked

SIEM & Incident Response

incident_id, incident_name, incident_type, incident_severity, incident_status, alert_id, alert_type, alert_name, playbook_id, playbook_name, analyst_notes, confidence, risk_score, risk_level, severity, priority

SSL/TLS

ssl_protocol, ssl_version, ssl_cipher, ssl_subject, ssl_issuer, ssl_client_cert_cn, ssl_ja3_hash, ssl_ja3s_hash, tls_version, tls_cipher, cipher_suite, certificate_cn, certificate_serial, certificate_issuer, certificate_subject, certificate_validity_start, certificate_validity_end, certificate_chain_valid, certificate_revocation_status, ja3_hash, ja3s_hash

VPN & Remote Access

vpn_group, tunnel_type, tunnel_id, encryption_algorithm, idle_timeout, session_timeout, bytes_quota, client_version

Wireless

ssid, ap_name, ap_mac, bssid, eap_type, vlan_assigned, radio_type, channel, rssi, snr, roam_count, association_time, data_rate, power_save_mode

Network Access Control

identity_group, policy_matched, nas_ip, nas_port, calling_station_id, called_station_id, radius_attributes, switch_ip, switch_port, vlan_id, vlan_name, posture_status, endpoint_policy

Data Loss Prevention

data_classification, sensitive_data_flag, sensitive_data_types, sensitive_data_detected, sensitive_data_added, sensitive_data_removed, pattern_matched, bytes_inspected, dlp_verdict, dlp_violation, dlp_scan_result, masked_fields, channel_type

Vulnerability Management

scanner_ip, target_os, target_os_version, service_detected, service_version, banner, vulnerability_checks, vulnerabilities_found, vulnerabilities_critical, vulnerabilities_high, vulnerabilities_medium, vulnerabilities_low, vulnerabilities_info, compliance_score, exploit_available, patch_available, first_detected, last_detected

Mobile Device Management

device_type, device_id, device_name, enrollment_status, enrollment_method, enrollment_time, serial_number, imei, jailbreak_status, passcode_compliant, installed_apps_count, managed_apps_count, certificates_installed, profiles_installed

Privileged Access Management

vault_name, checkout_id, checkout_reason, checkout_time, checkin_time, session_duration_limit, recording_enabled, recording_id, target_account, target_account_type, target_system, credential_type, credential_name

Application Logs

application, application_name, application_version, environment, log_level, logger_name, message, exception_type, exception_message, stack_trace, thread_name, thread_id, span_id, trace_id, custom_fields

Audit & Compliance

audit_id, operation, operation_type, modification_type, change_type, change_description, change_reason, old_value, new_value, justification, approval_id, approval_status, approver, workflow_id, compliance_status, policy_name, policy_violation

Metrics & Performance

metric_name, metric_value, threshold, cpu_usage, memory_usage, disk_usage, throughput_bps, connection_count, response_time_ms, execution_time_ms, duration, jitter_ms, offset_ms

Heuristic value generation

When a field has no explicit value, Rosetta infers a reasonable value based on name patterns. This makes large schemas usable without hardcoding every field.

Supported field patterns

Category Patterns
Network *_ip, *_ipv6, *_port, *_mac, *_domain, *_hostname, *_url
Identity *_email, *_user, *_sid, *_arn
Identifiers *_id, *_uuid, *_guid
Hashing *_hash, *_md5, *_sha1, *_sha256
Status *_status, *_result, *_outcome, *_verdict, *_action
Metrics *_size, *_bytes, *_count, *_duration, *_ms, *_score, *_percent
Time *_time, *_timestamp, *_date
HTTP/API http_*, request_*, response_*, api_*
DNS/DHCP dns_*, dhcp_*
Authentication auth_*, mfa_*, token_*, session_*, role_*, permission_*
Kubernetes namespace, pod_*, container_*, node_*, cluster, labels, annotations, service_account
Threats vulnerability_*, cve, cvss_*, threat_*, mitre_*, ioc_*
Email/SMTP sender_*, recipient_*, smtp_*, dkim_*, spf_*, dmarc_*
Boolean is_*, *_enabled, *_flag

Sender

Send synthetic events to TCP/UDP/HTTP/HTTPS endpoints using multi-threaded workers.

Supported data types

  • SYSLOG
  • CEF
  • LEEF
  • WINEVENT
  • JSON
  • INCIDENT

Destination formats

  • UDP: udp:127.0.0.1:514
  • TCP: tcp:127.0.0.1:514
  • HTTP: http://127.0.0.1:8000/endpoint
  • HTTPS: https://127.0.0.1:8000/endpoint

Example

from rosetta import Sender, WorkerTypeEnum

# UDP syslog
udp_worker = Sender(
    data_type=WorkerTypeEnum.SYSLOG,
    destination="udp:127.0.0.1:514",
    observables=observables,
    count=5,
    interval=2
)
udp_worker.start()

# HTTP JSON
http_worker = Sender(
    data_type=WorkerTypeEnum.JSON,
    destination="http://127.0.0.1:8000/logs",
    observables=observables,
    count=5,
    interval=2
)
http_worker.start()

Converter

from rosetta import Converter, ConverterToEnum, ConverterFromEnum

cef_log = "CEF:0|Security|IDS|1.0|Alert|10|src=192.168.0.1 dst=192.168.0.2 act=blocked"
converted = Converter.convert(from_type=ConverterFromEnum.CEF, to_type=ConverterToEnum.JSON, data=cef_log)

Testing

python3 -m unittest discover -s tests

Database telemetry

Rosetta can generate realistic database activity logs including normal operations and attack patterns.

Supported query types

SELECT, INSERT, UPDATE, DELETE, ALTER, CREATE, DROP, TRUNCATE, GRANT, REVOKE, MERGE, CALL

Attack patterns included

  • SQL injection queries
  • Unauthorized data manipulation
  • Privilege escalation attempts

OWASP Top 10 attack simulation

Rosetta includes built-in OWASP Top 10 attack technique indicators:

  • Injection (SQL, Command)
  • Broken Authentication and Session Management
  • Cross-Site Scripting (XSS)
  • Broken Access Control
  • Security Misconfiguration
  • Insecure Cryptographic Storage
  • Insufficient Transport Layer Protection
  • Unvalidated Redirects and Forwards
  • Using Components with Known Vulnerabilities
  • Insufficient Logging and Monitoring

Network protocols

Supported protocols for telemetry generation: TCP, UDP, HTTP, SSL, SQL, SSH, FTP, RTP, RDP

Windows telemetry

Rosetta generates realistic Windows endpoint data including:

  • 18 common Windows processes (explorer.exe, svchost.exe, lsass.exe, etc.)
  • PowerShell commands for attack simulation
  • Windows Event Log XML templates (Sysmon, Security events)

Examples

See the examples/ directory for complete usage examples:

  • observables.py - Generate indicators
  • events_formats.py - Create events in different formats
  • incidents.py - Build incident bundles
  • sender_tcp_udp_http.py - Send events to endpoints
  • converter.py - Convert between formats
  • k8s_fields.py - Kubernetes field generation
  • presets_schema.py - Schema validation

Notes

  • Some observable generators fetch from public sources. When offline, Rosetta falls back to synthetic values.
  • Preset and schema files are generated from the CSV mapping in the project root and can be updated as your schema evolves.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rosetta_ce-1.8.4.tar.gz (53.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rosetta_ce-1.8.4-py3-none-any.whl (45.9 kB view details)

Uploaded Python 3

File details

Details for the file rosetta_ce-1.8.4.tar.gz.

File metadata

  • Download URL: rosetta_ce-1.8.4.tar.gz
  • Upload date:
  • Size: 53.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for rosetta_ce-1.8.4.tar.gz
Algorithm Hash digest
SHA256 48e0ab4d822621c722433cddeff0892aa246507bb51a0180f0516ea68ab172d4
MD5 bac874dc78c666fa7f32cba2fe394bca
BLAKE2b-256 08defd3eb121dbdb144f9c299c3720d7fb83b4b3b414344c87aa21fd0397c99f

See more details on using hashes here.

File details

Details for the file rosetta_ce-1.8.4-py3-none-any.whl.

File metadata

  • Download URL: rosetta_ce-1.8.4-py3-none-any.whl
  • Upload date:
  • Size: 45.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for rosetta_ce-1.8.4-py3-none-any.whl
Algorithm Hash digest
SHA256 08a5b387f1908c1e83e6422b5c0983946242e6a5b7b0d88e2d1ae4fd0ca5b267
MD5 0426370a1cc99be5b63bfb8596066f64
BLAKE2b-256 3aa4cedc9c9138152a86807ec50ccec64468cf6ede398494115911de1c867bf3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page