Production-ready DNS threat detection using machine learning

These details have not been verified by PyPI

Project links

Project description

DNS Threat Detector

Python License Version

A production-ready machine learning system for detecting malicious DNS domains including DGA domains, typosquatting attempts, malware C&C domains, and phishing sites.

Features

99.68% F1-Score on comprehensive test dataset
100% Typosquatting Detection with zero false positives
Sub-millisecond Latency (0.439ms average inference time)
Multi-tier Safelist with O(1) lookup for instant benign classification
99 Protected Brands including Google, Microsoft, PayPal, Amazon, etc.
Hybrid Architecture combining LightGBM, LSTM, and meta-learning
Easy Integration with Python API and CLI tool

Performance Metrics

Metric	Value
F1-Score	99.68%
Accuracy	99.38%
Precision	97.15%
Recall	99.95%
Typosquatting Detection	100%
False Positive Rate	28.5%
False Negative Rate	0.05%
Avg Latency	0.439 ms
Throughput	~2,275 domains/sec

Installation

pip install dns-threat-detector

Quick Start

Python API

from dns_threat_detector import DNS_ThreatDetector

# Initialize detector with safelist enabled
detector = DNS_ThreatDetector(use_safelist=True)
detector.load_models()

# Predict a single domain
result = detector.predict('gooogle.com')
print(result)
# Output:
# {
#   'prediction': 'MALICIOUS',
#   'confidence': 0.9000,
#   'reason': 'Typosquatting (dist=1 to google)',
#   'method': 'typosquatting_rule',
#   'latency_ms': 0.234
# }

# Batch predictions
domains = ['google.com', 'gooogle.com', 'example.com']
results = detector.predict_batch(domains)

Command-Line Interface

# Predict a single domain
dns-detect predict gooogle.com

# Get JSON output
dns-detect predict gooogle.com --json

# Batch process domains from file
dns-detect batch domains.txt --output results.json

# Show model information
dns-detect info

# Run self-tests
dns-detect test

Architecture

The DNS Threat Detector uses a sophisticated hybrid ensemble approach:

1. LightGBM Classifier

Gradient-boosted decision trees
11 features (4 FQDN + 7 typosquatting-specific)
200 trees with max depth 7
Handles structured feature patterns

2. Bidirectional LSTM

Character-level neural network
41-character vocabulary
159K parameters
Embedding(41→32) → Bi-LSTM(32→64×2) → FC(128→64→2)
Captures sequential patterns

3. Meta-Learner

Logistic regression stacking ensemble
Combines LightGBM and LSTM predictions
Learned weights: LSTM=7.04, LightGBM=2.53
Final classification decision

4. Typosquatting Detector

Rule-based + ML hybrid approach
Edit distance (Levenshtein) to 99 top brands
Distance 1-3 → Malicious (typosquatting)
Exact brand match → Benign (whitelist)

5. Multi-Tier Safelist (Optional)

Tier 1: 30K critical domains (government, finance)
Tier 2: 29K high-trust domains (tech, education)
Tier 3: 85K general trusted domains
O(1) in-memory lookup
322× speedup for safelisted domains

Prediction Flow

Domain Input
    ↓
┌─────────────────────┐
│  Safelist Check     │ → BENIGN (if listed)
└─────────────────────┘
    ↓
┌─────────────────────┐
│  Brand Whitelist    │ → BENIGN (exact match)
└─────────────────────┘
    ↓
┌─────────────────────┐
│  Typosquatting Rule │ → MALICIOUS (edit dist 1-3)
└─────────────────────┘
    ↓
┌─────────────────────┐
│  ML Ensemble        │ → MALICIOUS/BENIGN
│  (LightGBM + LSTM)  │
└─────────────────────┘

Features Extracted

FQDN Features (4)

domain_length - Length of domain name excluding TLD
subdomain_count - Number of subdomains
numeric_chars - Count of numeric characters
entropy - Shannon entropy of character distribution

Typosquatting Features (7)

min_edit_distance - Minimum Levenshtein distance to top brands
edit_distance_ratio - Normalized edit distance by brand length
length_diff_to_closest - Length difference to closest brand
has_extra_char - Binary: domain has 1 extra character
has_missing_char - Binary: domain missing 1 character
has_swapped_char - Binary: adjacent characters swapped
digit_substitution - Binary: contains digit substitution

API Reference

DNS_ThreatDetector

class DNS_ThreatDetector(
    models_dir: Optional[str] = None,
    use_safelist: bool = False,
    safelist_dir: Optional[str] = None,
    safelist_tiers: List[int] = [1, 2, 3]
)

Methods

load_models() Load all model components (LightGBM, LSTM, meta-learner, safelist)

predict(domain: str) -> Dict Predict if a domain is malicious or benign

Returns:

{
    'prediction': 'MALICIOUS' | 'BENIGN',
    'confidence': float,  # 0.0 to 1.0
    'reason': str,  # Human-readable explanation
    'method': str,  # 'safelist' | 'brand_whitelist' | 'typosquatting_rule' | 'ensemble'
    'latency_ms': float  # Inference time in milliseconds
}

predict_batch(domains: List[str]) -> List[Dict] Predict multiple domains

get_model_info() -> Dict Get comprehensive model information and statistics

save_metadata(output_path: str) Save model metadata to JSON file

Advanced Usage

Custom Model Paths

detector = DNS_ThreatDetector(
    models_dir='/path/to/models',
    use_safelist=True,
    safelist_dir='/path/to/safelists',
    safelist_tiers=[1, 2, 3]
)
detector.load_models()

Disable Safelist

# Faster initialization, no safelist loading
detector = DNS_ThreatDetector(use_safelist=False)
detector.load_models()

Batch Processing with Progress

from tqdm import tqdm

domains = ['example1.com', 'example2.com', ...]
results = []

for domain in tqdm(domains):
    result = detector.predict(domain)
    results.append(result)

Model Statistics

info = detector.get_model_info()
print(f"Total predictions: {info['usage_statistics']['total_predictions']}")
print(f"Safelist hits: {info['usage_statistics']['safelist_hits']}")
print(f"Typosquatting detections: {info['usage_statistics']['typosquatting_detections']}")

CLI Reference

Commands

dns-detect predict <domain> Predict a single domain

--json: Output as JSON
--no-safelist: Disable safelist checking

dns-detect batch <file> Batch process domains from file (one domain per line)

--output <file>: Output file path (default: results.json)
--no-safelist: Disable safelist checking

dns-detect info Show model information and statistics

--no-safelist: Show info without loading safelist

dns-detect test Run built-in self-tests

Examples

Detecting Typosquatting

detector = DNS_ThreatDetector()
detector.load_models()

# Legitimate brand
result = detector.predict('google.com')
# → BENIGN (brand_whitelist)

# Typosquatting attempts
result = detector.predict('gooogle.com')  # Extra 'o'
# → MALICIOUS (typosquatting_rule, dist=1)

result = detector.predict('g00gle.com')  # Digit substitution
# → MALICIOUS (typosquatting_rule, dist=2)

Processing Domain Lists

import pandas as pd

detector = DNS_ThreatDetector(use_safelist=True)
detector.load_models()

# Read domains from CSV
df = pd.read_csv('domains.csv')

# Add predictions
df['prediction'] = df['domain'].apply(
    lambda d: detector.predict(d)['prediction']
)
df['confidence'] = df['domain'].apply(
    lambda d: detector.predict(d)['confidence']
)

# Filter malicious domains
malicious = df[df['prediction'] == 'MALICIOUS']
print(malicious)

Requirements

Python ≥ 3.8
PyTorch ≥ 2.0.0
LightGBM ≥ 4.0.0
scikit-learn ≥ 1.3.0
pandas ≥ 2.0.0
numpy ≥ 1.24.0

Model Size

Total package size: ~60 MB
LightGBM models: ~10 MB
LSTM model: ~5 MB
Safelist files (tiers 1-3): ~20 MB
Tokenizer: ~1 MB

Dataset

Trained on 51,000 domains:

50% benign (legitimate domains)
50% malicious (DGA, typosquatting, malware C&C)
80/20 train/test split with stratification

Citation

If you use this tool in your research or project, please cite:

@software{dns_threat_detector,
  title = {DNS Threat Detector},
  author = {UMUDGA Project},
  year = {2025},
  version = {1.0.0},
  url = {https://github.com/umudga/dns-threat-detector}
}

License

MIT License - see LICENSE file for details

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

Support

GitHub Issues: https://github.com/umudga/dns-threat-detector/issues
Documentation: https://github.com/umudga/dns-threat-detector/wiki

Changelog

Version 1.0.0 (2025-10-30)

Initial release
Hybrid ensemble architecture (LightGBM + LSTM + Meta-learner)
99.68% F1-score on test data
100% typosquatting detection
Multi-tier safelist integration
CLI tool with batch processing
Comprehensive API documentation

Acknowledgments

Developed by the UMUDGA Project team as part of a final-year academic research project on DNS threat detection using machine learning.

Disclaimer

This tool is provided for educational and research purposes. While it achieves high accuracy, no detection system is perfect. Always use multiple layers of security in production environments.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.0.8

Nov 14, 2025

1.0.7

Nov 13, 2025

1.0.6

Nov 6, 2025

1.0.5

Nov 6, 2025

1.0.4

Nov 5, 2025

1.0.3

Nov 5, 2025

1.0.2

Oct 31, 2025

1.0.1

Oct 31, 2025

This version

1.0.0

Oct 30, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dns_threat_detector-1.0.0.tar.gz (1.8 MB view details)

Uploaded Oct 30, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

dns_threat_detector-1.0.0-py3-none-any.whl (1.8 MB view details)

Uploaded Oct 30, 2025 Python 3

File details

Details for the file dns_threat_detector-1.0.0.tar.gz.

File metadata

Download URL: dns_threat_detector-1.0.0.tar.gz
Upload date: Oct 30, 2025
Size: 1.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for dns_threat_detector-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`2b86c5e7d224b5d6d9ead3e44cfdc066db9a315d8550e976a80276ce50cab302`
MD5	`146b318f6350f9e26cfe1d5307b54568`
BLAKE2b-256	`7ef94dab76fb385245789f99923c2a816a0c84c783a80f1dbd9cad5d42167ecf`

See more details on using hashes here.

File details

Details for the file dns_threat_detector-1.0.0-py3-none-any.whl.

File metadata

Download URL: dns_threat_detector-1.0.0-py3-none-any.whl
Upload date: Oct 30, 2025
Size: 1.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for dns_threat_detector-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7a5309f2306780702ab13f3e2441d1d0e6d89c54b1eb266c55ef0b153095e72c`
MD5	`f77946c6cbe5abecb2331fec815c62b5`
BLAKE2b-256	`715e98340afc891ab588a409a97c322ea0816bb14078c4b594fd085b078b0836`

See more details on using hashes here.

dns-threat-detector 1.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

DNS Threat Detector

Features

Performance Metrics

Installation

Quick Start

Python API

Command-Line Interface

Architecture

1. LightGBM Classifier

2. Bidirectional LSTM

3. Meta-Learner

4. Typosquatting Detector

5. Multi-Tier Safelist (Optional)

Prediction Flow

Features Extracted

FQDN Features (4)

Typosquatting Features (7)

API Reference

DNS_ThreatDetector

Methods

Advanced Usage

Custom Model Paths

Disable Safelist

Batch Processing with Progress

Model Statistics

CLI Reference

Commands

Examples

Detecting Typosquatting

Processing Domain Lists

Requirements

Model Size

Dataset

Citation

License

Contributing

Support

Changelog

Version 1.0.0 (2025-10-30)

Acknowledgments

Disclaimer

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes