
A network intrusion detection system using Chain of Thought, knowledge graphs and GraphSAGE


CoTKG-IDS: Chain of Thought Knowledge Graph Intrusion Detection System

Overview

CoTKG-IDS is an advanced network intrusion detection system that combines Chain of Thought (CoT) reasoning with knowledge graphs and GraphSAGE for enhanced detection capabilities and interpretability. The system is designed to detect and classify various types of network intrusions by leveraging graph-based deep learning and knowledge representation.

Key Features

  • 🧠 Chain of Thought (CoT) enhanced reasoning for interpretable detection
  • 🕸️ Dynamic knowledge graph construction from network flow data
  • 📊 GraphSAGE-based network analysis for pattern detection
  • 🔍 Advanced feature engineering with automated selection
  • ⚖️ Intelligent data balancing for handling imbalanced attack classes
  • 🎯 Multi-class attack detection with high accuracy
  • 📈 Comprehensive visualization tools for analysis
  • 🔄 Real-time processing capabilities
  • 🛠️ Modular architecture for easy extension
  • 🤖 Support for multiple LLM providers (Ollama, Qianwen)

Architecture

Input Data → Feature Engineering → Knowledge Graph Construction → GraphSAGE Model → Attack Detection
    ↓               ↓                       ↓                         ↓                ↓
Preprocessing → Feature Selection → Graph Embeddings → Chain of Thought → Interpretability
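The diagram above can be read as a simple composition of stages. The sketch below is illustrative only; every function name here is hypothetical and does not reflect the actual API in src/:

```python
# Illustrative sketch of the pipeline stages; the real implementations
# live in src/ and differ in signature and detail.

def preprocess(flows):
    """Normalize raw flow records (e.g. coerce field types)."""
    return [{**f, "bytes": int(f.get("bytes", 0))} for f in flows]

def select_features(flows):
    """Keep only the fields the downstream model consumes."""
    keep = ("src", "dst", "bytes")
    return [{k: f[k] for k in keep if k in f} for f in flows]

def build_graph(flows):
    """Build a simple node/edge view of the flows (hosts as nodes,
    flows as directed edges)."""
    nodes = {f["src"] for f in flows} | {f["dst"] for f in flows}
    edges = [(f["src"], f["dst"]) for f in flows]
    return {"nodes": nodes, "edges": edges}

def run_pipeline(flows):
    graph = build_graph(select_features(preprocess(flows)))
    # GraphSAGE embedding and CoT-based explanation would follow here.
    return graph

graph = run_pipeline([{"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": "512"}])
```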

Installation

Prerequisites

  • Python 3.7+
  • Neo4j Database 4.4+
  • PyTorch 1.9+
  • CUDA (optional, for GPU support)
  • 8GB+ RAM recommended
  • 50GB+ disk space for full dataset
  • Ollama (for local LLM support)
  • Qianwen API key (optional)

Environment Setup

  1. Clone the repository:
git clone https://github.com/chenxingqiang/cotkg-ids.git
cd cotkg-ids
  2. Create and activate a virtual environment:
# Using venv
python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows

# Or using conda
conda create -n cotkg-ids python=3.9
conda activate cotkg-ids
  3. Install dependencies:
pip install -r requirements.txt

LLM Setup

You can use either Ollama (local) or Qianwen (cloud) as your LLM provider.
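Whichever provider you choose, the CoT component talks to it over HTTP. As a rough sketch using only the standard library (build_ollama_request is a hypothetical helper, not part of this project; payload fields follow Ollama's /api/generate endpoint):

```python
import json
import urllib.request

def build_ollama_request(prompt, model="llama2",
                         base_url="http://localhost:11434"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_ollama_request("Explain this network flow: ...")
# urllib.request.urlopen(req) would return the model's JSON response,
# provided an Ollama server is listening on base_url.
```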

Ollama Setup

macOS Installation
  1. Install using Homebrew:
brew install ollama
  2. Start Ollama service:
# Start the service
brew services start ollama

# If you encounter port conflicts, try:
brew services stop ollama
pkill ollama
brew services start ollama
Linux Installation
  1. Install using curl:
curl https://ollama.ai/install.sh | sh
  2. Start Ollama service:
# Start as a service (systemd)
sudo systemctl start ollama

# Or run directly
ollama serve

# If you encounter port conflicts:
sudo systemctl stop ollama
pkill ollama
sudo systemctl start ollama
Windows Installation
  1. Download the installer from Ollama's website
  2. Run the installer
  3. Start Ollama:
# Using Command Prompt as Administrator
ollama.exe serve

# If you encounter port conflicts:
# First, find the process using port 11434
netstat -ano | findstr :11434
# Kill the process using its PID
taskkill /F /PID <PID>
# Then start Ollama again
ollama.exe serve
Verify Installation

Check if Ollama is running properly:

# Check server status (the root endpoint responds "Ollama is running")
curl http://localhost:11434

# Or use the Python script
python scripts/setup_ollama.py --action list
Setup Models

After installing Ollama, you need to pull the required models:

# List available models
python scripts/setup_ollama.py --action list

# Pull all default models
python scripts/setup_ollama.py --action setup

# Pull specific models
python scripts/setup_ollama.py --action pull --models llama2 codellama

# Remove models
python scripts/setup_ollama.py --action delete --models model_name
Troubleshooting Ollama

If you encounter issues:

  1. Port conflicts (Error: address already in use):
# macOS/Linux
sudo lsof -i :11434
pkill ollama
brew services restart ollama  # macOS
sudo systemctl restart ollama  # Linux

# Windows
netstat -ano | findstr :11434
taskkill /F /PID <PID>
  2. Connection issues:
# Check if the server is running
curl http://localhost:11434

# Check logs
# macOS
brew services info ollama
# Linux
journalctl -u ollama
# Windows
# Check Windows Event Viewer
  3. Memory issues:
  • Ensure you have at least 8GB RAM available
  • Close other memory-intensive applications
  • For Windows, increase page file size
  • For Linux, adjust swap space
  4. Model download issues:
# Try with debug logging
python scripts/setup_ollama.py --action pull --models llama2 --debug

# Check network connectivity
curl -v http://localhost:11434

Qianwen Setup

  1. Get your API key from Qianwen
  2. Set your API key as an environment variable:
export QIANWEN_API_KEY='your-api-key'

Configure LLM Provider

Update config/config.py to choose your LLM provider:

COT_CONFIG = {
    'provider': 'ollama',  # or 'qianwen'
    'model': 'llama2',     # for ollama
    'ollama': {
        'base_url': 'http://localhost:11434',
        'timeout': 30,
        'models': ['llama2', 'mistral', 'codellama', 'vicuna']
    },
    'qianwen': {
        'model': 'qwen-max',
        'max_tokens': 1500,
        'temperature': 0.85
    }
}
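As a sketch of how such a config could drive provider dispatch (resolve_provider is illustrative, not the project's actual code):

```python
COT_CONFIG = {
    "provider": "ollama",
    "model": "llama2",
    "ollama": {"base_url": "http://localhost:11434", "timeout": 30},
    "qianwen": {"model": "qwen-max", "max_tokens": 1500, "temperature": 0.85},
}

def resolve_provider(config):
    """Return the selected provider name and its provider-specific settings."""
    name = config["provider"]
    if name not in ("ollama", "qianwen"):
        raise ValueError(f"Unknown LLM provider: {name}")
    return name, config[name]

name, settings = resolve_provider(COT_CONFIG)
```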

Neo4j Setup

  1. Install Neo4j:
# Using Docker (recommended)
docker run \
    --name neo4j \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/password \
    -d neo4j:4.4

# Or download from neo4j.com
  2. Configure Neo4j:
  • Open http://localhost:7474
  • Log in with the default credentials (neo4j/neo4j, or neo4j/password if you used the Docker command above)
  • Change the password when prompted
  • Update config/config.py with your credentials
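Once Neo4j is up, flow records can be merged into the graph. The Cypher below is purely illustrative; the real schema and queries live in src/knowledge_graph:

```python
# Hypothetical Cypher for inserting one flow as a Host->Host edge;
# parameter names ($src, $dst, ...) are placeholders for driver parameters.
FLOW_QUERY = (
    "MERGE (a:Host {ip: $src}) "
    "MERGE (b:Host {ip: $dst}) "
    "MERGE (a)-[f:FLOW {proto: $proto}]->(b) "
    "SET f.bytes = $bytes"
)

def flow_params(flow):
    """Map a flow record onto the query's parameters."""
    return {"src": flow["src"], "dst": flow["dst"],
            "proto": flow["proto"], "bytes": flow["bytes"]}

params = flow_params(
    {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": "tcp", "bytes": 512})
# With the neo4j Python driver this would run as:
#   session.run(FLOW_QUERY, **params)
```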

Data Preparation

  1. Download the dataset:
python download_data.py
  2. Verify data integrity:
python test_pipeline.py --mode data_check

Usage

Quick Start

# Run complete pipeline with default settings
python run.py

# Run only training
python run.py --mode train

# Run only testing
python run.py --mode test

# Run with test configuration
python run.py --test

Python API Usage

from src.main import run_full_pipeline
from config.config import DEFAULT_CONFIG

# Use default configuration
results = run_full_pipeline()

# Or customize configuration
config = DEFAULT_CONFIG.copy()
config['model']['graphsage'].update({
    'hidden_channels': 64,
    'num_layers': 3,
    'dropout': 0.2
})
results = run_full_pipeline(config=config)

Configuration

The system can be configured through config/config.py. Key configuration sections:

DEFAULT_CONFIG = {
    'model': {
        'graphsage': {
            'hidden_channels': 32,
            'num_layers': 2,
            'dropout': 0.3,
            'learning_rate': 0.01,
            'weight_decay': 0.0005
        }
    },
    'training': {
        'epochs': 20,
        'batch_size': 16,
        'early_stopping': {
            'patience': 5,
            'min_delta': 0.01
        },
        'validation_split': 0.2
    },
    'data': {
        'balancing': {
            'method': 'smote',
            'random_state': 42
        }
    },
    'neo4j': {
        'uri': 'bolt://localhost:7687',
        'username': 'neo4j',
        'password': 'password'
    },
    'cot': {
        'provider': 'ollama',
        'model': 'llama2',
        'ollama': {
            'base_url': 'http://localhost:11434',
            'timeout': 30
        }
    }
}
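To illustrate the early_stopping settings above (patience, min_delta), a minimal stopper might look like this sketch (not the project's actual trainer):

```python
class EarlyStopping:
    """Stop when validation loss fails to improve by at least min_delta
    for `patience` consecutive epochs (mirrors the config keys above)."""

    def __init__(self, patience=5, min_delta=0.01):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2, min_delta=0.01)
losses = [1.0, 0.8, 0.78, 0.785, 0.80]
stops = [stopper.step(loss) for loss in losses]
```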

Project Structure

cotkg-ids/
├── config/                 # Configuration files
├── data/                  # Data storage
│   ├── raw/              # Raw dataset
│   └── processed/        # Processed data
├── logs/                 # Log files
├── notebooks/            # Jupyter notebooks
├── results/              # Output files
│   ├── models/          # Saved models
│   └── visualizations/  # Generated plots
├── scripts/             # Utility scripts
│   └── setup_ollama.py  # Ollama setup script
├── src/                 # Source code
│   ├── data_processing/ # Data processing modules
│   ├── knowledge_graph/ # KG related code
│   ├── models/         # ML models
│   └── visualization/  # Visualization tools
└── tests/              # Test files

Performance Metrics

The system is evaluated on multiple metrics:

  • Detection Accuracy: ~98% on test set
  • False Positive Rate: <1%
  • Processing Speed: ~1000 flows/second
  • Memory Usage: ~4GB for standard dataset

Troubleshooting

Common Issues

  1. Neo4j Connection:
# Check Neo4j status
docker ps | grep neo4j
# or
service neo4j status
  2. CUDA Issues:
import torch
print(torch.cuda.is_available())  # Should return True if CUDA is properly set up
  3. Ollama Issues:
# Check Ollama server status
curl http://localhost:11434

# List available models
python scripts/setup_ollama.py --action list

# Restart Ollama server
pkill ollama
ollama serve
  4. Memory Issues:
  • Reduce batch_size in config
  • Use data sampling for large datasets
  • Enable swap space if needed

Error Messages

  • "Neo4j connection failed": Check Neo4j credentials and service status
  • "CUDA out of memory": Reduce batch size or model size
  • "File not found": Ensure dataset is downloaded and in correct location
  • "Ollama server not responding": Check if Ollama is running and accessible

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guide
  • Add tests for new features
  • Update documentation
  • Use type hints
  • Add appropriate error handling

License

MIT License - see LICENSE

Citation

@software{cotkg_ids2024,
  author = {Chen, Xingqiang},
  title = {CoTKG-IDS: Chain of Thought Knowledge Graph IDS},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/chenxingqiang/cotkg-ids}
}

Contact

Chen Xingqiang
Email: chen.xingqiang@iechor.com
GitHub: @chenxingqiang
