
A network intrusion detection system using Chain of Thought, knowledge graphs and GraphSAGE


CoTKG-IDS: Chain of Thought Knowledge Graph Intrusion Detection System

Overview

CoTKG-IDS is an advanced network intrusion detection system that combines Chain of Thought (CoT) reasoning with knowledge graphs and GraphSAGE for enhanced detection capabilities and interpretability. The system is designed to detect and classify various types of network intrusions by leveraging graph-based deep learning and knowledge representation.

Key Features

  • 🧠 Chain of Thought (CoT) enhanced reasoning for interpretable detection
  • 🕸️ Dynamic knowledge graph construction from network flow data
  • 📊 GraphSAGE-based network analysis for pattern detection
  • 🔍 Advanced feature engineering with automated selection
  • ⚖️ Intelligent data balancing for handling imbalanced attack classes
  • 🎯 Multi-class attack detection with high accuracy
  • 📈 Comprehensive visualization tools for analysis
  • 🔄 Real-time processing capabilities
  • 🛠️ Modular architecture for easy extension
  • 🤖 Support for multiple LLM providers (Ollama, Qianwen)

Architecture

Input Data → Feature Engineering → Knowledge Graph Construction → GraphSAGE Model → Attack Detection
    ↓               ↓                       ↓                         ↓                ↓
Preprocessing → Feature Selection → Graph Embeddings → Chain of Thought → Interpretability
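The diagram above can be read as a simple composition of stages. The sketch below is illustrative only; every function name here is hypothetical and does not reflect the actual API in src/:

```python
# Illustrative sketch of the pipeline stages; the real implementations
# live in src/ and differ in signature and detail.

def preprocess(flows):
    """Normalize raw flow records (e.g. coerce field types)."""
    return [{**f, "bytes": int(f.get("bytes", 0))} for f in flows]

def select_features(flows):
    """Keep only the fields the downstream model consumes."""
    keep = ("src", "dst", "bytes")
    return [{k: f[k] for k in keep if k in f} for f in flows]

def build_graph(flows):
    """Build a simple node/edge view of the flows (hosts as nodes,
    flows as directed edges)."""
    nodes = {f["src"] for f in flows} | {f["dst"] for f in flows}
    edges = [(f["src"], f["dst"]) for f in flows]
    return {"nodes": nodes, "edges": edges}

def run_pipeline(flows):
    graph = build_graph(select_features(preprocess(flows)))
    # GraphSAGE embedding and CoT-based explanation would follow here.
    return graph

graph = run_pipeline([{"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": "512"}])
```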

Installation

Prerequisites

  • Python 3.7+
  • Neo4j Database 4.4+
  • PyTorch 1.9+
  • CUDA (optional, for GPU support)
  • 8GB+ RAM recommended
  • 50GB+ disk space for full dataset
  • Ollama (for local LLM support)
  • Qianwen API key (optional)

Environment Setup

  1. Clone the repository:
git clone https://github.com/chenxingqiang/cotkg-ids.git
cd cotkg-ids
  2. Create and activate a virtual environment:
# Using venv
python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows

# Or using conda
conda create -n cotkg-ids python=3.9
conda activate cotkg-ids
  3. Install dependencies:
pip install -r requirements.txt

LLM Setup

You can use either Ollama (local) or Qianwen (cloud) as your LLM provider.
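Whichever provider you choose, the CoT component talks to it over HTTP. As a rough sketch using only the standard library (build_ollama_request is a hypothetical helper, not part of this project; payload fields follow Ollama's /api/generate endpoint):

```python
import json
import urllib.request

def build_ollama_request(prompt, model="llama2",
                         base_url="http://localhost:11434"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_ollama_request("Explain this network flow: ...")
# urllib.request.urlopen(req) would return the model's JSON response,
# provided an Ollama server is listening on base_url.
```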

Ollama Setup

macOS Installation
  1. Install using Homebrew:
brew install ollama
  2. Start Ollama service:
# Start the service
brew services start ollama

# If you encounter port conflicts, try:
brew services stop ollama
pkill ollama
brew services start ollama
Linux Installation
  1. Install using curl:
curl https://ollama.ai/install.sh | sh
  2. Start Ollama service:
# Start as a service (systemd)
sudo systemctl start ollama

# Or run directly
ollama serve

# If you encounter port conflicts:
sudo systemctl stop ollama
pkill ollama
sudo systemctl start ollama
Windows Installation
  1. Download the installer from Ollama's website
  2. Run the installer
  3. Start Ollama:
# Using Command Prompt as Administrator
ollama.exe serve

# If you encounter port conflicts:
# First, find the process using port 11434
netstat -ano | findstr :11434
# Kill the process using its PID
taskkill /F /PID <PID>
# Then start Ollama again
ollama.exe serve
Verify Installation

Check if Ollama is running properly:

# Check server status (the root endpoint responds "Ollama is running")
curl http://localhost:11434

# Or use the Python script
python scripts/setup_ollama.py --action list
Setup Models

After installing Ollama, you need to pull the required models:

# List available models
python scripts/setup_ollama.py --action list

# Pull all default models
python scripts/setup_ollama.py --action setup

# Pull specific models
python scripts/setup_ollama.py --action pull --models llama2 codellama

# Remove models
python scripts/setup_ollama.py --action delete --models model_name
Troubleshooting Ollama

If you encounter issues:

  1. Port conflicts (Error: address already in use):
# macOS/Linux
sudo lsof -i :11434
pkill ollama
brew services restart ollama  # macOS
sudo systemctl restart ollama  # Linux

# Windows
netstat -ano | findstr :11434
taskkill /F /PID <PID>
  2. Connection issues:
# Check if the server is running
curl http://localhost:11434

# Check logs
# macOS
brew services info ollama
# Linux
journalctl -u ollama
# Windows
# Check Windows Event Viewer
  3. Memory issues:
  • Ensure you have at least 8GB RAM available
  • Close other memory-intensive applications
  • For Windows, increase page file size
  • For Linux, adjust swap space
  4. Model download issues:
# Try with debug logging
python scripts/setup_ollama.py --action pull --models llama2 --debug

# Check network connectivity
curl -v http://localhost:11434

Qianwen Setup

  1. Get your API key from Qianwen
  2. Set your API key as an environment variable:
export QIANWEN_API_KEY='your-api-key'

Configure LLM Provider

Update config/config.py to choose your LLM provider:

COT_CONFIG = {
    'provider': 'ollama',  # or 'qianwen'
    'model': 'llama2',     # for ollama
    'ollama': {
        'base_url': 'http://localhost:11434',
        'timeout': 30,
        'models': ['llama2', 'mistral', 'codellama', 'vicuna']
    },
    'qianwen': {
        'model': 'qwen-max',
        'max_tokens': 1500,
        'temperature': 0.85
    }
}
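As a sketch of how such a config could drive provider dispatch (resolve_provider is illustrative, not the project's actual code):

```python
COT_CONFIG = {
    "provider": "ollama",
    "model": "llama2",
    "ollama": {"base_url": "http://localhost:11434", "timeout": 30},
    "qianwen": {"model": "qwen-max", "max_tokens": 1500, "temperature": 0.85},
}

def resolve_provider(config):
    """Return the selected provider name and its provider-specific settings."""
    name = config["provider"]
    if name not in ("ollama", "qianwen"):
        raise ValueError(f"Unknown LLM provider: {name}")
    return name, config[name]

name, settings = resolve_provider(COT_CONFIG)
```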

Neo4j Setup

  1. Install Neo4j:
# Using Docker (recommended)
docker run \
    --name neo4j \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/password \
    -d neo4j:4.4

# Or download from neo4j.com
  2. Configure Neo4j:
  • Open http://localhost:7474
  • Log in with the default credentials (neo4j/neo4j, or neo4j/password if you used the Docker command above)
  • Change the password when prompted
  • Update config/config.py with your credentials
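Once Neo4j is up, flow records can be merged into the graph. The Cypher below is purely illustrative; the real schema and queries live in src/knowledge_graph:

```python
# Hypothetical Cypher for inserting one flow as a Host->Host edge;
# parameter names ($src, $dst, ...) are placeholders for driver parameters.
FLOW_QUERY = (
    "MERGE (a:Host {ip: $src}) "
    "MERGE (b:Host {ip: $dst}) "
    "MERGE (a)-[f:FLOW {proto: $proto}]->(b) "
    "SET f.bytes = $bytes"
)

def flow_params(flow):
    """Map a flow record onto the query's parameters."""
    return {"src": flow["src"], "dst": flow["dst"],
            "proto": flow["proto"], "bytes": flow["bytes"]}

params = flow_params(
    {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": "tcp", "bytes": 512})
# With the neo4j Python driver this would run as:
#   session.run(FLOW_QUERY, **params)
```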

Data Preparation

  1. Download the dataset:
python download_data.py
  2. Verify data integrity:
python test_pipeline.py --mode data_check

Usage

Quick Start

# Run complete pipeline with default settings
python run.py

# Run only training
python run.py --mode train

# Run only testing
python run.py --mode test

# Run with test configuration
python run.py --test

Python API Usage

from src.main import run_full_pipeline
from config.config import DEFAULT_CONFIG

# Use default configuration
results = run_full_pipeline()

# Or customize configuration
config = DEFAULT_CONFIG.copy()
config['model']['graphsage'].update({
    'hidden_channels': 64,
    'num_layers': 3,
    'dropout': 0.2
})
results = run_full_pipeline(config=config)

Configuration

The system can be configured through config/config.py. Key configuration sections:

DEFAULT_CONFIG = {
    'model': {
        'graphsage': {
            'hidden_channels': 32,
            'num_layers': 2,
            'dropout': 0.3,
            'learning_rate': 0.01,
            'weight_decay': 0.0005
        }
    },
    'training': {
        'epochs': 20,
        'batch_size': 16,
        'early_stopping': {
            'patience': 5,
            'min_delta': 0.01
        },
        'validation_split': 0.2
    },
    'data': {
        'balancing': {
            'method': 'smote',
            'random_state': 42
        }
    },
    'neo4j': {
        'uri': 'bolt://localhost:7687',
        'username': 'neo4j',
        'password': 'password'
    },
    'cot': {
        'provider': 'ollama',
        'model': 'llama2',
        'ollama': {
            'base_url': 'http://localhost:11434',
            'timeout': 30
        }
    }
}
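To illustrate the early_stopping settings above (patience, min_delta), a minimal stopper might look like this sketch (not the project's actual trainer):

```python
class EarlyStopping:
    """Stop when validation loss fails to improve by at least min_delta
    for `patience` consecutive epochs (mirrors the config keys above)."""

    def __init__(self, patience=5, min_delta=0.01):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2, min_delta=0.01)
losses = [1.0, 0.8, 0.78, 0.785, 0.80]
stops = [stopper.step(loss) for loss in losses]
```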

Project Structure

cotkg-ids/
├── config/                 # Configuration files
├── data/                  # Data storage
│   ├── raw/              # Raw dataset
│   └── processed/        # Processed data
├── logs/                 # Log files
├── notebooks/            # Jupyter notebooks
├── results/              # Output files
│   ├── models/          # Saved models
│   └── visualizations/  # Generated plots
├── scripts/             # Utility scripts
│   └── setup_ollama.py  # Ollama setup script
├── src/                 # Source code
│   ├── data_processing/ # Data processing modules
│   ├── knowledge_graph/ # KG related code
│   ├── models/         # ML models
│   └── visualization/  # Visualization tools
└── tests/              # Test files

Performance Metrics

The system is evaluated on multiple metrics:

  • Detection Accuracy: ~98% on test set
  • False Positive Rate: <1%
  • Processing Speed: ~1000 flows/second
  • Memory Usage: ~4GB for standard dataset

Troubleshooting

Common Issues

  1. Neo4j Connection:
# Check Neo4j status
docker ps | grep neo4j
# or
service neo4j status
  2. CUDA Issues:
import torch
print(torch.cuda.is_available())  # Should return True if CUDA is properly set up
  3. Ollama Issues:
# Check Ollama server status
curl http://localhost:11434

# List available models
python scripts/setup_ollama.py --action list

# Restart Ollama server
pkill ollama
ollama serve
  4. Memory Issues:
  • Reduce batch_size in config
  • Use data sampling for large datasets
  • Enable swap space if needed

Error Messages

  • "Neo4j connection failed": Check Neo4j credentials and service status
  • "CUDA out of memory": Reduce batch size or model size
  • "File not found": Ensure dataset is downloaded and in correct location
  • "Ollama server not responding": Check if Ollama is running and accessible

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guide
  • Add tests for new features
  • Update documentation
  • Use type hints
  • Add appropriate error handling

License

MIT License - see LICENSE

Citation

@software{cotkg_ids2024,
  author = {Chen, Xingqiang},
  title = {CoTKG-IDS: Chain of Thought Knowledge Graph IDS},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/chenxingqiang/cotkg-ids}
}

Contact

Chen Xingqiang
Email: chen.xingqiang@iechor.com
GitHub: @chenxingqiang
