A package for intrusion detection using chain of thought and knowledge graphs
CoTKG-IDS: Chain of Thought Knowledge Graph Intrusion Detection System
Overview
CoTKG-IDS is an advanced network intrusion detection system that combines Chain of Thought (CoT) reasoning with knowledge graphs and GraphSAGE for enhanced detection capabilities and interpretability. The system is designed to detect and classify various types of network intrusions by leveraging graph-based deep learning and knowledge representation.
Key Features
- Chain of Thought (CoT) enhanced reasoning for interpretable detection
- Dynamic knowledge graph construction from network flow data
- GraphSAGE-based network analysis for pattern detection
- Advanced feature engineering with automated selection
- Intelligent data balancing for handling imbalanced attack classes
- Multi-class attack detection with high accuracy
- Comprehensive visualization tools for analysis
- Real-time processing capabilities
- Modular architecture for easy extension
- Support for multiple LLM providers (Ollama, Qianwen)
Architecture
Input Data → Feature Engineering → Knowledge Graph Construction → GraphSAGE Model → Attack Detection
     ↓                ↓                        ↓                        ↓                   ↓
Preprocessing   Feature Selection       Graph Embeddings        Chain of Thought     Interpretability
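Conceptually, the pipeline above is a chain of stage functions, each consuming the previous stage's output. The sketch below illustrates that data flow only; the stage names and logic are hypothetical placeholders, not the actual cotkg-ids API:

```python
# Illustrative pipeline sketch -- all names and logic are placeholders,
# not the real cotkg-ids implementation.
def preprocess(flows):
    # Placeholder: normalize raw flow records
    return [f.lower() for f in flows]

def build_graph(flows):
    # Placeholder: turn flows into (src, dst) edges of a knowledge graph
    return [("hostA", f) for f in flows]

def detect(graph):
    # Placeholder: flag edges whose label mentions "attack"
    return [edge for edge in graph if "attack" in edge[1]]

def pipeline(flows):
    # The composition mirrors the diagram: preprocess -> graph -> detect
    return detect(build_graph(preprocess(flows)))

alerts = pipeline(["Benign", "Attack-DoS"])  # -> [("hostA", "attack-dos")]
```

In the real system each stage is a full module (feature engineering, GraphSAGE, CoT reasoning), but the same one-directional data flow applies.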
Installation
Prerequisites
- Python 3.7+
- Neo4j Database 4.4+
- PyTorch 1.9+
- CUDA (optional, for GPU support)
- 8GB+ RAM recommended
- 50GB+ disk space for full dataset
- Ollama (for local LLM support)
- Qianwen API key (optional)
Environment Setup
- Clone the repository:
git clone https://github.com/chenxingqiang/cotkg-ids.git
cd cotkg-ids
- Create and activate virtual environment:
# Using venv
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
# Or using conda
conda create -n cotkg-ids python=3.9
conda activate cotkg-ids
- Install dependencies:
pip install -r requirements.txt
LLM Setup
You can use either Ollama (local) or Qianwen (cloud) as your LLM provider.
Ollama Setup
macOS Installation
- Install using Homebrew:
brew install ollama
- Start Ollama service:
# Start the service
brew services start ollama
# If you encounter port conflicts, try:
brew services stop ollama
pkill ollama
brew services start ollama
Linux Installation
- Install using curl:
curl https://ollama.ai/install.sh | sh
- Start Ollama service:
# Start as a service (systemd)
sudo systemctl start ollama
# Or run directly
ollama serve
# If you encounter port conflicts:
sudo systemctl stop ollama
pkill ollama
sudo systemctl start ollama
Windows Installation
- Download the installer from Ollama's website
- Run the installer
- Start Ollama:
# Using Command Prompt as Administrator
ollama.exe serve
# If you encounter port conflicts:
# First, find the process using port 11434
netstat -ano | findstr :11434
# Kill the process using its PID
taskkill /F /PID <PID>
# Then start Ollama again
ollama.exe serve
Verify Installation
Check if Ollama is running properly:
# Check server status (the root endpoint responds with "Ollama is running")
curl http://localhost:11434
# Or use the Python script
python scripts/setup_ollama.py --action list
Setup Models
After installing Ollama, you need to pull the required models:
# List available models
python scripts/setup_ollama.py --action list
# Pull all default models
python scripts/setup_ollama.py --action setup
# Pull specific models
python scripts/setup_ollama.py --action pull --models llama2 codellama
# Remove models
python scripts/setup_ollama.py --action delete --models model_name
Troubleshooting Ollama
If you encounter issues:
- Port conflicts (Error: address already in use):
# macOS/Linux
sudo lsof -i :11434
pkill ollama
brew services restart ollama # macOS
sudo systemctl restart ollama # Linux
# Windows
netstat -ano | findstr :11434
taskkill /F /PID <PID>
- Connection issues:
# Check if server is running
curl http://localhost:11434
# Check logs
# macOS
brew services info ollama
# Linux
journalctl -u ollama
# Windows
# Check Windows Event Viewer
- Memory issues:
- Ensure you have at least 8GB RAM available
- Close other memory-intensive applications
- For Windows, increase page file size
- For Linux, adjust swap space
- Model download issues:
# Try with debug logging
python scripts/setup_ollama.py --action pull --models llama2 --debug
# Check network connectivity
curl -v http://localhost:11434
Qianwen Setup
- Get your API key from Qianwen
- Set your API key as an environment variable:
export QIANWEN_API_KEY='your-api-key'
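In Python code, the key set above can be read back from the environment. The helper below is an illustrative sketch (the function name is ours, not part of the package) showing a fail-fast lookup of QIANWEN_API_KEY:

```python
import os

def get_qianwen_key(env=os.environ):
    # Illustrative helper: read the key exported via
    #   export QIANWEN_API_KEY='your-api-key'
    # and fail loudly if it is missing.
    key = env.get("QIANWEN_API_KEY")
    if not key:
        raise RuntimeError("QIANWEN_API_KEY is not set")
    return key

# Demo with an explicit mapping instead of the real environment:
demo_key = get_qianwen_key({"QIANWEN_API_KEY": "sk-test"})
```

Failing fast at startup is preferable to a confusing authentication error deep inside the pipeline.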
Configure LLM Provider
Update config/config.py to choose your LLM provider:
COT_CONFIG = {
'provider': 'ollama', # or 'qianwen'
'model': 'llama2', # for ollama
'ollama': {
'base_url': 'http://localhost:11434',
'timeout': 30,
'models': ['llama2', 'mistral', 'codellama', 'vicuna']
},
'qianwen': {
'model': 'qwen-max',
'max_tokens': 1500,
'temperature': 0.85
}
}
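At runtime, code only needs the sub-dictionary for whichever provider is selected. A minimal sketch of that lookup, assuming the COT_CONFIG layout shown above (the helper function itself is illustrative, not part of the package):

```python
COT_CONFIG = {
    'provider': 'ollama',  # or 'qianwen'
    'model': 'llama2',
    'ollama': {'base_url': 'http://localhost:11434', 'timeout': 30},
    'qianwen': {'model': 'qwen-max', 'max_tokens': 1500, 'temperature': 0.85},
}

def active_provider_settings(cfg):
    # Illustrative: the 'provider' key names the sub-dict holding
    # the settings for the active backend.
    return cfg[cfg['provider']]

settings = active_provider_settings(COT_CONFIG)  # -> the 'ollama' sub-dict
```

Switching backends then only requires changing the 'provider' value, not the calling code.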
Neo4j Setup
- Install Neo4j:
# Using Docker (recommended)
docker run \
--name neo4j \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/password \
-d neo4j:4.4
# Or download from neo4j.com
- Configure Neo4j:
- Open http://localhost:7474
- Login with default credentials (neo4j/neo4j)
- Change password when prompted
- Update config/config.py with your credentials
Data Preparation
- Download the dataset:
python download_data.py
- Verify data integrity:
python test_pipeline.py --mode data_check
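A common building block for such integrity checks is a streamed SHA-256 over each dataset file. The sketch below is a generic stdlib example of that pattern, not the actual test_pipeline.py implementation; it demos on a throwaway temp file rather than a real dataset path:

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    # Stream the file in 1 MiB chunks so large captures
    # never need to fit in memory at once.
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a temporary file (real dataset files live under data/raw/):
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'flow data')
digest = sha256_of(tmp.name)
os.unlink(tmp.name)
```

Comparing such digests against published checksums catches truncated or corrupted downloads before training starts.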
Usage
Quick Start
# Run complete pipeline with default settings
python run.py
# Run only training
python run.py --mode train
# Run only testing
python run.py --mode test
# Run with test configuration
python run.py --test
Python API Usage
from src.main import run_full_pipeline
from config.config import DEFAULT_CONFIG
# Use default configuration
results = run_full_pipeline()
# Or customize configuration
config = DEFAULT_CONFIG.copy()
config['model']['graphsage'].update({
'hidden_channels': 64,
'num_layers': 3,
'dropout': 0.2
})
results = run_full_pipeline(config=config)
Configuration
The system can be configured through config/config.py. Key configuration sections:
DEFAULT_CONFIG = {
'model': {
'graphsage': {
'hidden_channels': 32,
'num_layers': 2,
'dropout': 0.3,
'learning_rate': 0.01,
'weight_decay': 0.0005
}
},
'training': {
'epochs': 20,
'batch_size': 16,
'early_stopping': {
'patience': 5,
'min_delta': 0.01
},
'validation_split': 0.2
},
'data': {
'balancing': {
'method': 'smote',
'random_state': 42
}
},
'neo4j': {
'uri': 'bolt://localhost:7687',
'username': 'neo4j',
'password': 'password'
},
'cot': {
'provider': 'ollama',
'model': 'llama2',
'ollama': {
'base_url': 'http://localhost:11434',
'timeout': 30
}
}
}
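The Python API example earlier overrides one nested section with dict.update, which replaces only the keys it names. When overriding several nested sections at once, a recursive merge keeps untouched defaults intact. This helper is an illustrative sketch (deep_update is not a function the package provides):

```python
import copy

def deep_update(base, overrides):
    # Recursively merge `overrides` into a deep copy of `base`,
    # so nested keys not mentioned in `overrides` keep their defaults.
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {'model': {'graphsage': {'hidden_channels': 32, 'num_layers': 2}}}
config = deep_update(defaults, {'model': {'graphsage': {'hidden_channels': 64}}})
# config keeps num_layers=2 while overriding hidden_channels
```

The deep copy also means the shared DEFAULT_CONFIG is never mutated by one experiment's overrides.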
Project Structure
cotkg-ids/
├── config/              # Configuration files
├── data/                # Data storage
│   ├── raw/             # Raw dataset
│   └── processed/       # Processed data
├── logs/                # Log files
├── notebooks/           # Jupyter notebooks
├── results/             # Output files
│   ├── models/          # Saved models
│   └── visualizations/  # Generated plots
├── scripts/             # Utility scripts
│   └── setup_ollama.py  # Ollama setup script
├── src/                 # Source code
│   ├── data_processing/ # Data processing modules
│   ├── knowledge_graph/ # KG related code
│   ├── models/          # ML models
│   └── visualization/   # Visualization tools
└── tests/               # Test files
Performance Metrics
The system is evaluated on multiple metrics:
- Detection Accuracy: ~98% on test set
- False Positive Rate: <1%
- Processing Speed: ~1000 flows/second
- Memory Usage: ~4GB for standard dataset
Troubleshooting
Common Issues
- Neo4j Connection:
# Check Neo4j status
docker ps | grep neo4j
# or
service neo4j status
- CUDA Issues:
import torch
print(torch.cuda.is_available()) # Should return True if CUDA is properly set up
- Ollama Issues:
# Check Ollama server status
curl http://localhost:11434
# List available models
python scripts/setup_ollama.py --action list
# Restart Ollama server
pkill ollama
ollama serve
- Memory Issues:
- Reduce batch_size in config
- Use data sampling for large datasets
- Enable swap space if needed
Error Messages
- "Neo4j connection failed": Check Neo4j credentials and service status
- "CUDA out of memory": Reduce batch size or model size
- "File not found": Ensure dataset is downloaded and in correct location
- "Ollama server not responding": Check if Ollama is running and accessible
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Development Guidelines
- Follow PEP 8 style guide
- Add tests for new features
- Update documentation
- Use type hints
- Add appropriate error handling
License
MIT License - see LICENSE
Citation
@software{cotkg_ids2024,
author = {Chen, Xingqiang},
title = {CoTKG-IDS: Chain of Thought Knowledge Graph IDS},
year = {2024},
publisher = {GitHub},
url = {https://github.com/chenxingqiang/cotkg-ids}
}
Contact
Chen Xingqiang
Email: chen.xingqiang@iechor.com
GitHub: @chenxingqiang
Download files
Source Distribution
Built Distribution
File details
Details for the file cotkg_ids-0.2.1.tar.gz.
File metadata
- Download URL: cotkg_ids-0.2.1.tar.gz
- Upload date:
- Size: 38.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 68628d292572985e2ebb4c633c6f43e625a39dbb85a83bbb6d0683b8155413e3 |
| MD5 | 2856988ed7e7262e5e4efd1adf98c7c4 |
| BLAKE2b-256 | d3c15f1d074863c9c0b137129afdd470d5b5cf85b8b1243640c081d34299e216 |
File details
Details for the file cotkg_ids-0.2.1-py3-none-any.whl.
File metadata
- Download URL: cotkg_ids-0.2.1-py3-none-any.whl
- Upload date:
- Size: 40.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3736dfb6c8f8dae876cdf74a47ab08277073fdc996d2f6ad823ae6d9d8aaec22 |
| MD5 | cba3f9e98f996f59abd4ea650b4f4ca7 |
| BLAKE2b-256 | 6cd3fefb4d0bf2b3c7929cf48d3d7fc8d0f9b0321872a7ac1dcae1012b8187e9 |