CTINexus: A framework for data-efficient cyber threat intelligence extraction and knowledge graph construction using LLMs.
Automatic Cyber Threat Intelligence Knowledge Graph Construction Using Large Language Models
CTINexus is a framework that leverages optimized in-context learning (ICL) of large language models (LLMs) for automatic cyber threat intelligence (CTI) knowledge extraction and cybersecurity knowledge graph (CSKG) construction. CTINexus adapts to various cybersecurity ontologies with minimal annotated examples and provides a user-friendly web interface for instant threat intelligence analysis.
What CTINexus Does
The framework automatically processes unstructured threat intelligence reports to:
- Extract cybersecurity entities (malware, vulnerabilities, tactics, IOCs)
- Identify relationships between security concepts
- Construct knowledge graphs with interactive visualizations
- Run with minimal configuration: no extensive training data or parameter tuning needed
Core Components
- Intelligence Extraction (IE): Automatically extracts cybersecurity entities and relationships from unstructured text using optimized prompt construction and demonstration retrieval
- Hierarchical Entity Alignment: Canonicalizes extracted knowledge and removes redundancy through:
  - Entity Typing (ET): Groups mentions of the same semantic type
  - Entity Merging (EM): Merges mentions referring to the same entity with IOC (Indicator of Compromise) protection
- Link Prediction (LP): Predicts and adds missing relationships to complete the knowledge graph
- Graph Visualization: Interactive network visualization of the constructed cybersecurity knowledge graph
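To make the Entity Merging step more concrete, here is a minimal sketch of threshold-based merging with IOC protection. It is not CTINexus's implementation: the `merge_mentions` helper, the IOC regular expressions, and the use of `difflib.SequenceMatcher` as a stand-in for embedding similarity are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed IOC patterns (IPv4 addresses and MD5/SHA-1/SHA-256 hashes).
# Illustrative only; a real pipeline would cover many more IOC types.
IOC_PATTERNS = [
    re.compile(r"^\d{1,3}(\.\d{1,3}){3}$"),
    re.compile(r"^[0-9a-fA-F]{32}$|^[0-9a-fA-F]{40}$|^[0-9a-fA-F]{64}$"),
]


def is_ioc(mention: str) -> bool:
    """Return True if the mention looks like an indicator of compromise."""
    return any(pattern.match(mention) for pattern in IOC_PATTERNS)


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for embedding similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_mentions(mentions, threshold=0.6):
    """Greedily group similar mentions; IOCs always stay in their own group."""
    groups = []
    for mention in mentions:
        target = None
        if not is_ioc(mention):
            for group in groups:
                representative = group[0]
                if not is_ioc(representative) and similarity(mention, representative) >= threshold:
                    target = group
                    break
        if target is None:
            groups.append([mention])
        else:
            target.append(mention)
    return groups


groups = merge_mentions(["APT29", "APT-29", "Emotet", "192.168.1.10", "192.168.1.11"])
print(groups)
```

The two IP addresses differ by a single character, so a similarity threshold alone would merge them. Exempting IOCs from merging keeps each indicator distinct, which is the intent behind IOC protection.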
News
📦 [2025/09/03] CTINexus Python package released! Install with pip install ctinexus for seamless integration into your Python projects.
🌟 [2025/07/29] CTINexus now features an intuitive Gradio interface! Submit threat intelligence text and instantly visualize extracted interactive graphs.
🔥 [2025/04/21] We released the camera-ready paper on arXiv.
🔥 [2025/02/12] CTINexus was accepted at the 2025 IEEE European Symposium on Security and Privacy (Euro S&P).
Quick Start
You can use CTINexus in three ways:
- 📦 Python Package: Import CTINexus as a library for easy integration
- ⚡ Command Line: For automation and batch processing → 📖 CLI Guide
- 🖥️ Web Interface: User-friendly GUI for interactive analysis (follow the setup below)
Supported Models
CTINexus supports the following AI providers: OpenAI, Gemini, AWS, Ollama
All models from these providers are supported. If you would like to see additional providers integrated, please open a feature request issue here.
📦 Using as a Python Package
CTINexus can be used as a Python library for seamless integration into your projects.
Installation
pip install ctinexus
Configuration
Before using CTINexus, you need to configure API keys. Create a .env file in your project directory with your credentials. Look at the example env for reference.
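As a quick sanity check before calling the library, you can list which providers appear to be configured in your environment. The sketch below is an assumption-laden illustration: the `configured_providers` helper and the environment variable names are hypothetical, and the project's example env file is the authoritative source for the names CTINexus actually reads.

```python
import os

# Hypothetical mapping from provider name to the environment variable that
# signals it is configured. These names are assumptions; consult the
# project's example env file for the real ones.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "aws": "AWS_ACCESS_KEY_ID",
    "ollama": "OLLAMA_BASE_URL",
}


def configured_providers(env=None):
    """Return the providers whose (assumed) variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", found or "none found")
```

Call `load_dotenv()` first if your credentials live in a .env file, so that the variables are present in `os.environ` when the check runs.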
Usage
from ctinexus import process_cti_report
from dotenv import load_dotenv

# Load API keys from the .env file
load_dotenv()

# Example usage
text = "Your CTI text here"
result = process_cti_report(
    text=text,
    provider="openai",  # optional: auto-detected if not specified
    model="gpt-4",  # optional: uses default if not specified
    similarity_threshold=0.6,
    output="results.json",  # optional: save results to file
)

# Access results: entity_relation_graph is the path to the HTML file
# containing the graph visualization.
print("Graph:", result["entity_relation_graph"])
# Open the HTML file in your browser to see the results.
Parameters
- text (str): The threat intelligence report text to process
- provider (str, optional): AI provider ("openai", "gemini", "aws", "ollama"). Auto-detected from available keys if not specified
- model (str, optional): Specific model name (e.g., "gpt-4", "gemini-pro")
- embedding_model (str, optional): Model for embeddings
- ie_model, et_model, ea_model, lp_model (str, optional): Specific models for each pipeline component
- similarity_threshold (float, default 0.6): Threshold for entity similarity matching
- output (str, optional): File path to save JSON results
Return Value
Returns a dictionary containing the complete CTI analysis results:
- text: The original input text
- IE: Intelligence Extraction results with:
  - triplets: Raw extracted subject-relation-object triplets
- ET: Entity Typing results with:
  - typed_triplets: Triplets with entity type classifications (Malware, Vulnerability, Infrastructure, etc.)
- EA: Entity Alignment results with:
  - aligned_triplets: Triplets with merged entities and canonical entity IDs
- LP: Link Prediction results with:
  - predicted_links: Additional predicted relationships between entities
- entity_relation_graph: File path to the interactive HTML visualization
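The keys above can be walked with ordinary dictionary access. The sketch below summarizes how many items each stage produced. It assumes that each documented field holds a list; the `summarize_result` helper and the placeholder contents of `mock_result` are hypothetical, and the real per-triplet fields may differ.

```python
def summarize_result(result: dict) -> dict:
    """Count the items produced by each pipeline stage of a result dict."""
    stage_keys = {
        "IE": "triplets",
        "ET": "typed_triplets",
        "EA": "aligned_triplets",
        "LP": "predicted_links",
    }
    summary = {
        stage: len(result.get(stage, {}).get(key, []))
        for stage, key in stage_keys.items()
    }
    summary["graph_file"] = result.get("entity_relation_graph")
    return summary


# Mock result with the documented top-level keys. The triplet contents are
# placeholders, not the actual CTINexus triplet schema.
mock_result = {
    "text": "Your CTI text here",
    "IE": {"triplets": ["triplet-1", "triplet-2"]},
    "ET": {"typed_triplets": ["typed-1", "typed-2"]},
    "EA": {"aligned_triplets": ["aligned-1"]},
    "LP": {"predicted_links": []},
    "entity_relation_graph": "graph.html",
}

print(summarize_result(mock_result))
```

Using `.get` with defaults keeps the summary from raising if a stage is missing from the result, which is convenient when inspecting partial outputs.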
🐍 Local Development Setup
To run the web interface, use the command line interface, or contribute to the project, clone the repository and set up the development environment.
Prerequisites
- API Key from one of the supported providers: OpenAI, Gemini, AWS, or Ollama (local, free)
- Python 3.11+ and pip
Step 1: Clone the Repository
git clone https://github.com/peng-gao-lab/CTINexus.git
cd CTINexus
Step 2: Configure API Keys
Create a .env file in the project root:
cp .env.example .env
Edit the .env file with your API credentials.
Note: You only need to set up one provider. If using Ollama, see the Ollama Guide.
Step 3: Setup Python Environment
# Create virtual environment
python -m venv .venv
# Activate virtual environment (macOS/Linux)
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate
# Install package
pip install -e .
Step 4: Run the Application
ctinexus
Step 5: Access the Application
Open your browser and navigate to: http://127.0.0.1:7860
Use Ctrl+C in the terminal to stop the application.
🐳 Docker Setup
For containerized deployment or development:
Prerequisites
- Docker and Docker Compose installed
- API keys configured (see Local Development Setup above)
Step 1: Clone the Repository
git clone https://github.com/peng-gao-lab/CTINexus.git
cd CTINexus
Step 2: Configure API Keys
Create a .env file as described in the Local Development Setup section.
Step 3: Launch with Docker
# Build and start
docker compose up --build
# Or run in detached mode (runs in background)
docker compose up -d --build
Step 4: Access the Application
Open your browser and navigate to: http://localhost:8000
Step 5: Stop the Application
docker compose down
Web Interface and CLI Usage
After setting up the local environment, you can use CTINexus through the web interface or command line.
⚡ Command Line Interface (CLI)
For automation and batch processing:
ctinexus --input-file report.txt
📖 Complete CLI Documentation - Detailed usage examples and options.
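For batch processing, one option is to drive the CLI from a short script. The sketch below uses only the `--input-file` flag shown above; the `build_commands` and `run_batch` helpers, the `reports` directory name, and the `*.txt` pattern are assumptions for illustration, and the CLI guide may offer a more direct way to process multiple files.

```python
import subprocess
from pathlib import Path


def build_commands(report_dir, pattern="*.txt"):
    """Return one `ctinexus --input-file <path>` command per matching report."""
    return [
        ["ctinexus", "--input-file", str(path)]
        for path in sorted(Path(report_dir).glob(pattern))
    ]


def run_batch(report_dir):
    """Run the CLI once per report and print each exit code."""
    for cmd in build_commands(report_dir):
        # check=False so that one failing report does not stop the batch.
        completed = subprocess.run(cmd, check=False)
        print(cmd[-1], "exit code:", completed.returncode)


if __name__ == "__main__":
    run_batch("reports")  # hypothetical directory of .txt reports
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.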
🖥️ Web Interface (GUI)
Once the application is running:
- Open your browser to the appropriate URL:
  - Docker: http://localhost:8000
  - Local: http://127.0.0.1:7860
- Paste threat intelligence text into the input area
- Select your preferred AI model from the dropdown
- Click "Run" to analyze the text
- View results:
  - Extracted Entities: Identified cybersecurity entities
  - Relationships: Discovered connections between entities
  - Interactive Graph: Network visualization
  - Export Options: Download results as JSON or images
Contributing
We warmly welcome contributions from the community, including:
- 🐛 Fixing bugs or adding new features
- 📖 Improving documentation or adding examples
- 🎨 UI/UX enhancements for the web interface
Please check out our Contributing Guide for detailed information on how to get started, development setup, and submission guidelines.
Citation
@inproceedings{cheng2025ctinexusautomaticcyberthreat,
title={CTINexus: Automatic Cyber Threat Intelligence Knowledge Graph Construction Using Large Language Models},
author={Yutong Cheng and Osama Bajaber and Saimon Amanuel Tsegai and Dawn Song and Peng Gao},
booktitle={2025 IEEE European Symposium on Security and Privacy (EuroS\&P)},
year={2025},
organization={IEEE}
}
License
The source code is licensed under the MIT License. We warmly welcome industry collaboration. If you’re interested in building on CTINexus or exploring joint initiatives, please email yutongcheng@vt.edu or saimon.tsegai@vt.edu. We’d be happy to set up a brief call to discuss ideas.
File details
Details for the file ctinexus-0.1.0.tar.gz.
File metadata
- Download URL: ctinexus-0.1.0.tar.gz
- Upload date:
- Size: 1.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da2eb03a2b8689c09c460537e3d22cf28a1b776daf5ade5dc78befce93a0888c |
| MD5 | cae76fc073e372955651da2ca3a56a15 |
| BLAKE2b-256 | b2270d2b65ca30287b44f477a96ccf52b663671fbaa64818a5216a56075117a1 |
File details
Details for the file ctinexus-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ctinexus-0.1.0-py3-none-any.whl
- Upload date:
- Size: 1.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf3be8a17424f263fedf593fd853dd32f8ae234440ca8a2ec4c42e094ee03d24 |
| MD5 | 276f02978f61dcbe188569b73afcf5f9 |
| BLAKE2b-256 | efa71844fbd5bb667169ab9fcef2f6a32c8ce706874f761e9e9583410a5494ee |