Text analyzer package
🕵️ reStalker
IOC and Entities Extractor
A powerful Python library for extracting Indicators of Compromise (IOCs) and various entities from binary or text sources.
📋 Table of Contents
- 🚀 Quick Start
- 🎯 Detection Capabilities
- 📦 Installation
- 🧠 GLiNER2 Named Entity Recognition
- 💻 Usage Examples
- 🔄 Migration Guide
- 📖 Documentation
- 🤝 Contributing
- 🙏 Acknowledgements
🚀 Quick Start
import restalker
# Regex-only (fast, no ML model - recommended for IOC extraction)
s = restalker.reStalker(tor=True, i2p=True)
elements = s.parse(input_text)
for element in elements:
    print(f"[*] Darknet IOC found: {element}")
# With GLiNER2 AI (for person names, orgs, locations)
s = restalker.reStalker(use_ner=True, own_name=True, organization=True)
elements = s.parse(input_text)
🎯 Detection Capabilities
reStalker can extract these entities from any binary or text source:
🔐 Credentials & Identity
- Base64 encoded data (base64=True)
- Username patterns (username=True)
- Password patterns (password=True)
- Phone numbers (phone=True)
- Email addresses (email=True)
- Personal names (own_name=True)
- PGP keys (pgp=True)
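To illustrate what a Base64 check can involve, here is a minimal stdlib-only sketch. It is not reStalker's actual detector: the helper name `looks_like_base64` and the minimum length of 16 characters are assumptions made for this example.

```python
import base64
import binascii
import re

# Base64 alphabet with up to two '=' padding characters at the end.
_B64_TOKEN = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64(token: str, min_len: int = 16) -> bool:
    """Return True if token is plausibly Base64 (hypothetical helper)."""
    # Padded Base64 always has a length that is a multiple of 4.
    if len(token) < min_len or len(token) % 4 != 0:
        return False
    if not _B64_TOKEN.fullmatch(token):
        return False
    try:
        base64.b64decode(token, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


print(looks_like_base64("aGVsbG8gd29ybGQ="))    # True ("hello world")
print(looks_like_base64("not base64 at all!"))  # False
```

Short tokens are rejected on purpose: many ordinary words happen to be valid Base64, so a length floor keeps noise down.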
🗺️ Location & Organization
- Location information (location=True)
- Organization names (organization=True)
- Keyphrases (keyphrase=True)
- Keywords (keywords=["keyword1", "keyword2"])
📊 Analytics & Tracking
- Google Analytics tracking codes (gatc=True)
💰 Digital Assets & Cryptocurrencies
- BTC (Bitcoin) wallet addresses (btc_wallet=True)
- ETH (Ethereum) wallet addresses (eth_wallet=True)
- XMR (Monero) wallet addresses (xmr_wallet=True)
- ZEC (Zcash) wallet addresses (zec_wallet=True)
- DASH wallet addresses (dash_wallet=True)
- DOT (Polkadot) wallet addresses (dot_wallet=True)
- XRP (Ripple) wallet addresses (xrp_wallet=True)
- BNB (Binance) wallet addresses (bnb_wallet=True)
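Wallet patterns are easy to over-match, so a checksum test is a common follow-up step. The sketch below verifies the Base58Check checksum of a legacy Bitcoin address using only the standard library. It is an illustration, not reStalker's implementation: the function name `is_valid_base58check` is an assumption, and Bech32 (`bc1...`) addresses are out of scope.

```python
import hashlib

_B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def is_valid_base58check(address: str) -> bool:
    """Check the 4-byte checksum of a legacy Bitcoin address (hypothetical helper)."""
    num = 0
    for ch in address:
        idx = _B58_ALPHABET.find(ch)
        if idx < 0:
            return False  # character outside the Base58 alphabet
        num = num * 58 + idx
    try:
        # 1 version byte + 20 hash bytes + 4 checksum bytes
        raw = num.to_bytes(25, "big")
    except OverflowError:
        return False
    payload, checksum = raw[:21], raw[21:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum


# The address from Bitcoin's genesis block, and the same string with its
# last character altered.
print(is_valid_base58check("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
print(is_valid_base58check("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNb"))
```

For brevity the sketch does not verify that the number of leading `1` characters matches the number of leading zero bytes, which a strict decoder would also check.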
📱 Social Networks & Communication
- Twitter/X account handles (twitter=True)
- Telegram URLs (telegram=True)
- WhatsApp URLs (whatsapp=True)
- Discord URLs (discord=True)
- Skype URLs (skype=True)
- Tox ID identifiers (tox=True)
- Session ID identifiers (session_id=True)
🔐 Cryptographic Hashes
- MD5 hash values (md5=True)
- SHA1 hash values (sha1=True)
- SHA256 hash values (sha256=True)
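MD5, SHA1, and SHA256 digests differ in length (32, 40, and 64 hex characters), so a standalone hex token can be classified by length alone. The following sketch shows that idea. The function name `classify_hashes` is an assumption for this example, and reStalker's own patterns may differ.

```python
import re

# Standalone runs of hex characters, bounded by non-word characters.
_HEX_TOKEN = re.compile(r"\b[0-9a-fA-F]+\b")
_HASH_BY_LENGTH = {32: "md5", 40: "sha1", 64: "sha256"}


def classify_hashes(text: str) -> list[tuple[str, str]]:
    """Return (kind, digest) pairs for hex tokens of a known digest length."""
    found = []
    for match in _HEX_TOKEN.finditer(text):
        kind = _HASH_BY_LENGTH.get(len(match.group()))
        if kind:
            found.append((kind, match.group().lower()))
    return found


# Digests of the empty string, used here only as sample input.
sample = (
    "md5: d41d8cd98f00b204e9800998ecf8427e "
    "sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709"
)
for kind, digest in classify_hashes(sample):
    print(kind, digest)
```

Length alone cannot distinguish a hash from any other hex string of the same size, so results are candidates rather than confirmed digests.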
💳 Financial Information
- BIN (Bank Identification Numbers) (bin_number=True)
- Credit Card numbers (credit_card=True)
- CCN (Credit Card Numbers - generic) (ccn_number=True)
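Card-number candidates are usually checked with the Luhn checksum to discard random digit runs. The source does not say how reStalker validates them, so this is only a sketch of the standard check. The helper name `luhn_valid` and the 12 to 19 digit range are assumptions.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum (hypothetical helper)."""
    compact = number.replace(" ", "").replace("-", "")
    if not (compact.isascii() and compact.isdigit()):
        return False
    if not 12 <= len(compact) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(compact)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True (a widely used test number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```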
🌐 Dark Web & Alternative Networks
- Tor (.onion) URLs (tor=True)
- I2P URLs (i2p=True)
- Freenet URLs (freenet=True)
- ZeroNet URLs (zeronet=True)
- BitName URLs (bitname=True)
- IPFS URLs (ipfs=True)
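As a rough picture of what Tor detection involves: a version 3 onion hostname is 56 base32 characters (a-z, 2-7) followed by `.onion`. The sketch below matches only that form. The pattern and the name `find_onion_v3` are assumptions for illustration and are not taken from reStalker's source.

```python
import re

# 56 base32 characters followed by ".onion"; \b keeps longer labels from matching.
_ONION_V3 = re.compile(r"\b[a-z2-7]{56}\.onion\b")


def find_onion_v3(text: str) -> list[str]:
    """Return v3 onion hostnames found in text, lowercased."""
    return _ONION_V3.findall(text.lower())


# Synthetic hostname of the right shape (not a real service).
fake_host = "a" * 55 + "d"
text = f"Mirror: http://{fake_host}.onion/index.html and http://short.onion/"
print(find_onion_v3(text))
```

The shorter, deprecated v2 format (16 characters) is deliberately not matched here.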
📋 Paste Sites & Code Sharing
- justpaste.it links (paste=True)
- pastebin.com links (paste=True)
- pasted.co links (paste=True)
- hastebin.com links (paste=True)
- snipt.org links (paste=True)
- gist.github.com links (paste=True)
- telegra.ph links (paste=True)
- ghostbin.com links (paste=True)
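Since all of these sites share the single paste=True flag, detection amounts to checking a URL's host against a known set. Here is a minimal sketch of that approach with the standard library. The names `PASTE_HOSTS` and `find_paste_links` are assumptions, and the real library may match these links differently.

```python
import re
from urllib.parse import urlsplit

# Hosts taken from the list above.
PASTE_HOSTS = {
    "justpaste.it", "pastebin.com", "pasted.co", "hastebin.com",
    "snipt.org", "gist.github.com", "telegra.ph", "ghostbin.com",
}
_URL = re.compile(r"https?://[^\s<>\"']+")


def find_paste_links(text: str) -> list[str]:
    """Return URLs in text whose host is a known paste site."""
    links = []
    for match in _URL.finditer(text):
        url = match.group().rstrip(".,;)")  # trim trailing punctuation
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            continue  # malformed URL, skip it
        if host.startswith("www."):
            host = host[4:]
        if host in PASTE_HOSTS:
            links.append(url)
    return links


sample_text = (
    "Dump at https://pastebin.com/abc123, docs at https://example.com/x, "
    "mirror at http://www.telegra.ph/post."
)
print(find_paste_links(sample_text))
# ['https://pastebin.com/abc123', 'http://www.telegra.ph/post']
```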
📦 Installation
🎯 Quick Start
CPU-only (Default, Recommended for Most Users):
pip install restalker
Or with Poetry:
poetry add restalker
🚀 GPU Acceleration (Optional)
reStalker supports GPU acceleration for significantly faster entity extraction using GLiNER2. Choose the appropriate installation method based on your hardware:
🔍 Automatic Detection (Recommended)
# Clone or navigate to the repository
git clone https://github.com/junquera/restalker.git
cd restalker
# Detect your GPU hardware
python scripts/detect_gpu.py
# Follow the recommended installation command shown
🎮 Manual GPU Installation
NVIDIA GPU (CUDA 11.8+):
# Using Poetry
poetry install --extras gpu
# Using pip with setup.py
pip install -e ".[gpu]"
# Using requirements file
pip install -r requirements-gpu-cuda.txt
AMD GPU (ROCm 5.x+, Linux only):
# First, install ROCm: https://rocm.docs.amd.com/
# Using Poetry
poetry install --extras amd-gpu
# Using pip with setup.py
pip install -e ".[amd-gpu]"
# Using requirements file
pip install -r requirements-gpu-rocm.txt
CPU-only (Explicit):
# Using Poetry
poetry install
# Using pip with requirements file
pip install -r requirements.txt
💾 Disk Space & Performance
| Installation | Disk Space | Performance vs CPU | Best For |
|---|---|---|---|
| CPU-only | ~500 MB | Baseline (1x) | Most users, portable systems |
| NVIDIA GPU | ~3.2 GB | 5-10x faster | Systems with NVIDIA GPUs |
| AMD GPU | ~3.5 GB | 3-7x faster | Linux systems with AMD GPUs |
✅ Verify GPU Installation
After installing with GPU support, verify it's working:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'}")
🧠 GLiNER2 Named Entity Recognition
reStalker uses GLiNER2, the second generation of the GLiNER generalist named entity recognition model, for advanced entity extraction. This AI-powered system provides context-aware detection of personal information, organizations, locations, and more.
What is GLiNER2?
GLiNER2 is a state-of-the-art zero-shot Named Entity Recognition model that can identify entities without task-specific training. It understands context and relationships between words, making it highly accurate for extracting:
- Personal names (people mentioned in text)
- Organizations (companies, agencies, groups)
- Locations (cities, countries, addresses)
- Phone numbers (with context validation)
- Email addresses
- Keyphrases (important multi-word expressions)
Model Used
reStalker v2.2.0+ uses the fastino/gliner2-large-v1 model (~340MB):
- 340M parameters for high accuracy
- Optimized for cybersecurity and OSINT use cases
- No TensorFlow dependency required
- Runs efficiently on CPU or GPU
- Only loaded when use_ner=True is set
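Loading the model only when use_ner=True is a form of lazy initialization. The sketch below shows the general pattern with a stand-in loader. `LazyModelHolder` and `fake_loader` are hypothetical names, and nothing here reflects reStalker's internal code.

```python
from typing import Callable, Optional


class LazyModelHolder:
    """Hypothetical holder that defers an expensive load until first use."""

    def __init__(self, use_ner: bool, loader: Callable[[], object]):
        self.use_ner = use_ner
        self._loader = loader
        self._model: Optional[object] = None

    def get_model(self) -> Optional[object]:
        if not self.use_ner:
            return None  # regex-only mode: the loader is never called
        if self._model is None:
            self._model = self._loader()  # first use: load once, then cache
        return self._model


load_calls = []


def fake_loader() -> object:
    # Stand-in for an expensive model load; records each call.
    load_calls.append(1)
    return object()


regex_only = LazyModelHolder(use_ner=False, loader=fake_loader)
regex_only.get_model()

with_ner = LazyModelHolder(use_ner=True, loader=fake_loader)
with_ner.get_model()
with_ner.get_model()

print(len(load_calls))  # 1
```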
Enhanced Phone Detection
GLiNER2 includes advanced phone number detection with hex filtering to prevent false positives:
- As of v2.2.2, phone parsing uses stricter EXACT_GROUPING validation in phonenumbers to reduce loosely formatted false positives.
import restalker
# Phone numbers in cryptographic hashes are NOT detected
stalker = restalker.reStalker(phone=True)
text = "Hash: a1b2c3d4567890abcdef" # Contains "567890" but not a phone
results = stalker.parse(text)
# No phone detected ✓
# Real phone numbers ARE detected
text = "Contact: +1-555-123-4567"
results = stalker.parse(text)
# Phone detected: +1-555-123-4567 ✓
This enhancement prevents crypto wallet addresses, hashes (MD5, SHA1, SHA256), and hex strings from being incorrectly identified as phone numbers.
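One way to picture hex filtering: before accepting a digit run as a phone candidate, widen the span over neighbouring hex characters and discard it if the surrounding token looks like a long hex blob. This is a simplified sketch of that idea, not the library's code. The function name `digit_runs_outside_hex` and the 16-character threshold are assumptions.

```python
import re

_DIGIT_RUN = re.compile(r"\d{6,}")
_HEX_CHARS = set("0123456789abcdefABCDEF")


def digit_runs_outside_hex(text: str, min_hex_len: int = 16) -> list[str]:
    """Return digit runs that are not embedded in a longer hex token."""
    kept = []
    for match in _DIGIT_RUN.finditer(text):
        start, end = match.span()
        # Grow the span while neighbouring characters are still hex digits.
        while start > 0 and text[start - 1] in _HEX_CHARS:
            start -= 1
        while end < len(text) and text[end] in _HEX_CHARS:
            end += 1
        token_len = end - start
        in_hex_blob = token_len >= min_hex_len and token_len > len(match.group())
        if not in_hex_blob:
            kept.append(match.group())
    return kept


print(digit_runs_outside_hex("Hash: a1b2c3d4567890abcdef"))  # []
print(digit_runs_outside_hex("Call 5551234567 today"))       # ['5551234567']
```

A real pipeline would still pass the surviving candidates to a proper phone-number validator. This step only removes the obvious hash and wallet noise.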
Context-Aware Extraction
GLiNER2 validates entity context to ensure accurate extraction:
# Prevents substring matches
text = "myemail@example.com" # "example" is part of email, not a person
stalker = restalker.reStalker(own_name=True, email=True)
results = stalker.parse(text)
# Extracts email, but "example" is not extracted as a name ✓
# Handles multi-line entities
text = """
Name: John
Doe
"""
results = stalker.parse(text)
# Correctly splits "John" and "Doe" as separate entities ✓
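The substring rule in the first example can be thought of as span containment: a name candidate that lies entirely inside an email match is dropped. Below is a minimal sketch under that assumption. The function `drop_candidates_inside_emails` and the `(start, end, label)` span format are hypothetical, chosen only for this illustration.

```python
import re

_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def drop_candidates_inside_emails(text, candidates):
    """Keep (start, end, label) spans that are not contained in an email match."""
    email_spans = [m.span() for m in _EMAIL.finditer(text)]
    kept = []
    for start, end, label in candidates:
        inside = any(s <= start and end <= e for s, e in email_spans)
        if not inside:
            kept.append((start, end, label))
    return kept


text = "Contact John at myemail@example.com"
john = text.index("John")
example = text.index("example")
candidates = [
    (john, john + len("John"), "person"),
    (example, example + len("example"), "person"),  # false positive inside the email
]
for start, end, label in drop_candidates_inside_emails(text, candidates):
    print(label, text[start:end])  # person John
```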
Migration from GLiNER v0.2.x
If you're upgrading from reStalker v2.1.x (which used GLiNER v0.2.25), the changes are seamless:
- No API changes - All your existing code works as-is
- Better accuracy - Improved entity detection with fewer false positives
- Faster performance - GLiNER2 is more optimized
- No TensorFlow - Reduced dependencies and installation size
💻 Usage Examples
Basic Usage (Regex-Only, Fastest)
import restalker
# Regex-only: fast IOC extraction, no ML model loaded
# use_ner defaults to False, so this is equivalent to use_ner=False
stalker = restalker.reStalker(tor=True, i2p=True, btc_wallet=True)
# Parse input text for IOCs
elements = stalker.parse(input_text)
# Process the results
for element in elements:
    print(f"[*] IOC found: {element}")
Advanced Configuration (With GLiNER2 AI)
import restalker
# Enable GLiNER2 for AI-powered entity extraction
# use_ner=True is required for: own_name, organization, location, username, password
stalker = restalker.reStalker(
    use_ner=True,       # Enable GLiNER2 NER model
    own_name=True,      # Person names (requires use_ner=True)
    organization=True,  # Organizations (requires use_ner=True)
    location=True,      # Locations (requires use_ner=True)
    tor=True,           # Tor .onion URLs
    i2p=True,           # I2P URLs
    btc_wallet=True,    # Bitcoin addresses
    eth_wallet=True,    # Ethereum addresses
    email=True,         # Email addresses
    telegram=True,      # Telegram URLs
    base64=True         # Base64 encoded data
)
# Process your data
with open('data.txt', 'r') as f:
    content = f.read()
results = stalker.parse(content)
# Categorize results
for result in results:
    print(f"Type: {result.type}, Value: {result.value}")
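If results expose `type` and `value` attributes as in the loop above, grouping them by type takes only a few lines. The sketch below uses a stand-in `Item` tuple instead of real reStalker result objects so it can run on its own. `Item` and `group_by_type` are hypothetical names.

```python
from collections import defaultdict, namedtuple

# Stand-in for parsed results; only the .type and .value attributes are assumed.
Item = namedtuple("Item", ["type", "value"])


def group_by_type(results):
    """Group result values by type, deduplicated and sorted."""
    grouped = defaultdict(set)
    for result in results:
        grouped[result.type].add(result.value)
    return {kind: sorted(values) for kind, values in grouped.items()}


demo_results = [
    Item("email", "a@example.com"),
    Item("tor", "http://example.onion"),
    Item("email", "a@example.com"),  # duplicate, collapsed by the set
    Item("email", "b@example.com"),
]
for kind, values in group_by_type(demo_results).items():
    print(kind, values)
```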
Choosing use_ner=True vs use_ner=False
| Feature | use_ner=False (default) | use_ner=True |
|---|---|---|
| Speed | Fast (no model loading) | Slower (~340MB model loads on first use) |
| Person Names | Not available | Available (own_name=True) |
| Organizations | Not available | Available (organization=True) |
| Locations | Not available | Available (location=True) |
| Usernames / Passwords | Not available | Available |
| Phone Detection | Regex-based | Enhanced with GLiNER2 context validation |
| Memory Usage | ~50MB | ~400MB |
| Best For | IOC extraction, crypto wallets, URLs | OSINT, entity extraction, person tracking |
🔄 Migration Guide
Upgrading from v2.1.x to v2.2.x
use_ner Parameter (v2.2.1+)
v2.2.1 adds the optional use_ner parameter (default: False). All existing code continues to work unchanged - you only need to add use_ner=True if you want GLiNER2 AI features.
from restalker import reStalker
# v2.1.x code - still works in v2.2.x
stalker = reStalker(phone=True, email=True, btc_wallet=True)
# Explicitly opting into GLiNER2 AI features
stalker = reStalker(use_ner=True, own_name=True, organization=True, location=True)
GLiNER2 Model Download (v2.2.0)
On first use of use_ner=True, reStalker downloads the fastino/gliner2-large-v1 model (~340MB) from HuggingFace. This is a one-time download cached locally.
📖 Documentation
For comprehensive documentation, examples, and API reference, visit our documentation site.
🤝 Contributing
We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
🙏 Acknowledgements
Byron Labs is an active supporter of the reStalker development.