High-speed malicious URL detection using a Bloom Filter
dvara
High-speed malicious URL detection using a probabilistic Bloom Filter pipeline.
pip install dvara
dvara check https://google.com
✅ CLEAN | 0.03ms | online
dvara check "http://xn--90abegbttpjb3bzb2j.xn--p1ai/doc/En/ACCOUNT/Auditor-of-State-Notification-of-EFT-Deposit"
🚨 MALICIOUS | 213.2ms | online
What is dvara?
dvara is a Python CLI and backend system for malicious URL detection using a probabilistic Bloom Filter architecture inspired by systems like Google Safe Browsing.
It ingests live threat intelligence feeds from:
- URLhaus
- PhishTank
- OpenPhish
- Cert.pl
It currently indexes 268,970 confirmed malicious URLs in a compact Bloom Filter that occupies only 5.14 MB.
Most clean URLs are resolved entirely in-memory without touching the database.
Only Bloom filter hits trigger PostgreSQL confirmation.
Architecture
Threat feeds
↓
URL normalization + deduplication
↓
Bloom Filter generation
↓
PostgreSQL confirmed_urls database
↓
FastAPI backend deployment
↓
CLI / API URL checks
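The normalization and deduplication stage is not spelled out above. The following is a minimal sketch of what such a step could look like, assuming that normalization means lowercasing the scheme and host, dropping default ports and fragments, and using `/` for an empty path. The names `normalize_url` and `dedupe` are hypothetical and are not taken from dvara's code.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that carry no information and can be dropped (assumption).
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(raw: str) -> str:
    """Hypothetical normalization; dvara's real rules may differ.

    Simplification: userinfo and IPv6 literal brackets are not preserved.
    """
    parts = urlsplit(raw.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def dedupe(urls):
    """Return normalized URLs in first-seen order, without duplicates."""
    seen = set()
    unique = []
    for raw in urls:
        url = normalize_url(raw)
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```

Normalizing before hashing matters because the Bloom Filter and the SHA256 confirmation key both operate on the exact string: two spellings of the same URL would otherwise be indexed as different entries.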
URL check pipeline
dvara check [url]
↓
Bloom Filter lookup (~3µs local)
↓
No match
→ CLEAN instantly
Possible match
↓
SHA256(url)
↓
PostgreSQL confirmation lookup
↓
MALICIOUS or SUSPICIOUS
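The pipeline above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not dvara's implementation: `check_url`, the `bloom` container, and the `is_confirmed` callback are hypothetical stand-ins, the `"bloom"` stage label is invented for the example, and mapping an unconfirmed Bloom hit to SUSPICIOUS is a guess at how the two outcomes are distinguished.

```python
import hashlib
from typing import Callable, Container


def check_url(
    url: str,
    bloom: Container[str],
    is_confirmed: Callable[[str], bool],
) -> tuple[str, str]:
    """Return (verdict, stage) for a URL using a two-stage check."""
    # Stage 1: in-memory membership test. A miss is definitive.
    if url not in bloom:
        return "CLEAN", "bloom"

    # Stage 2: a hit may be a false positive, so confirm by SHA256 key.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if is_confirmed(key):
        return "MALICIOUS", "db"
    # Assumption: a Bloom hit the database cannot confirm is SUSPICIOUS.
    return "SUSPICIOUS", "db"


# Usage with plain sets standing in for the filter and the database.
bad = "http://bad.example/x"
fake_bloom = {bad, "http://false-positive.example/"}
fake_db = {hashlib.sha256(bad.encode("utf-8")).hexdigest()}

print(check_url("https://ok.example/", fake_bloom, fake_db.__contains__))
print(check_url(bad, fake_bloom, fake_db.__contains__))
```

Because the first stage never touches the network or the database, the cost of the second stage is paid only for the small fraction of URLs that hit the filter.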
Why Bloom Filters?
A conventional in-memory hash set holding millions of URLs can consume hundreds of megabytes of RAM.
Bloom Filters allow:
- massive memory compression
- constant-time lookups
- zero false negatives (a URL that was added to the filter is never reported as absent)
- extremely high throughput
Tradeoff:
- small false positive probability
False positives are resolved using PostgreSQL confirmation.
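To make these properties concrete, here is a minimal Bloom Filter sketch. It is not dvara's `bloom.py`: the class name `TinyBloom` is hypothetical, and it derives its bit positions from a SHA256 digest with double hashing so that it runs on the standard library alone, whereas dvara uses MurmurHash3 for Bloom lookups.

```python
import hashlib


class TinyBloom:
    """Illustrative Bloom Filter backed by a bytearray bit set."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, so nonzero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )


bf = TinyBloom(num_bits=10_000, num_hashes=7)
bf.add("http://bad.example/login")
print("http://bad.example/login" in bf)
```

Adding an item only ever sets bits, and a lookup requires all of that item's bits to be set, which is why an added item can never be missed. A false positive occurs only when other items happen to have set all k positions of a URL that was never added.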
Benchmarks
Generated using:
python -m dvara.benchmarks
| Metric | Result |
|---|---|
| Local Bloom lookup latency | ~0.003ms (3µs) |
| Throughput | ~145k URLs/sec |
| Indexed malicious URLs | 268,970 |
| Filter size | 5.14 MB |
| Peak RAM usage | ~10.53 MB |
| False negatives | 0 observed |
| False positives | 0 / 100,000 tested |
| Bloom capacity | 3,000,000 URLs |
Benchmark latency refers to local in-memory Bloom Filter checks. Network/API requests are naturally slower due to HTTP and database confirmation stages.
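The internals of `dvara.benchmarks` are not shown here. As a rough illustration of how local lookup latency and throughput figures like these can be measured, the sketch below times a membership function over a list of URLs. The function name `measure_lookups` and the returned keys are hypothetical.

```python
import time
from typing import Callable, Iterable


def measure_lookups(contains: Callable[[str], bool], urls: Iterable[str]) -> dict:
    """Time `contains` over `urls` and report mean latency and throughput."""
    url_list = list(urls)
    if not url_list:
        raise ValueError("need at least one URL to benchmark")

    start = time.perf_counter()
    hits = sum(1 for url in url_list if contains(url))
    elapsed = time.perf_counter() - start

    return {
        "lookups": len(url_list),
        "hits": hits,
        "mean_latency_us": elapsed / len(url_list) * 1e6,
        "urls_per_sec": len(url_list) / elapsed if elapsed > 0 else float("inf"),
    }


# Usage with a plain set standing in for the filter.
sample = {"http://bad.example/"}
stats = measure_lookups(sample.__contains__, ["http://bad.example/", "https://ok.example/"])
print(stats["lookups"], stats["hits"])
```

A loop like this measures only the in-memory check, which matches the scope stated for the table above. It excludes HTTP round trips and database confirmation.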
Threat Intelligence Sources
| Feed | Type |
|---|---|
| URLhaus | Malware URLs |
| PhishTank | Verified phishing URLs |
| OpenPhish | Active phishing feeds |
| Cert.pl | Malicious domains |
Installation
CLI only
pip install dvara
Backend/server dependencies
pip install "dvara[server]"
CLI Usage
Check URL (online)
dvara check https://example.com
Check URL (offline)
dvara check https://example.com --offline
Show stats
dvara stats
Update local filter
dvara update
Run ingestion
dvara ingest
Running the Backend
Docker Compose
git clone https://github.com/dhruv-0512/dvara
cd dvara
docker compose up --build
Services:
- FastAPI
- PostgreSQL
- Redis
Manual setup
pip install "dvara[server]"
python -m dvara.ingestion
uvicorn dvara.app:app --reload
API Endpoints
| Endpoint | Description |
|---|---|
| /api/check | Full two-stage URL check |
| /api/confirm | Direct PostgreSQL lookup |
| /api/stats | Bloom + backend stats |
| /api/reload | Reload filter |
| /health | Health check |
Example API Response
{
  "url": "http://malicious-site.com",
  "result": "MALICIOUS",
  "latency_ms": 213.2,
  "stage": "db",
  "checked_at": "2026-05-09T09:08:32.663182+00:00"
}
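A client can turn this response into a typed object. The sketch below parses the example body shown above using only the standard library. The `CheckResult` dataclass and `parse_check_response` function are hypothetical helpers, not part of dvara, and the sketch deliberately leaves out the HTTP request itself because the method and parameters of `/api/check` are not documented here.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CheckResult:
    url: str
    result: str
    latency_ms: float
    stage: str
    checked_at: datetime


def parse_check_response(body: str) -> CheckResult:
    """Parse a JSON body shaped like the example response."""
    data = json.loads(body)
    return CheckResult(
        url=data["url"],
        result=data["result"],
        latency_ms=float(data["latency_ms"]),
        stage=data["stage"],
        # The timestamp is ISO 8601 with an explicit UTC offset.
        checked_at=datetime.fromisoformat(data["checked_at"]),
    )


example_body = """
{
  "url": "http://malicious-site.com",
  "result": "MALICIOUS",
  "latency_ms": 213.2,
  "stage": "db",
  "checked_at": "2026-05-09T09:08:32.663182+00:00"
}
"""
parsed = parse_check_response(example_body)
print(parsed.result, parsed.stage)
```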
Project Structure
dvara/
├── app.py
├── bloom.py
├── cli.py
├── config.py
├── ingestion.py
├── benchmarks.py
Technical Details
Bloom Filter Parameters
Capacity: 3,000,000 URLs
Target FPR: 0.1%
Hash functions (k): 10
Current fill ratio: ~6% of bits set
Filter size: 5.14 MB
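These parameters are consistent with the standard Bloom Filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below recomputes them from the stated capacity and target FPR. The function names are hypothetical, and reading the fill ratio as the expected fraction of set bits, 1 - exp(-k n / m), is an inference from the numbers rather than something the project states.

```python
import math


def bloom_parameters(capacity: int, target_fpr: float) -> tuple[int, int]:
    """Return (bits, hash_count) from the standard sizing formulas."""
    bits = math.ceil(-capacity * math.log(target_fpr) / (math.log(2) ** 2))
    hash_count = round(bits / capacity * math.log(2))
    return bits, hash_count


def bit_fill_ratio(inserted: int, bits: int, hash_count: int) -> float:
    """Expected fraction of bits set after `inserted` items."""
    return 1.0 - math.exp(-hash_count * inserted / bits)


m, k = bloom_parameters(capacity=3_000_000, target_fpr=0.001)
size_mib = m / 8 / 2**20
fill = bit_fill_ratio(inserted=268_970, bits=m, hash_count=k)

print(f"k = {k}, size = {size_mib:.2f} MiB, fill = {fill:.1%}")
```

With these inputs the formulas give k = 10, a size of about 5.14 when measured in units of 2^20 bytes, and roughly 6% of bits set at 268,970 indexed URLs, which matches the figures listed above.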
Hashing
- MurmurHash3 for Bloom lookups
- SHA256 for PostgreSQL confirmation keys
Deployment Stack
| Component | Service |
|---|---|
| API | Render |
| Database | Supabase PostgreSQL |
| Redis | Upstash Redis |
| Package hosting | PyPI |
Why "dvara"?
dvara (द्वार) is the Sanskrit word for:
gateway / doorway
Every URL is a gateway.
dvara stands at that gateway and decides what gets through.
Security Note
dvara is intended for defensive cybersecurity research, malicious URL analysis, and educational purposes.
While the system uses real threat intelligence feeds and probabilistic detection techniques, it should not be treated as a replacement for enterprise secure web gateways, antivirus engines, or production threat prevention systems.
License
MIT