Phishing Web Collector

⚔️ PhishingWebCollector: A Python Library for Phishing Website Collection ⚔️

Overview

PhishingWebCollector is a Python library that integrates 20 phishing feeds into one solution and offers a platform for collecting and managing malicious website data. It is suitable for practical cybersecurity applications, such as updating local blacklists, and for research, such as building phishing detection datasets. It uses the asyncio module for efficient concurrent data collection. Users can gather historical data from free feeds to construct extensive datasets without costly API subscriptions. Its ease of use, scalability, and support for various data formats help cybersecurity teams and researchers improve threat detection while minimizing technical overhead.

  • Free software: MIT license
  • Python versions: 3.9 | 3.10 | 3.11
  • Tested OS: Windows, Ubuntu, Fedora, and CentOS. It may also work on other systems.
  • All-in-One Solution: PhishingWebCollector collects a wide range of information about websites in one place.
  • Efficiency and Expertise: Building a similar solution independently would be very time-consuming and require specialized knowledge.
  • Open Source Advantage: Publishing this tool as open source simplifies many studies and lets researchers and industry professionals focus on more advanced tasks.
  • Continuous Improvement: New techniques will be added over time, ensuring continuous growth in this area.

Features

  • Integration of 22 Different Sources: Reduces the need to maintain multiple integrations.
  • Local Data Collection: Supports building and maintaining local phishing databases.
  • Data Export: Allows exporting all collected data in a unified JSON format.
  • Asynchronous Performance: Uses asyncio for faster, simultaneous data collection.
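
The asynchronous collection and unified JSON export listed above can be illustrated with a minimal, self-contained sketch. It does not use PhishingWebCollector's API: `fetch_feed`, the feed names, and the record fields (`url`, `source`) are hypothetical stand-ins, and network requests are simulated with `asyncio.sleep`.

```python
import asyncio
import json

# Hypothetical feed contents standing in for real network responses.
FAKE_FEEDS = {
    "feed_a": ["http://login.example.test/verify", "http://pay.example.test/"],
    "feed_b": ["http://bank.example.test/update"],
}


async def fetch_feed(name: str) -> list[dict]:
    """Simulate downloading one feed and normalizing it into unified records."""
    await asyncio.sleep(0.01)  # placeholder for network latency
    return [{"url": url, "source": name} for url in FAKE_FEEDS[name]]


async def collect_all(names: list[str]) -> list[dict]:
    """Fetch every feed concurrently and flatten the results into one list."""
    per_feed = await asyncio.gather(*(fetch_feed(name) for name in names))
    return [record for records in per_feed for record in records]


records = asyncio.run(collect_all(list(FAKE_FEEDS)))
unified_json = json.dumps(records, indent=2)
print(unified_json)
```

Because `asyncio.gather` returns results in the order the coroutines were passed, the merged list stays grouped by feed even though the downloads overlap in time.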

Why PhishingWebCollector?

While many tools and scripts can collect phishing data, none offer a complete all-in-one solution like PhishingWebCollector. It combines comprehensive functionality with high performance, asynchronous data collection, and easy configuration, making it both efficient and user-friendly.

How to use

The library can be installed using pip:

pip install phishing-web-collector

Code usage

Getting all phishing domains from all available sources

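import phishing_web_collector as pwc

# Create a manager for every available feed source; feed data is stored under "feeds_data"
manager = pwc.FeedManager(
    sources=list(pwc.FeedSource),
    storage_path="feeds_data"
)

# Refresh all feeds, then retrieve the collected entries
manager.sync_refresh_all()
entries = manager.sync_retrieve_all()

# Extract the domain from each entry's URL
phishing_domains = [pwc.get_domain_from_url(item.url) for item in entries]

for domain in phishing_domains:
    print(domain)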

As a result, you will get a list of phishing domains.
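
Several feeds may report the same site, so a collected list can contain duplicates. The following sketch shows one way to deduplicate domains and write them to CSV using only the standard library. It works on hard-coded sample URLs rather than live feed data, and `extract_domain` is a hypothetical stand-in built on `urllib.parse`, not the library's `get_domain_from_url`.

```python
import csv
import io
from urllib.parse import urlsplit

# Sample URLs for illustration only; real input would come from the collected entries.
sample_urls = [
    "http://login.example.test/verify?id=1",
    "https://LOGIN.example.test/other",
    "http://pay.example.test:8080/",
]


def extract_domain(url: str) -> str:
    """Return the lowercase host name of a URL, without port or path."""
    return (urlsplit(url).hostname or "").lower()


# A set removes duplicates; sorting keeps the output stable between runs.
unique_domains = sorted({extract_domain(u) for u in sample_urls} - {""})

# Written to an in-memory buffer here; pass an open file to csv.writer to save to disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["domain"])
writer.writerows([d] for d in unique_domains)
print(buffer.getvalue())
```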

All modules are exported into the main package, so you can import the package and invoke them directly.

Jupyter Notebook Usage

If you would like to try PhishingWebCollector without installing it on your machine, consider using the preconfigured Jupyter notebook. It shows how to collect phishing domains from all available sources and save them to a CSV file. You can run it in your browser without any installation using Google Colab.

To see how much faster asynchronous data collection is than synchronous collection, you can run the asynchronous benchmark notebook.
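
The idea behind that comparison can be sketched without any network access. The example below is not the benchmark notebook itself: `fake_fetch` is a hypothetical coroutine that simulates a feed download with `asyncio.sleep`, and the delay and feed count are arbitrary.

```python
import asyncio
import time

DELAY = 0.05  # simulated per-feed latency in seconds (arbitrary)
N_FEEDS = 5   # number of simulated feeds (arbitrary)


async def fake_fetch(i: int) -> int:
    """Pretend to download feed number i."""
    await asyncio.sleep(DELAY)
    return i


async def run_sequential() -> list[int]:
    # Each download starts only after the previous one has finished.
    return [await fake_fetch(i) for i in range(N_FEEDS)]


async def run_concurrent() -> list[int]:
    # All downloads are in flight at once, so total time is roughly one DELAY.
    return list(await asyncio.gather(*(fake_fetch(i) for i in range(N_FEEDS))))


def timed(coro_factory):
    """Run a coroutine function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_factory())
    return result, time.perf_counter() - start


seq_result, seq_time = timed(run_sequential)
con_result, con_time = timed(run_concurrent)
print(f"sequential: {seq_time:.3f}s, concurrent: {con_time:.3f}s")
```

With these settings the sequential run waits for about `N_FEEDS * DELAY` in total, while the concurrent run waits for about one `DELAY`. Real feeds will differ, since network latency and server limits vary by source.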

To see how to run feeds directly, you can run the direct feed invocation notebook.

Docker usage

If you want to use PhishingWebCollector in a Docker container, please check this README file.

Contributing

To contribute, refer to the CONTRIBUTING.md file. We are a welcoming community... just follow the Code of Conduct.

Maintainers

Project maintainers are:

  • Damian Frąszczak
  • Edyta Frąszczak
