Phishing Web Collector

⚔️ PhishingWebCollector: A Python Library for Phishing Website Collection ⚔️

Overview

PhishingWebCollector is a Python library that integrates 20 phishing feeds into one solution and offers a platform for collecting and managing malicious website data. It is suitable both for practical cybersecurity applications, such as updating local blacklists, and for research, such as building phishing detection datasets. It uses the asyncio module to collect data from multiple feeds concurrently. Users can gather historical data from free feeds to construct extensive datasets without costly API subscriptions. Its ease of use, scalability, and support for various data formats enhance the threat detection capabilities of cybersecurity teams and researchers while minimizing technical overhead.

  • Free software: MIT license
  • Python versions: 3.9 | 3.10 | 3.11
  • Tested OS: Windows, Ubuntu, Fedora and CentOS. It may also work on other systems, but they have not been tested.
  • All-in-One Solution: PhishingWebCollector is an all-in-one solution that allows for the collection of a wide range of information about websites.
  • Efficiency and Expertise: Building a similar solution independently would be very time-consuming and require specialized knowledge.
  • Open Source Advantage: Publishing this tool as open source will facilitate many studies, making them simpler and allowing researchers and industry professionals to focus on more advanced tasks.
  • Continuous Improvement: New techniques will be added successively, ensuring continuous growth in this area.

Features

  • Integration of 22 Different Sources: Reduces the need to maintain multiple integrations.
  • Local Data Collection: Supports building and maintaining local phishing databases.
  • Data Export: Allows exporting all collected data in a unified JSON format.
  • Asynchronous Performance: Uses asyncio for faster, simultaneous data collection.
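
To illustrate why asynchronous collection helps, here is a minimal sketch that uses only the standard library. It is not PhishingWebCollector's implementation: the feed names and delays are hypothetical, and `asyncio.sleep` stands in for the network wait of a real download. It compares awaiting the feeds one by one with starting them all at once via `asyncio.gather`.

```python
import asyncio
import time

# Hypothetical feed names and delays; a sleep stands in for the network wait.
FEED_DELAYS = {"feed_a": 0.1, "feed_b": 0.1, "feed_c": 0.1}


async def fetch_feed(name: str, delay: float) -> str:
    """Simulate downloading one feed without blocking the event loop."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def fetch_sequentially() -> list:
    """Await each feed in turn, so the individual waits add up."""
    return [await fetch_feed(name, delay) for name, delay in FEED_DELAYS.items()]


async def fetch_concurrently() -> list:
    """Start all fetches together, so the individual waits overlap."""
    coroutines = [fetch_feed(name, delay) for name, delay in FEED_DELAYS.items()]
    return list(await asyncio.gather(*coroutines))


def timed(coroutine_factory):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coroutine_factory())
    return result, time.perf_counter() - start


sequential_result, sequential_time = timed(fetch_sequentially)
concurrent_result, concurrent_time = timed(fetch_concurrently)

print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Both variants return the same results in the same order; only the elapsed time differs, because the concurrent version waits roughly as long as the slowest feed rather than the sum of all feeds.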

Why PhishingWebCollector?

While many tools and scripts can collect phishing data, none offer a complete all-in-one solution like PhishingWebCollector. It combines comprehensive functionality with high performance, asynchronous data collection, and easy configuration, making it both efficient and user-friendly.

How to use

The library can be installed using pip:

pip install phishing-web-collector

Code usage

Getting all phishing domains from all available sources

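import phishing_web_collector as pwc

# Configure the manager with every available feed source
# and a local directory for the downloaded feed data.
manager = pwc.FeedManager(
    sources=list(pwc.FeedSource),
    storage_path="feeds_data"
)

# Refresh the locally stored feeds, then load all collected entries.
manager.sync_refresh_all()
entries = manager.sync_retrieve_all()

# Reduce each entry's URL to its domain.
phishing_domains = [pwc.get_domain_from_url(item.url) for item in entries]

for domain in phishing_domains:
    print(domain)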

As a result, you will get the list of phishing domains.

All modules are exported into the main package, so you can import the package and invoke them directly.
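
A collected list of domains usually contains duplicates, so a common next step is to deduplicate it and store it, for example as a CSV file for a local blacklist. The following is a minimal sketch that uses only the standard library. The helpers `domain_of`, `unique_domains`, and `write_domains_csv` are hypothetical names introduced for illustration and are not part of PhishingWebCollector; `domain_of` is a simple stand-in and may behave differently from the library's own `get_domain_from_url`. The sample URLs are made up.

```python
import csv
import io
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the lowercase host of a URL, or '' if none can be found.

    Hypothetical stdlib-only helper, not the library's get_domain_from_url.
    """
    # urlsplit only recognises the host when a scheme or leading '//' is present.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return (urlsplit(url).hostname or "").lower()


def unique_domains(urls):
    """Extract domains, drop empty values and duplicates, and sort the rest."""
    return sorted({domain for domain in map(domain_of, urls) if domain})


def write_domains_csv(domains, handle):
    """Write a one-column CSV with a 'domain' header to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["domain"])
    writer.writerows([domain] for domain in domains)


# Made-up sample input standing in for the URLs of collected entries.
sample_urls = [
    "http://login.example.com/verify",
    "https://LOGIN.example.com/other",
    "example.org/path",
]

domains = unique_domains(sample_urls)

# An in-memory buffer keeps the sketch free of side effects.
buffer = io.StringIO()
write_domains_csv(domains, buffer)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with `open(path, "w", newline="", encoding="utf-8")` instead of the in-memory buffer.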

Jupyter Notebook Usage

If you would like to try PhishingWebCollector without installing it on your machine, consider using the preconfigured Jupyter notebook. It shows how to collect phishing domains from all available sources and save them to a CSV file. You can run it in your browser without any installation using Google Colab.

To see how much faster asynchronous data collection is than synchronous collection, run the asynchronous benchmark notebook.

To see how to run feeds directly, run the direct feed invocation notebook.

Docker usage

If you want to use PhishingWebCollector in a Docker container, please check this README file.

Contributing

To contribute, refer to the CONTRIBUTING.md file. We are a welcoming community; just follow the Code of Conduct.

Maintainers

Project maintainers are:

  • Damian Frąszczak
  • Edyta Frąszczak
