Phishing Web Collector

⚔️ PhishingWebCollector: A Python Library for Phishing Website Collection ⚔️

Overview

PhishingWebCollector is a Python library that integrates 20 phishing feeds into one solution and offers a platform for collecting and managing malicious website data. It is suitable for practical cybersecurity applications, such as updating local blacklists, and for research, such as building phishing detection datasets. It uses the asyncio module to collect data from multiple feeds concurrently. Users can gather historical data from free feeds to construct extensive datasets without costly API subscriptions. Its ease of use, scalability, and support for various data formats help cybersecurity teams and researchers improve threat detection while minimizing technical overhead.

  • Free software: MIT license
  • Python versions: 3.9 | 3.10 | 3.11
  • Tested OS: Windows, Ubuntu, Fedora, and CentOS. It may also work on other systems, but they have not been tested.
  • All-in-One Solution: PhishingWebCollector is an all-in-one solution that allows for the collection of a wide range of information about websites.
  • Efficiency and Expertise: Building a similar solution independently would be very time-consuming and require specialized knowledge.
  • Open Source Advantage: Publishing this tool as open source will facilitate many studies, making them simpler and allowing researchers and industry professionals to focus on more advanced tasks.
  • Continuous Improvement: New techniques will be added over time, ensuring continuous growth in this area.

Features

  • Integration of 22 Different Sources: Reduces the need to maintain multiple integrations.
  • Local Data Collection: Supports building and maintaining local phishing databases.
  • Data Export: Allows exporting all collected data in a unified JSON format.
  • Asynchronous Performance: Uses asyncio for faster, simultaneous data collection.

Why PhishingWebCollector?

While many tools and scripts can collect phishing data, none offer a complete all-in-one solution like PhishingWebCollector. It combines comprehensive functionality with high performance, asynchronous data collection, and easy configuration, making it both efficient and user-friendly.

How to use

The library can be installed using pip:

pip install phishing-web-collector

Code usage

Getting all phishing domains from all available sources

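import phishing_web_collector as pwc

# Configure a manager that uses every available feed source,
# with "feeds_data" as the storage path.
manager = pwc.FeedManager(
    sources=list(pwc.FeedSource),
    storage_path="feeds_data"
)

# Refresh the feed data, then retrieve the collected entries.
manager.sync_refresh_all()
entries = manager.sync_retrieve_all()

# Extract the domain from each entry's URL.
phishing_domains = [pwc.get_domain_from_url(item.url) for item in entries]

for domain in phishing_domains:
    print(domain)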

As a result, you will get a list of phishing domains.

All modules are exported in the main package, so you can import the package and call them directly.
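
One use mentioned in the overview is updating a local blacklist. The following is a minimal sketch of that step using only the Python standard library. The helpers `extract_host` and `update_blacklist` are hypothetical names introduced for illustration and are not part of PhishingWebCollector; the sketch assumes the collected URLs are available as plain strings and that the blacklist is a text file with one host per line.

```python
from pathlib import Path
from urllib.parse import urlparse


def extract_host(url: str) -> str:
    """Return the lowercase hostname of a URL, or "" if none can be parsed.

    Hypothetical helper for illustration; not the library's own function.
    """
    # urlparse only recognizes the host when a scheme separator is present,
    # so bare entries such as "example.org/login" get a dummy scheme first.
    if "://" not in url:
        url = "http://" + url
    try:
        return (urlparse(url).hostname or "").lower()
    except ValueError:
        # Malformed input (for example a broken IPv6 literal) is skipped.
        return ""


def update_blacklist(urls, blacklist_path):
    """Merge the hosts found in `urls` into a one-host-per-line text file."""
    path = Path(blacklist_path)
    # Start from the hosts already stored, if the file exists.
    known = set(path.read_text(encoding="utf-8").split()) if path.exists() else set()
    known.update(host for host in map(extract_host, urls) if host)
    # Write the merged set back in sorted order so diffs stay readable.
    path.write_text("\n".join(sorted(known)) + "\n", encoding="utf-8")
    return known
```

With the example above, the call might look like `update_blacklist([item.url for item in entries], "blacklist.txt")`, where the file name is an arbitrary choice.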

Jupyter Notebook Usage

If you would like to test PhishingWebCollector functionalities without installing it on your machine, consider using the preconfigured Jupyter notebook. It will show you how to collect phishing domains from all available sources and save them into a CSV file. You can run it in your browser without any installation using Google Colab.
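
For readers who prefer a script to a notebook, saving domains to a CSV file can be sketched with the standard library alone. This is an assumption about what such a step could look like, not a copy of the notebook; `save_domains_to_csv` and the single `domain` column are hypothetical choices made for illustration.

```python
import csv


def save_domains_to_csv(domains, csv_path):
    """Write unique domains to a one-column CSV file and return their count.

    Hypothetical helper for illustration; not part of PhishingWebCollector.
    """
    # Deduplicate and sort so repeated runs produce a stable file.
    unique = sorted(set(domains))
    # newline="" lets the csv module control line endings on every platform.
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["domain"])  # header row
        writer.writerows([domain] for domain in unique)
    return len(unique)
```

Given the `phishing_domains` list from the code usage section, `save_domains_to_csv(phishing_domains, "phishing_domains.csv")` would write one domain per row.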

To see how much faster asynchronous data collection is than synchronous collection, you can run the asynchronous benchmark.
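
The idea behind such a comparison can be illustrated without any network access. The sketch below is not the project's benchmark: it simulates three feeds with `asyncio.sleep` in place of real requests, and the names `fetch_feed`, `collect_sequentially`, `collect_concurrently`, `timed`, and `FEEDS` are all hypothetical. It only shows why awaiting feeds concurrently tends to finish in roughly the time of the slowest feed rather than the sum of all of them.

```python
import asyncio
import time

# Simulated feeds: (name, seconds the fake request takes).
FEEDS = [("feed_a", 0.1), ("feed_b", 0.1), ("feed_c", 0.1)]


async def fetch_feed(name, delay):
    """Stand-in for a network request; sleeps instead of downloading."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def collect_sequentially(feeds):
    """Await each feed one after another."""
    return [await fetch_feed(name, delay) for name, delay in feeds]


async def collect_concurrently(feeds):
    """Start all feeds at once and wait for them together."""
    return await asyncio.gather(*(fetch_feed(name, delay) for name, delay in feeds))


def timed(collector, feeds):
    """Run a collector coroutine function and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(collector(feeds))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, sequential_time = timed(collect_sequentially, FEEDS)
    _, concurrent_time = timed(collect_concurrently, FEEDS)
    print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Real feeds differ in latency and size, so actual speedups depend on the sources and the network.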

Docker usage

If you want to use PhishingWebCollector in a Docker container, please check this README file.

Contributing

To contribute, refer to the CONTRIBUTING.md file. We are a welcoming community; just follow the Code of Conduct.

Maintainers

Project maintainers are:

  • Damian Frąszczak
  • Edyta Frąszczak
