A powerful web scraping and downloading utility

Scrappify

Scrappify is a website scraping and downloading tool. It scrapes links, downloads files, filters downloads by file type, extracts patterns (such as emails or phone numbers), and crawls multiple levels deep, from either Python or the command line.

Features

Download entire websites
Extract links, emails, phone numbers, or custom regex patterns
Filter downloads by file type (images, documents, scripts, etc.)
Fast downloads with configurable workers
Cross-domain crawling support
Command-line interface (CLI) and Python API

Installation

pip install scrappify

Python Usage

Basic Usage

from scrappify import url, scrap, download

# Download entire website
url_download = url("https://example.com")
downloaded_files = download(url_download, output_dir="my_site")
print(f"Downloaded {len(downloaded_files)} files")

# Get all links from a page
links = scrap(url_download)
print(f"Found {len(links)} links")
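To make the link-gathering step concrete, here is a minimal standard-library sketch of collecting links from a page. It is not Scrappify's implementation; `LinkCollector` and `extract_links` are hypothetical names, and the HTML is an inline sample rather than a fetched page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from href/src attributes (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative paths against the page URL.
                absolute = urljoin(self.base_url, value)
                if absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


sample_html = '<a href="/about">About</a> <img src="logo.png"> <a href="/about">Again</a>'
found = extract_links(sample_html, "https://example.com/")
print(f"Found {len(found)} links")
```

Duplicate links are dropped while the order of first appearance is kept, which is one reasonable choice for a scraper that later downloads each link once.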

File Type Filtering

from scrappify import url, download
from scrappify.patterns import file_type

# Download only JavaScript files
js_files = download("https://example.com", file_type="js", output_dir="js_files")

# Download images using category
images = download("https://example.com", file_type=file_type['image'], output_dir="images")

# Download multiple specific file types
docs_and_images = download("https://example.com", file_type=["pdf", "jpg", "png"])
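The three forms of `file_type` above (a single extension, a category, a list) can be thought of as all reducing to a set of extensions. The sketch below shows one way that could work; `FILE_TYPES`, `normalize_extensions`, and `matches_file_type` are hypothetical, and the real contents of `scrappify.patterns.file_type` may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical category table; the real scrappify.patterns.file_type may differ.
FILE_TYPES = {
    "image": ["png", "jpg", "gif", "svg"],
    "document": ["pdf", "docx", "txt"],
}


def normalize_extensions(file_type):
    """Accept a category name, a single extension, or a list of extensions."""
    if isinstance(file_type, str):
        file_type = FILE_TYPES.get(file_type, [file_type])
    return {ext.lower().lstrip(".") for ext in file_type}


def matches_file_type(link, file_type):
    # Use only the URL path so query strings do not hide the extension.
    suffix = PurePosixPath(urlparse(link).path).suffix
    return suffix.lower().lstrip(".") in normalize_extensions(file_type)


links = [
    "https://example.com/report.pdf",
    "https://example.com/logo.PNG?v=2",
    "https://example.com/app.js",
]
pdf_and_png = [link for link in links if matches_file_type(link, ["pdf", "png"])]
images = [link for link in links if matches_file_type(link, "image")]
```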

Pattern Searching

from scrappify import url, download
from scrappify.patterns import pattern

# Find emails in all downloaded files
email_results = download("https://example.com", pattern=pattern['email'])

# Find phone numbers in HTML files only
phone_results = download("https://example.com", file_type="html", pattern=pattern['phone'])

# Custom regex pattern
custom_pattern = r'\b\d{3}-\d{2}-\d{4}\b'  # SSN pattern
ssn_results = download("https://example.com", pattern=custom_pattern)

# Combine file type and pattern
results = download("https://example.com", file_type="js", pattern=pattern['url'])
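As a rough illustration of combining a file-type filter with a pattern search, the following sketch searches in-memory text instead of downloaded files. The regexes in `NAMED_PATTERNS` and the helper names are assumptions for illustration, not the patterns Scrappify ships.

```python
import re

# Hypothetical named patterns; the regexes in scrappify.patterns may differ.
NAMED_PATTERNS = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "url": r'https?://[^\s"<>]+',
}


def compile_pattern(name_or_regex):
    """Treat known names as shortcuts and anything else as a raw regex."""
    return re.compile(NAMED_PATTERNS.get(name_or_regex, name_or_regex))


def search_files(files, name_or_regex, extension=None):
    """Search {filename: text} pairs, optionally limited to one extension."""
    regex = compile_pattern(name_or_regex)
    hits = {}
    for filename, text in files.items():
        if extension and not filename.endswith("." + extension):
            continue
        matches = regex.findall(text)
        if matches:
            hits[filename] = matches
    return hits


files = {
    "index.html": "Contact sales@example.com for details.",
    "app.js": 'fetch("https://api.example.com/v1/items")',
}
js_urls = search_files(files, "url", extension="js")
ssn_like = search_files(files, r"\b\d{3}-\d{2}-\d{4}\b")
```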

Advanced Scraping

from scrappify import url, scrap, download

# Deep crawling (multiple levels)
deep_links = scrap("https://example.com", depth=3)
print(f"Found {len(deep_links)} links across 3 levels")

# Download with increased workers
fast_download = download("https://example.com", max_workers=20, output_dir="fast_download")

# Cross-domain crawling (disable the same-domain restriction)
all_links = scrap("https://example.com", same_domain_only=False)
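The interaction between `depth` and `same_domain_only` is easier to see on a toy example. The sketch below runs a breadth-first crawl over an in-memory link graph; `FAKE_SITE` and `crawl` are hypothetical, and how Scrappify actually counts levels is an assumption here (the start page is level 0).

```python
from collections import deque
from urllib.parse import urlparse

# A tiny in-memory "web": page URL -> links found on that page.
FAKE_SITE = {
    "https://example.com/": ["https://example.com/a", "https://other.org/x"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": ["https://example.com/c"],
    "https://other.org/x": [],
}


def crawl(start, depth=1, same_domain_only=True, get_links=FAKE_SITE.get):
    """Breadth-first crawl that follows links up to `depth` levels from `start`."""
    start_host = urlparse(start).netloc
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        page, level = queue.popleft()
        if level >= depth:
            continue  # Pages at the depth limit are recorded but not expanded.
        for link in get_links(page) or []:
            if same_domain_only and urlparse(link).netloc != start_host:
                continue
            if link not in seen:
                seen.add(link)
                found.append(link)
                queue.append((link, level + 1))
    return found


shallow = crawl("https://example.com/", depth=1)
deep = crawl("https://example.com/", depth=3)
cross = crawl("https://example.com/", depth=1, same_domain_only=False)
```

The `seen` set keeps the crawl from revisiting pages, which matters on real sites where pages link back to each other.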

Programmatic Pattern Extraction

from scrappify.core.utils import search_pattern_in_file

# Search pattern in specific file
results = search_pattern_in_file("downloaded_file.html", r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
for result in results:
    print(f"Email found: {result['match']} at line {result['line']}")
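For readers who want to see what a per-file search of this shape might involve, here is a standard-library stand-in. `find_pattern_in_file` is a hypothetical function, not the library's `search_pattern_in_file`; the `match` and `line` keys follow the example above, and 1-based line numbering is an assumption.

```python
import re
import tempfile
from pathlib import Path


def find_pattern_in_file(path, regex):
    """Return one {'match', 'line'} dict per match, with 1-based line numbers."""
    compiled = re.compile(regex)
    results = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line_number, text in enumerate(handle, start=1):
            for match in compiled.finditer(text):
                results.append({"match": match.group(0), "line": line_number})
    return results


# Write a small sample file so the example runs without any download.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "downloaded_file.html"
    sample.write_text(
        "<p>No address here</p>\n<p>Write to info@example.com</p>\n",
        encoding="utf-8",
    )
    results = find_pattern_in_file(
        sample, r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
    )

for result in results:
    print(f"Email found: {result['match']} at line {result['line']}")
```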

Command Line Usage

# Download entire website
scrappify https://example.com -o my_site

# Download only PDF files
scrappify https://example.com -t pdf -o documents

# Download images and search for emails
scrappify https://example.com -t image -p email -o images_with_emails

# Deep crawl (3 levels) and download everything
scrappify https://example.com -d 3 -o deep_site

# Use custom regex pattern
scrappify https://example.com -p '\b\d{3}-\d{2}-\d{4}\b' -o ssn_search

# List available patterns
scrappify --list-patterns

# List available file types
scrappify --list-types

# High-performance download with 20 workers
scrappify https://example.com -w 20 -o fast_download
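The worker count (`-w 20` here, `max_workers=20` in the Python API) suggests a pool of concurrent downloads. Whether Scrappify uses threads internally is not stated; the sketch below only illustrates the general idea with `concurrent.futures`, using a stand-in `fetch` function that performs no network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(link):
    """Stand-in for an HTTP download; returns (link, fake byte count)."""
    return link, len(link)


def download_all(links, max_workers=4):
    # Executor.map preserves input order even though work runs concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, links))


links = [f"https://example.com/file{i}.pdf" for i in range(5)]
downloaded = download_all(links, max_workers=20)
print(f"Downloaded {len(downloaded)} files")
```

More workers help mainly when downloads are network-bound; a very high count can also put heavy load on the target server.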

Complex Examples

# Download all JavaScript and CSS files, search for URLs
scrappify https://example.com -t javascript -t css -p url -o assets_with_urls

# Download documents and images, search for prices
scrappify https://example.com -t document -t image -p price -o priced_content

# Deep crawl with custom pattern
scrappify https://example.com -d 2 -p '#[a-zA-Z0-9_]+' -o hashtags
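The flags used in these commands can be summarized as an `argparse` sketch. This is not the tool's actual parser: the short flags and the two `--list-*` options come from the examples above, while the destination names, defaults, and help strings are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="scrappify")
    parser.add_argument("url", nargs="?", help="page to start from")
    parser.add_argument("-o", dest="output_dir", help="output directory")
    # action="append" lets -t be repeated, as in "-t javascript -t css".
    parser.add_argument("-t", dest="file_types", action="append", default=[],
                        help="file type or category; repeatable")
    parser.add_argument("-p", dest="pattern", help="pattern name or regex")
    # Default values below are assumptions, not documented behavior.
    parser.add_argument("-d", dest="depth", type=int, default=1, help="crawl depth")
    parser.add_argument("-w", dest="workers", type=int, default=4, help="worker count")
    parser.add_argument("--list-patterns", action="store_true")
    parser.add_argument("--list-types", action="store_true")
    return parser


args = build_parser().parse_args(
    ["https://example.com", "-t", "javascript", "-t", "css",
     "-p", "url", "-o", "assets_with_urls"]
)
```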

Available Options

File Types

image → png, jpg, gif, svg, etc.
document → pdf, docx, txt, etc.
javascript, css, html
Custom extensions supported (e.g., zip, mp4)

Patterns

email → email addresses
phone → phone numbers
url → URLs
price → prices
Custom regex patterns are also supported

License

Project details

This version

0.0.1

Sep 13, 2025

Download files

Source Distribution

scrappify-0.0.1.tar.gz (8.8 kB)

Uploaded Sep 13, 2025 Source

Built Distribution

scrappify-0.0.1-py3-none-any.whl (8.6 kB)

Uploaded Sep 13, 2025 Python 3

File details

Details for the file scrappify-0.0.1.tar.gz.

File metadata

Download URL: scrappify-0.0.1.tar.gz
Upload date: Sep 13, 2025
Size: 8.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for scrappify-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`8ebf524e78cede0598ceedf6213504d4830fa416b025069d0e190c90ba0d7fbe`
MD5	`8c5ded4056078f9ce9cf58f7fe443977`
BLAKE2b-256	`a426307582480604e891dd22ed1bcdb9c28497e3605d686245ce866d2f4064bc`
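A downloaded archive can be checked against the published digest with `hashlib`. The sketch below is a generic check, not part of Scrappify; `sha256_of` and `verify` are hypothetical helpers, and the demonstration hashes a throwaway file so that it runs without the real archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    return sha256_of(path) == expected_hex.lower()


# Demonstration on a throwaway file. For the real archive, pass the path to
# scrappify-0.0.1.tar.gz and the SHA256 digest listed in the table.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"hello")
    demo_digest = sha256_of(demo)
    demo_ok = verify(demo, hashlib.sha256(b"hello").hexdigest())
    demo_bad = verify(demo, "0" * 64)
```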

File details

Details for the file scrappify-0.0.1-py3-none-any.whl.

File metadata

Download URL: scrappify-0.0.1-py3-none-any.whl
Upload date: Sep 13, 2025
Size: 8.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for scrappify-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`44aea120bd9c95d0b31cd2f82d5666fa497b0c627cf82890b8c843b55676a735`
MD5	`e7c8c51b4dfffe3ae4941a9aff4c0b2d`
BLAKE2b-256	`6304b9e36203d24758e08a628322077df06a27157be647a5956297ec85bf980f`
