FetchAnything
A command-line tool to fetch files from websites recursively.
Installation
You can install FetchAnything using pip:
pip install fetchanything
Or from source:
git clone https://github.com/chaochungkuo/fetchanything.git
cd fetchanything
pip install -e .
Usage
Basic usage:
fetchanything <URL> [options]
Options
- -l, --level LEVEL: Maximum crawl depth (default: 2)
- -f, --filter PATTERN: File pattern to match (e.g., "*.pdf", "*.jpg")
- -u, --url-pattern PATTERN: Regex pattern to match URLs for crawling (e.g., ".*/blog/.*")
- -o, --out DIRECTORY: Output directory (default: downloads)
- -v, --verbose: Enable verbose output
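As an illustration of how these options fit together, here is a minimal argparse sketch that mirrors the documented flags and defaults. It is not taken from the FetchAnything source; the function name `build_parser` and the help strings are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="fetchanything",
        description="Fetch files from websites recursively.",
    )
    parser.add_argument("url", metavar="URL", help="start URL to crawl")
    parser.add_argument("-l", "--level", type=int, default=2,
                        help="maximum crawl depth (default: 2)")
    parser.add_argument("-f", "--filter", metavar="PATTERN",
                        help='file pattern to match, e.g. "*.pdf"')
    parser.add_argument("-u", "--url-pattern", metavar="PATTERN",
                        help="regex that URLs must match to be crawled")
    parser.add_argument("-o", "--out", metavar="DIRECTORY", default="downloads",
                        help="output directory (default: downloads)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable verbose output")
    return parser


# Parse a sample command line instead of sys.argv so the sketch is reproducible.
args = build_parser().parse_args(
    ["https://example.com", "--level", "1", "--filter", "*.pdf", "-v"]
)
print(args.url, args.level, args.filter, args.out, args.verbose)
```

Options that are not given on the command line fall back to the documented defaults, so `--out` resolves to `downloads` in the sample call.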
Examples
- Download all PDF files from a website up to depth 2:
fetchanything https://example.com --level 2 --filter "*.pdf" --out download_pdf
- Download all files from a website up to depth 1:
fetchanything https://example.com --level 1 --out downloads
- Download all JPEG images with verbose output:
fetchanything https://example.com --filter "*.jpg" -v
- Download PDFs only from blog pages:
fetchanything https://example.com --filter "*.pdf" --url-pattern ".*/blog/.*"
- Download files only from a specific subdomain:
fetchanything https://example.com --url-pattern "https://docs\\.example\\.com/.*"
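The two kinds of pattern in the examples above behave differently: `--filter` takes a shell-style glob such as `*.pdf`, while `--url-pattern` takes a regular expression. The sketch below shows one plausible way to apply each. The helper names `matches_file_pattern` and `matches_url_pattern` are hypothetical, and the choices to match case-insensitively and to require the regex to match the whole URL are assumptions, not documented behavior of the tool.

```python
import re
from fnmatch import fnmatch
from typing import Optional
from urllib.parse import urlparse


def matches_file_pattern(url: str, pattern: Optional[str]) -> bool:
    """Return True if the file name in the URL path matches the glob pattern."""
    if pattern is None:
        return True  # no filter given: accept every file
    # Use only the last path segment, so query strings do not affect matching.
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    # Lower-case both sides so the result is the same on every platform.
    return fnmatch(filename.lower(), pattern.lower())


def matches_url_pattern(url: str, pattern: Optional[str]) -> bool:
    """Return True if the whole URL matches the regular expression."""
    if pattern is None:
        return True  # no URL pattern given: crawl everything
    return re.fullmatch(pattern, url) is not None


print(matches_file_pattern("https://example.com/files/report.pdf", "*.pdf"))
print(matches_url_pattern("https://example.com/blog/post.html", ".*/blog/.*"))
print(matches_url_pattern("https://www.example.com/", r"https://docs\.example\.com/.*"))
```

Escaping the dots in the subdomain pattern matters: an unescaped `.` matches any character, so `docs.example.com` written without backslashes would also accept hosts such as `docsXexample.com`.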
Features
- Recursive website crawling with depth control
- File pattern matching
- URL pattern filtering
- Progress tracking with tqdm
- Verbose logging option
- Persistent HTTP sessions
- Error handling and graceful interruption
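To make the first feature concrete, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not the FetchAnything implementation: the names `LinkExtractor` and `crawl` are hypothetical, the standard-library `html.parser` stands in for beautifulsoup4 to keep the example dependency-free, and pages are served from an in-memory dictionary rather than over HTTP. It also assumes that depth 0 is the start page and that pages found at the maximum depth are recorded but not expanded.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch_html: Callable[[str], str],
          max_depth: int = 2) -> Dict[str, int]:
    """Breadth-first crawl; map each discovered URL to the depth it was found at."""
    seen: Dict[str, int] = {start_url: 0}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        extractor = LinkExtractor()
        extractor.feed(fetch_html(url))
        for href in extractor.links:
            # Resolve relative links and drop #fragments to avoid duplicates.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen[absolute] = depth + 1
                queue.append((absolute, depth + 1))
    return seen


# Tiny in-memory "website" so the sketch runs without network access.
PAGES = {
    "https://example.com/": '<a href="/a.html">A</a> <a href="/doc.pdf">PDF</a>',
    "https://example.com/a.html": '<a href="/b.html">B</a> <a href="/">home</a>',
    "https://example.com/b.html": '<a href="/c.html">C</a>',
}

found = crawl("https://example.com/", lambda url: PAGES.get(url, ""), max_depth=2)
for url, depth in found.items():
    print(depth, url)
```

The `seen` dictionary doubles as the visited set, so the link from `a.html` back to the home page does not cause a loop, and `c.html` is never reached because it is only linked from a page at the depth limit.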
Requirements
- Python 3.7 or higher
- requests
- beautifulsoup4
- tqdm
- urllib3
License
MIT License
File details
Details for the file fetchanything-0.2.0.tar.gz.
File metadata
- Download URL: fetchanything-0.2.0.tar.gz
- Upload date:
- Size: 17.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be0cf4c8fb003c36ea4a4250c019d8be8d294f3c2f82cbbba4b7cd8828ddf1fd |
| MD5 | 963ffcfc6e98183a1fdfc3cb98c3cef9 |
| BLAKE2b-256 | 2ecaf18cb91d0b3af40ccc5209cf4927032917d306b9336bce124403f4585a39 |
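A downloaded copy of fetchanything-0.2.0.tar.gz can be checked against the SHA256 digest listed for it. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `verify_download` are made up for this example, and the file path passed to them is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed for fetchanything-0.2.0.tar.gz.
EXPECTED_SHA256 = "be0cf4c8fb003c36ea4a4250c019d8be8d294f3c2f82cbbba4b7cd8828ddf1fd"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `verify_download("fetchanything-0.2.0.tar.gz")` on an intact download should return True; a False result means the file differs from the one that was published.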
Provenance
The following attestation bundles were made for fetchanything-0.2.0.tar.gz:
Publisher: python-publish.yml on chaochungkuo/fetchanything
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: fetchanything-0.2.0.tar.gz
  - Subject digest: be0cf4c8fb003c36ea4a4250c019d8be8d294f3c2f82cbbba4b7cd8828ddf1fd
- Sigstore transparency entry: 199612380
- Sigstore integration time:
- Permalink: chaochungkuo/fetchanything@7adad3fde0a4b25c187a5fdeffffb7f569fea28e
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/chaochungkuo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@7adad3fde0a4b25c187a5fdeffffb7f569fea28e
- Trigger Event: release
File details
Details for the file fetchanything-0.2.0-py3-none-any.whl.
File metadata
- Download URL: fetchanything-0.2.0-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d4f8d913125e09463b452ef58576dd8a6ee6d0deb47b7f73793c20f8fbac8e1e |
| MD5 | e41d286962ecf2d99627a33c2746abfc |
| BLAKE2b-256 | eea2a0d4d28066586d7a38dd3bfe2570c6f6815f8c2815e7800a1597f4a5863f |
Provenance
The following attestation bundles were made for fetchanything-0.2.0-py3-none-any.whl:
Publisher: python-publish.yml on chaochungkuo/fetchanything
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: fetchanything-0.2.0-py3-none-any.whl
  - Subject digest: d4f8d913125e09463b452ef58576dd8a6ee6d0deb47b7f73793c20f8fbac8e1e
- Sigstore transparency entry: 199612381
- Sigstore integration time:
- Permalink: chaochungkuo/fetchanything@7adad3fde0a4b25c187a5fdeffffb7f569fea28e
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/chaochungkuo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@7adad3fde0a4b25c187a5fdeffffb7f569fea28e
- Trigger Event: release