A Python library for downloading files from a webpage, with support for recursion depth and optional merging.
fetcharoo
fetcharoo is a Python library for downloading PDF files from a webpage. It provides support for specifying recursion depth and offers the option to merge downloaded PDFs into a single file.
Features
- Download PDF files from a specified webpage.
- Specify recursion depth to control how many levels of links to follow when searching for PDFs.
- Choose to merge downloaded PDFs into a single file or store them as separate files.
- Simple and easy-to-use Python interface.
Requirements
- Python 3.10 or higher
- Third-party libraries: requests, PyMuPDF
Installation
Using pip
You can install fetcharoo using pip:
pip install fetcharoo
Using Poetry
If you are using Poetry to manage your project, you can install fetcharoo as a dependency:
poetry add fetcharoo
If you don't have Poetry installed, you can install it by following the instructions on the official Poetry website.
Getting Started
To get started with fetcharoo, follow these steps:
- Install the library using pip or Poetry (see the Installation section above).
- Import the download_pdfs_from_webpage function from the fetcharoo module.
- Use the function to download PDFs from a webpage, specifying the URL, recursion depth, mode (merge or separate), and output directory.
Here's a basic example:
from fetcharoo import download_pdfs_from_webpage
# Download PDFs from a webpage and merge them into a single file
download_pdfs_from_webpage(
    url='https://example.com',
    recursion_depth=1,
    mode='merge',
    output_dir='output'
)
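This README does not describe how fetcharoo locates PDFs on a page. As a rough illustration of what that step can involve, the sketch below uses only the standard library to pull PDF links out of an HTML string. The names PdfLinkParser and extract_pdf_links are hypothetical and are not part of fetcharoo's API; the rule "a link counts as a PDF if its path ends in .pdf" is likewise an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Hypothetical helper: collects absolute URLs of <a href> links ending in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we care about.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Assumption: a PDF is recognised by its path suffix, ignoring case
        # and any query string.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_links.append(absolute)


def extract_pdf_links(html, base_url):
    """Return the PDF links found in html, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links
```

A suffix check is cheap but can miss PDFs served from URLs without a .pdf extension; a stricter approach would inspect the Content-Type header of each response.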
Advanced Usage
fetcharoo provides additional options for customizing the behavior of the library:
- To download PDFs and store them as separate files, set the mode parameter to 'separate':
download_pdfs_from_webpage(
    url='https://example.com',
    recursion_depth=1,
    mode='separate',
    output_dir='output'
)
- To control the recursion depth, adjust the recursion_depth parameter. For example, to follow links up to two levels deep, set recursion_depth=2.
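To make the idea of recursion depth concrete, here is a minimal sketch of a depth-limited, breadth-first walk over links. It is not fetcharoo's implementation: pages_within_depth and the get_links callback are hypothetical names, and the convention that depth 0 means "the start page only" is an assumption that may differ from how the library counts levels.

```python
from collections import deque


def pages_within_depth(start_url, recursion_depth, get_links):
    """Return pages reachable from start_url by following links at most
    recursion_depth levels deep, in breadth-first order.

    get_links is any callable that maps a URL to the URLs it links to.
    """
    seen = {start_url}
    order = [start_url]
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        # Pages already at the depth limit are kept but not expanded further.
        if depth >= recursion_depth:
            continue
        for link in get_links(url):
            # The seen set prevents revisiting pages and looping on cycles.
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append((link, depth + 1))
    return order
```

Passing the link lookup in as a callback keeps the traversal independent of any network code, so it can be tried against a plain dictionary of pages before being pointed at real URLs.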
Contributing
Contributions to fetcharoo are welcome! If you'd like to contribute, please follow these steps:
- Fork the repository on GitHub.
- Create a branch for your changes.
- Make your changes and commit them to your branch.
- Submit a pull request with your changes.
We appreciate any contributions, whether it's fixing bugs, adding new features, or improving documentation.
Support
If you encounter any issues or have questions about using fetcharoo, please open an issue on the GitHub repository. We'll do our best to assist you.
Changelog
Please refer to the CHANGELOG.md file for a summary of changes in each release.
Authors and Acknowledgments
fetcharoo was developed by Mark Lifson. I'd like to thank all contributors and users for their support.
License
This project is licensed under the MIT License. See the LICENSE file for details. The MIT License allows for broad permissions, including use, modification, distribution, and sublicensing of the software.
Update - May 7th, 2023
Added new features to fetcharoo:
- merge_pdfs function to merge multiple PDFs into a single file