Project description
Web Retriever is a robust Python-based API, designed to fetch and retrieve web resources on behalf of clients. It provides an effective solution when direct internet access is not available to the client or when external resources need to be explicitly defined, like in Envoy configurations.
About
Web Retriever is a robust API designed to facilitate the interaction between machine workloads and the Internet, acting as an intermediary that handles requests and fetches the necessary online resources.
Web Retriever is built upon the concept of Plugin Oriented Programming, allowing it to be highly extensible and customizable. It accepts one or more web resource locations to retrieve, serving as an intermediary that grants indirect access to the web resources. This makes it an ideal solution in various scenarios, especially in environments where clients have restricted or no direct internet access.
Furthermore, Web Retriever is particularly helpful in the context of Envoy configurations. In such settings, every external web resource has to be manually defined within a configuration file. Web Retriever can simplify this process by acting as a single point of reference for multiple web resources, thus reducing the complexity of the configuration.
A significant feature of Web Retriever is its Rule Engine, a powerful component that evaluates each request. It determines whether to allow or deny requests based on specific, predefined criteria, enhancing the security and efficacy of the interactions between machine workloads and the Internet.
The Rule Engine can also rewrite request headers. It can insert values such as API tokens at request time, so that secrets are held in one place rather than distributed across workloads, and it can remove headers that would otherwise disclose internal or sensitive data to external services. Together, these features keep communication between machine workloads and the Internet secure and centrally managed.
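The header handling described here can be illustrated with a minimal sketch. The function name `rewrite_headers`, its parameters, and the sample header values are hypothetical and chosen for illustration only; this is not Web Retriever's actual implementation or configuration format.

```python
from typing import Dict, Iterable, Mapping


def rewrite_headers(
    headers: Mapping[str, str],
    inject: Mapping[str, str],
    strip: Iterable[str],
) -> Dict[str, str]:
    """Return a copy of `headers` with `strip` names removed and `inject` applied.

    Hypothetical helper: header names are compared case-insensitively.
    """
    strip_lower = {name.lower() for name in strip}
    inject_lower = {name.lower() for name in inject}
    # Drop headers that must not leave the network, and any client-supplied
    # header that an injected value is about to replace.
    outgoing = {
        name: value
        for name, value in headers.items()
        if name.lower() not in strip_lower and name.lower() not in inject_lower
    }
    # Add the values held centrally by the intermediary (for example a token).
    outgoing.update(inject)
    return outgoing


client_headers = {
    "Accept": "application/json",
    "X-Internal-Host": "build-42.corp.example",
    "authorization": "Bearer client-supplied",
}
outgoing_headers = rewrite_headers(
    client_headers,
    inject={"Authorization": "Bearer <token held by the retriever>"},
    strip=["X-Internal-Host"],
)
print(outgoing_headers)
```

In this sketch the client never needs to know the real token, and the internal host name never reaches the external service.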
Whether you need to fetch a single web page or retrieve multiple resources concurrently, Web Retriever offers a reliable, efficient, and scalable solution. Its flexibility and adaptability make it a valuable tool in any organization’s toolkit.
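As a rough illustration of retrieving several resources concurrently, the sketch below uses `asyncio.gather` with a stand-in `fetch` coroutine that sleeps instead of touching the network. The names `fetch` and `fetch_all` and the example URLs are assumptions made for this example; they do not describe Web Retriever's internals or its API.

```python
import asyncio
from typing import Dict, List


async def fetch(url: str) -> str:
    """Stand-in for a real HTTP fetch; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    return f"body of {url}"


async def fetch_all(urls: List[str]) -> Dict[str, str]:
    """Fetch every URL concurrently and map each URL to its body."""
    # gather() runs the coroutines concurrently and returns results in input order.
    bodies = await asyncio.gather(*(fetch(url) for url in urls))
    return dict(zip(urls, bodies))


results = asyncio.run(
    fetch_all(["https://example.com/a", "https://example.com/b"])
)
print(results)
```

Because the fetches overlap, the total wait is close to that of the slowest single resource rather than the sum of all of them.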
What is POP?
This project is built with pop, a Python-based implementation of Plugin Oriented Programming (POP). POP seeks to bring together concepts and wisdom from the history of computing in new ways to solve modern computing problems.
For more information:
Getting Started
Prerequisites
Python 3.8+
git (if installing from source, or contributing to the project)
Installation
You can install web-retriever either from PyPI or from source.
Install from PyPI
pip install web-retriever
Install from source
# clone repo
git clone git@gitlab.com:hoprco/web-retriever.git
cd web-retriever
# Setup venv
python3 -m venv .venv
source .venv/bin/activate
pip install .
Usage
$ web-retriever -h
usage: web-retriever [-h] [--config CONFIG] [--config-template] [--log-datefmt LOG_DATEFMT] [--log-file LOG_FILE] [--log-fmt-console LOG_FMT_CONSOLE]
                     [--log-fmt-logfile LOG_FMT_LOGFILE] [--log-handler-options [LOG_HANDLER_OPTIONS ...]] [--log-level LOG_LEVEL]
                     [--log-plugin {basic,datagram,null,rotating,socket,timed_rotating}] [--version] [--versions-report]

options:
  -h, --help            show this help message and exit
  --config CONFIG, -c CONFIG
                        Load extra options from a configuration file onto hub.OPT.web_retriever
  --config-template     Output a config template for this command
  --version             Display version information
  --versions-report     Output a version report for reporting bugs

Logging Options:
  --log-datefmt LOG_DATEFMT
                        The date format to display in the logs
  --log-file LOG_FILE   The location of the log file
  --log-fmt-console LOG_FMT_CONSOLE
                        The log formatting used in the console
  --log-fmt-logfile LOG_FMT_LOGFILE
                        The format to be given to log file messages
  --log-handler-options [LOG_HANDLER_OPTIONS ...]
                        kwargs that should be passed to the logging handler used by the log_plugin
  --log-level LOG_LEVEL
                        Set the log level, either quiet, info, warning, debug or error
  --log-plugin {basic,datagram,null,rotating,socket,timed_rotating}
                        The logging plugin to use
Examples
Web Retriever, like all POP applications, can accept configuration files in YAML format. Configuration parameters can be passed to POP plugins inside the application via this configuration file. Rulesets are established in the configuration file and used by Web Retriever to enforce any defined rules. The following configuration file sets the application logging to DEBUG level and puts a simple rule in place to enforce access to the API only by clients residing on localhost.
pop_config:
  log_level: DEBUG
web_retriever:
  rules:
    - rule_type: "deny"
      rule_string: "remote != '127.0.0.1' and remote != '::1'"
The configuration file path is then passed to the application on the command line:
$ web-retriever -c config.yaml
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
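When writing a `deny` rule that should leave only localhost clients allowed, it can help to check the expression against a few addresses first. The sketch below uses plain Python functions as stand-ins and assumes the rule expression follows Python's boolean semantics; it is not Web Retriever's actual rule evaluator, and the function names are hypothetical. It shows why the two inequality checks need to be joined with `and`: joined with `or`, the expression is true for every address, so every client would be denied.

```python
def deny_with_or(remote: str) -> bool:
    """True for every address: no address equals both literals at once."""
    return remote != "127.0.0.1" or remote != "::1"


def deny_with_and(remote: str) -> bool:
    """True only for addresses that are neither IPv4 nor IPv6 localhost."""
    return remote != "127.0.0.1" and remote != "::1"


# Columns: remote address, result of the `or` form, result of the `and` form.
for remote in ("127.0.0.1", "::1", "203.0.113.7"):
    print(remote, deny_with_or(remote), deny_with_and(remote))

# Output:
# 127.0.0.1 True False
# ::1 True False
# 203.0.113.7 True True
```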
Roadmap
Reference the open issues for a list of proposed features (and known issues).
Acknowledgements
Img Shields for making repository badges easy.