Upload Scrapy logs to cloud storage
Project description
Scrapy Log Export
Description
A Scrapy extension that adds a LOG_URI setting, analogous to the FEED_URI setting. It reuses the same feed storage classes as Scrapy's feedexport extension.
This extension is useful if you're running Scrapy in a container and want to store your logs with a cloud service provider.
Please note that this extension still requires a local log file to be written. Once Scrapy's engine has stopped, the extension uploads the log file to the cloud and optionally deletes the local copy.
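For orientation, here is a minimal sketch of how an extension can hook Scrapy's engine_stopped signal to achieve this; the class name and the upload helper are illustrative assumptions, not this package's actual source.
import os
from scrapy import signals

class LogUploadSketch:
    """Illustrative sketch only, not scrapy-logexport's real implementation."""

    def __init__(self, crawler):
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls(crawler)
        # engine_stopped fires once the crawl has fully finished
        crawler.signals.connect(ext.engine_stopped, signal=signals.engine_stopped)
        return ext

    def engine_stopped(self):
        settings = self.crawler.settings
        log_file = settings.get("LOG_FILE")
        self.upload(log_file)  # hypothetical helper: push the file to LOG_URI's backend
        if settings.getbool("LOG_EXPORTER_DELETE_LOCAL"):
            os.remove(log_file)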
Installation
You can install scrapy-logexporter using pip:
pip install scrapy-logexporter
Configuration
Enable the extension by adding it to your settings.py:
from environs import Env
env = Env()
env.read_env()
# Enable the extension
EXTENSIONS = {
"scrapy_logexport.LogExporter": 0,
}
LOG_FILE = 'scrapy.log' # Must be a local file
LOG_EXPORTER_DELETE_LOCAL = True # Delete local log file after upload, defaults to False
LOG_URI = f"s3://your-bucket/%(name)s %(time)s.log" # Store on S3
AWS_ACCESS_KEY_ID = env("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = env("AWS_SECRET_ACCESS_KEY")
Setting LOG_URI
The FEED_STORAGE class used for the LOG_URI is determined by the URI scheme. The following schemes are supported by default:
FEED_STORAGES_BASE = {
"": "scrapy.extensions.feedexport.FileFeedStorage",
"file": "scrapy.extensions.feedexport.FileFeedStorage",
"ftp": "scrapy.extensions.feedexport.FTPFeedStorage",
"gs": "scrapy.extensions.feedexport.GCSFeedStorage",
"s3": "scrapy.extensions.feedexport.S3FeedStorage",
"stdout": "scrapy.extensions.feedexport.StdoutFeedStorage",
}
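For example, the file scheme targets a local or mounted filesystem path and needs no credentials (the path below is illustrative):
LOG_URI = "file:///var/log/scrapy/%(name)s_%(time)s.log"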
If you've already added more storages to FEED_STORAGES, they'll be available for use with LOG_URI. Additionally, a LOG_STORAGES setting is available to add storage classes used only for LOG_URI.
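As a sketch, registering an extra scheme could look like this; the azure scheme and storage class are hypothetical, purely for illustration:
LOG_STORAGES = {
    "azure": "myproject.storages.AzureBlobStorage",  # hypothetical storage class
}
LOG_URI = "azure://my-container/%(name)s_%(time)s.log"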
Also note that, like FEED_URI, LOG_URI can be a template string. By default, any spider attribute (such as name) and time (when the spider started) are available. You can add further parameters to the template by declaring the LOG_URI_PARAMS setting, which should be a function, or a string import path to a function. The function takes spider as an argument and returns a dictionary of extra parameters:
# LOG_URI_PARAMS: Optional[Union[str, Callable]] — a callable or an import path string
def uri_params_func(spider):
    return {
        'custom_param': 'my_value',
        'another_param': 'another_value',
    }

# Expands to the spider's name, the time the spider started,
# and the two custom parameters returned above
LOG_URI = "s3://your-bucket/%(name)s_%(time)s_%(custom_param)s_%(another_param)s.log"
LOG_URI_PARAMS = uri_params_func
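The string form points to the same function by import path (the module path below is a hypothetical example):
LOG_URI_PARAMS = "myproject.settings_helpers.uri_params_func"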
Overriding feedexport settings
Because much of the backend is the same, you can override some feedexport settings, if you wish them to be different for logexport.
| FeedExport | LogExport |
|---|---|
| FEED_STORAGE_S3_ACL | LOG_STORAGE_S3_ACL |
| AWS_ENDPOINT_URL | LOG_STORAGE_AWS_ENDPOINT_URL |
| GCS_PROJECT_ID | LOG_STORAGE_GCS_PROJECT_ID |
| FEED_STORAGE_GCS_ACL | LOG_STORAGE_GCS_ACL |
| FEED_STORAGE_FTP_ACTIVE | LOG_STORAGE_FTP_ACTIVE |
Additionally, if FEED_STORAGES and LOG_STORAGES share a key, the LOG_STORAGES entry is used for log export.
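For instance, feeds and logs can carry different S3 ACLs (values are illustrative):
FEED_STORAGE_S3_ACL = "public-read"  # applies to exported feeds
LOG_STORAGE_S3_ACL = "private"       # applies only to the uploaded log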
All possible settings
LOG_FILE # Required
LOG_URI # Required
LOG_EXPORTER_DELETE_LOCAL
LOG_URI_PARAMS
# Overrides for feedexport settings
LOG_STORAGES
LOG_STORAGE_S3_ACL
LOG_STORAGE_AWS_ENDPOINT_URL
LOG_STORAGE_GCS_PROJECT_ID
LOG_STORAGE_GCS_ACL
LOG_STORAGE_FTP_ACTIVE
# S3FeedStorage settings
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
FEED_STORAGE_S3_ACL # Overridden by LOG_STORAGE_S3_ACL
AWS_ENDPOINT_URL # Overridden by LOG_STORAGE_AWS_ENDPOINT_URL
# GCSFeedStorage settings
GCS_PROJECT_ID # Overridden by LOG_STORAGE_GCS_PROJECT_ID
FEED_STORAGE_GCS_ACL # Overridden by LOG_STORAGE_GCS_ACL
# FTPFeedStorage settings
FEED_STORAGE_FTP_ACTIVE # Overridden by LOG_STORAGE_FTP_ACTIVE
FEED_TEMPDIR # Not used by logexport directly
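Putting a few of these together, a GCS-backed configuration might look like this (bucket, project, and ACL values are placeholders):
LOG_FILE = "scrapy.log"
LOG_URI = "gs://your-bucket/%(name)s_%(time)s.log"
GCS_PROJECT_ID = "your-project-id"
LOG_STORAGE_GCS_ACL = "projectPrivate"  # assumed to accept the same values as FEED_STORAGE_GCS_ACL
LOG_EXPORTER_DELETE_LOCAL = True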
File details
Details for the file scrapy_logexport-0.2.0.tar.gz.
File metadata
- Download URL: scrapy_logexport-0.2.0.tar.gz
- Upload date:
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.10.6 Linux/5.19.0-43-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 091e339e50f6a4a6f408a8fbcf0623128c9332a5bd277a36e96531972f160d54 |
| MD5 | dbe60e3cb2bb9ff65e93282a858b524f |
| BLAKE2b-256 | 35168013c6fb2703c49780ad7d71b52bc6dcbcdd798749c3f57a2207fbbfa90c |
File details
Details for the file scrapy_logexport-0.2.0-py3-none-any.whl.
File metadata
- Download URL: scrapy_logexport-0.2.0-py3-none-any.whl
- Upload date:
- Size: 5.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.10.6 Linux/5.19.0-43-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7978c905b2b492ab92fea789d6fbc4fd895d79fe14a3f69235b2f1f5bd548aa1 |
| MD5 | b15e3afecfa18201a8c343ab68943983 |
| BLAKE2b-256 | e56edc92c385defd34ee98296b7ea1a9c517e1603e95ca8a0fca4950a1f562bf |