A scrapy pipeline which stores files using folder trees.

Project description

scrapy-folder-tree

This is a Scrapy pipeline that stores downloaded files and images in one of several folder structures.

Supported folder structures:

Given this scraped file: 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg, you can choose the following folder structures:

Using file name

class: scrapy_folder_tree.ImagesHashTreePipeline

full
├── 0
.   ├── 5
.   .   ├── b
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
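
The layout above can be sketched in a few lines of Python. This is only an illustration of the resulting paths, not the package's own code: the helper name hash_tree_path and its parameters are hypothetical, and it assumes each folder level is one leading character of the file name.

```python
from pathlib import PurePosixPath


def hash_tree_path(filename: str, depth: int = 3, root: str = "full") -> str:
    """Nest `filename` under one folder per leading character, `depth` levels deep.

    Hypothetical helper: it mirrors the tree shown above, nothing more.
    """
    stem = PurePosixPath(filename).stem
    levels = list(stem[:depth])  # "05b" becomes ["0", "5", "b"]
    return str(PurePosixPath(root, *levels, filename))


print(hash_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"))
# full/0/5/b/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```

Because the file name is a hash, its leading characters are spread evenly, so files distribute roughly uniformly across the folders.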

Using crawling time

class: scrapy_folder_tree.ImagesTimeTreePipeline

full
├── 0
.   ├── 11
.   .   ├── 48
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg

Using crawling date

class: scrapy_folder_tree.ImagesDateTreePipeline

full
├── 2022
.   ├── 1
.   .   ├── 24
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
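
The time and date layouts above can be sketched the same way. The helper names below are hypothetical. The sketch assumes the time tree levels are hour, minute and second, and the date tree levels are year, month and day, without zero padding, which is what the example trees suggest but the description does not state.

```python
from datetime import datetime
from pathlib import PurePosixPath


def time_tree_path(filename: str, when: datetime, root: str = "full") -> str:
    """Assumed layout: root/hour/minute/second/filename."""
    parts = (str(when.hour), str(when.minute), str(when.second))
    return str(PurePosixPath(root, *parts, filename))


def date_tree_path(filename: str, when: datetime, root: str = "full") -> str:
    """Assumed layout: root/year/month/day/filename."""
    parts = (str(when.year), str(when.month), str(when.day))
    return str(PurePosixPath(root, *parts, filename))


crawled_at = datetime(2022, 1, 24, 0, 11, 48)
name = "05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"
print(time_tree_path(name, crawled_at))
# full/0/11/48/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
print(date_tree_path(name, crawled_at))
# full/2022/1/24/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```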

Installation

pip install scrapy_folder_tree

Usage

Use the following settings in your project:

ITEM_PIPELINES = {
    'scrapy_folder_tree.FilesHashTreePipeline': 300
}

FOLDER_TREE_DEPTH = 3
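
A slightly fuller settings sketch follows. It assumes the pipeline builds on Scrapy's stock FilesPipeline and therefore honors the standard FILES_STORE setting and the file_urls item field; that is an assumption, not something stated above. The output directory and the URL are placeholders.

```python
# settings.py (sketch)
ITEM_PIPELINES = {
    "scrapy_folder_tree.FilesHashTreePipeline": 300,
}

# Standard Scrapy setting; assumed to be honored by this pipeline.
FILES_STORE = "downloads"  # placeholder output directory

# Setting taken from the usage section above.
FOLDER_TREE_DEPTH = 3

# Shape of an item that Scrapy's stock FilesPipeline acts on.
example_item = {"file_urls": ["https://example.com/report.pdf"]}
```

The image classes listed earlier would presumably be registered the same way, with Scrapy's IMAGES_STORE in place of FILES_STORE.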

Download files

Source Distribution

scrapy-folder-tree-0.1.3.tar.gz (4.1 kB view details)

Uploaded Source

Built Distribution

scrapy_folder_tree-0.1.3-py3-none-any.whl (5.0 kB view details)

Uploaded Python 3

File details

Details for the file scrapy-folder-tree-0.1.3.tar.gz.

File metadata

  • Download URL: scrapy-folder-tree-0.1.3.tar.gz
  • Upload date:
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.10.1 Linux/5.13.19-2-MANJARO

File hashes

Hashes for scrapy-folder-tree-0.1.3.tar.gz
Algorithm Hash digest
SHA256 5568ea3ec11aae42155e08cdb585516292779139216640cadfcb59759448252d
MD5 ac2df3c865c5477548b26092dda441cc
BLAKE2b-256 06790ab65d7b75e6848b80ebd9b2ca4b09a6a439d1a22458d6ff9bbb6f4c401a

File details

Details for the file scrapy_folder_tree-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: scrapy_folder_tree-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 5.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.10.1 Linux/5.13.19-2-MANJARO

File hashes

Hashes for scrapy_folder_tree-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a407ffc0ba26d66cbb870880a1fe90a59e8e8a2aec83e55f8a7cd24b40d83f63
MD5 4de8133bf21ff78c720243ac088ba76c
BLAKE2b-256 7263ee311de80558f06c38ab4132bf3ffdd26b781a5bbdd049955fe59f4a6aab
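
The SHA256 digests listed above can be checked against a downloaded file with Python's standard hashlib module. This is a minimal sketch: the helper names are hypothetical, and the file is assumed to sit in the current directory under the name shown in the file details.

```python
import hashlib

# SHA256 digests copied from the file details above.
PUBLISHED_SHA256 = {
    "scrapy-folder-tree-0.1.3.tar.gz":
        "5568ea3ec11aae42155e08cdb585516292779139216640cadfcb59759448252d",
    "scrapy_folder_tree-0.1.3-py3-none-any.whl":
        "a407ffc0ba26d66cbb870880a1fe90a59e8e8a2aec83e55f8a7cd24b40d83f63",
}


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, name: str) -> bool:
    """Compare a local file against the published digest for `name`."""
    return sha256_of(path) == PUBLISHED_SHA256[name]
```

For example, matches_published("scrapy-folder-tree-0.1.3.tar.gz", "scrapy-folder-tree-0.1.3.tar.gz") should return True for an intact download of the source distribution.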
