A lightweight tool for parsing and downloading 4chan threads.
Features
A comprehensive API for programmatically analysing 4chan content.
Concurrent downloading, with parallelism linked to the number of available cores.
Override the file naming scheme and specify exclusions for thread downloads.
Filter files by extension or category (e.g. images, videos).
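As an illustration of filtering by extension or category, here is a minimal sketch. It is not chandl's actual implementation: the `CATEGORIES` table and the function names `expand_filters` and `select_files` are hypothetical, and the real category contents may differ.

```python
from pathlib import PurePosixPath

# Hypothetical category table; chandl's real categories may differ.
CATEGORIES = {
    "images": {"jpg", "jpeg", "png", "gif"},
    "videos": {"webm", "mp4"},
}


def expand_filters(filters):
    """Turn a mix of category names and bare extensions into a set of extensions."""
    allowed = set()
    for item in filters:
        key = item.lower().lstrip(".")
        # A known category expands to its extensions; anything else is
        # treated as a literal extension.
        allowed |= CATEGORIES.get(key, {key})
    return allowed


def select_files(filenames, filters=(), exclude=()):
    """Keep names that match the filters (if any) and are not excluded."""
    allowed = expand_filters(filters) if filters else None
    excluded = set(exclude)
    selected = []
    for name in filenames:
        if name in excluded:
            continue
        extension = PurePosixPath(name).suffix.lstrip(".").lower()
        if allowed is None or extension in allowed:
            selected.append(name)
    return selected
```

In this sketch an empty filter list means "keep everything", which is an assumption about the default behaviour rather than something the description above states.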
Installation
To install chandl, simply run:
$ pip install chandl
Examples
Download all files in <thread_url> to a new directory named after the thread if possible, or after its raw id otherwise:
$ chandl <thread_url>
Download all images and .webm files in <thread_url> to /dev/shm, using 3 download threads per core:
$ chandl -f images,webm -o /dev/shm -p 3 <thread_url>
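To make "download threads per core" concrete, the following sketch shows one way a per-core setting could be turned into a thread pool size. It is an assumption about how such a setting might work, not chandl's own code; `worker_count` and `run_all` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(per_core, cores=None):
    """Scale the per-core setting by the number of available cores."""
    # os.cpu_count() can return None, so fall back to a single core.
    cores = cores or os.cpu_count() or 1
    return max(1, per_core * cores)


def run_all(tasks, per_core=3):
    """Run zero-argument callables concurrently; results keep task order."""
    with ThreadPoolExecutor(max_workers=worker_count(per_core)) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]
```

Under this reading, `-p 3` on a four-core machine would allow up to twelve concurrent downloads.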
Download all files in <thread_url> except abc.jpg and def.jpg to the present working directory, using a custom name format:
$ chandl -e abc.jpg,def.jpg -t . -n "{board} - {file.name}.{file.extension}" <thread_url>
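The name format above looks like a Python format string with attribute lookups. Assuming that is how it is evaluated (the source does not confirm this), a template of that shape can be rendered as sketched below; `render_name` and the stand-in file object are hypothetical.

```python
from types import SimpleNamespace


def render_name(template, board, file):
    """Fill a name template using str.format with attribute access."""
    return template.format(board=board, file=file)


# Stand-in for a parsed file; only the two attributes used by the template.
sample_file = SimpleNamespace(name="abc", extension="jpg")
rendered = render_name("{board} - {file.name}.{file.extension}", "g", sample_file)
print(rendered)  # g - abc.jpg
```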
Usage
$ chandl -h
usage: chandl [-h] [-V] [-v] [-f [FILTER]] [-e [EXCLUDE]] [-o [OUTPUT_DIR]]
              [-t [THREAD_DIR]] [-n [NAME]] [-p PARALLELISM]
              url

A lightweight tool for parsing and downloading 4chan threads.

positional arguments:
  url                   the URL of the thread to download

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -v, --verbosity       increase output verbosity
  -f [FILTER], --filter [FILTER]
                        file types or extensions to download, value either
                        comma-separated or option passed multiple times
  -e [EXCLUDE], --exclude [EXCLUDE]
                        file names to exclude, value either comma-separated
                        or option passed multiple times
  -o [OUTPUT_DIR], --output-dir [OUTPUT_DIR]
                        the directory to create the `thread-dir` within
  -t [THREAD_DIR], --thread-dir [THREAD_DIR]
                        relative to the `output-dir`, this will contain
                        downloaded files
  -n [NAME], --name [NAME]
                        the format to use for downloaded file names
  -p PARALLELISM, --parallelism PARALLELISM
                        the maximum number of download threads to use per
                        core
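The help text says that -f and -e accept a value that is "either comma-separated or option passed multiple times". One common way to support both with argparse is sketched below. This is not taken from chandl's source: the parser here is a cut-down stand-in named `chandl-sketch`, and `split_values` and `parse` are hypothetical helpers.

```python
import argparse


def split_values(values):
    """Flatten repeated options and comma-separated values into one list."""
    flattened = []
    for value in values or []:
        flattened.extend(part.strip() for part in value.split(",") if part.strip())
    return flattened


def build_parser():
    parser = argparse.ArgumentParser(prog="chandl-sketch")
    # action="append" collects every occurrence of the option into a list.
    parser.add_argument("-f", "--filter", action="append", default=[])
    parser.add_argument("-e", "--exclude", action="append", default=[])
    parser.add_argument("url")
    return parser


def parse(argv):
    args = build_parser().parse_args(argv)
    args.filter = split_values(args.filter)
    args.exclude = split_values(args.exclude)
    return args
```

With this approach, `-f images,webm` and `-f images -f webm` produce the same list.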
Roadmap
Implement tracking of threads until they are deleted
Improve test coverage
Pylint or flake8 integration
Download files
Source Distribution
File details
Details for the file chandl-0.3.0.tar.gz.
File metadata
- Download URL: chandl-0.3.0.tar.gz
- Upload date:
- Size: 16.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7fcfb2edb525e8e456bce9b5f86432e692ce9ec277e7d8d505f7653a27d257bd
MD5 | 5fa5974b7fff3575d748e10a41cc3644
BLAKE2b-256 | 6b902e4f095de1b90d448a1dd89dd4ef4b9abd766e460adbcd80e0344b671ba0
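A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. The sketch below uses only the standard library; the function names are hypothetical, and the local path to the archive is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest listed for chandl-0.3.0.tar.gz in the table above.
EXPECTED_SHA256 = "7fcfb2edb525e8e456bce9b5f86432e692ce9ec277e7d8d505f7653a27d257bd"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("chandl-0.3.0.tar.gz")` would return True only if the local file is byte-for-byte identical to the published archive.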