A simple utility for crawling text from 2ch
Project description
much
Usage
The pull command requires two arguments: the URL of the web page to fetch and the path to the output file, which should have a json or txt extension depending on the required output format. For example:
python -m much pull https://2ch.hk/b/arch/2018-08-22/res/181770037.html assets/stories.txt
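The README only states that the output file's extension (json or txt) selects the format. The following is a minimal sketch of that kind of dispatch, not much's actual implementation: `save_messages` is a hypothetical helper, and the layouts chosen here (a JSON array of strings, or plain text with messages separated by blank lines) are illustrative assumptions.

```python
import json
from pathlib import Path


def save_messages(messages: list[str], output_path: str) -> None:
    """Hypothetical helper: write messages in the format implied by the extension.

    The concrete layouts below are assumptions; much's real output may differ.
    """
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create assets/ if missing
    suffix = path.suffix.lower()
    if suffix == ".json":
        # JSON: one array of message strings; ensure_ascii=False keeps Cyrillic readable
        path.write_text(json.dumps(messages, ensure_ascii=False, indent=2), encoding="utf-8")
    elif suffix == ".txt":
        # Plain text: messages separated by a blank line, trailing newline at the end
        path.write_text("\n\n".join(messages) + "\n", encoding="utf-8")
    else:
        raise ValueError(f"unsupported output extension: {suffix!r} (expected .json or .txt)")
```

For instance, `save_messages(["first post", "second post"], "assets/stories.txt")` would create `assets/` if needed and write both messages to a text file. Rejecting other extensions up front mirrors the README's statement that only json and txt outputs are supported.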
To fetch archived threads on the 17th page:
python -m much fetch 17
To list the top 10 fetched threads by size (the cumulative number of characters in messages longer than 100 symbols):
python -m much top 10
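The size metric described above can be written out directly. This sketch assumes, hypothetically, that fetched threads are available in memory as a dict mapping thread ids to lists of message strings (how much actually stores threads is not described here), and it reads "longer than 100 symbols" as strictly more than 100 characters. `thread_size` and `top_threads` are illustrative names.

```python
import heapq

MIN_LENGTH = 100  # only messages strictly longer than this many characters count


def thread_size(messages: list[str], min_length: int = MIN_LENGTH) -> int:
    """Cumulative number of characters in messages longer than min_length."""
    return sum(len(m) for m in messages if len(m) > min_length)


def top_threads(threads: dict[str, list[str]], n: int) -> list[tuple[str, int]]:
    """Return the n largest threads as (thread_id, size) pairs, largest first.

    `threads` is a hypothetical in-memory representation: id -> list of messages.
    """
    sizes = ((thread_id, thread_size(msgs)) for thread_id, msgs in threads.items())
    return heapq.nlargest(n, sizes, key=lambda pair: pair[1])
```

With `threads = {"1": ["x" * 150, "short"], "2": ["y" * 101, "z" * 101], "3": ["w" * 100]}`, `top_threads(threads, 2)` returns `[("2", 202), ("1", 150)]`; thread 3 scores 0 because its only message is exactly 100 characters long. `heapq.nlargest` avoids fully sorting every thread when only the top N are needed.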
To star a thread (copy it to the assets/starred folder under a given name):
python -m much star 263473351 discussion
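Starring is described only as copying a thread into assets/starred under a given name. A minimal sketch, assuming (hypothetically) that each fetched thread is stored as a `<thread id>.txt` file in a directory passed as `threads_dir`; the `star_thread` name and the `assets/threads` default are illustrative and not taken from much.

```python
import shutil
from pathlib import Path


def star_thread(thread_id: str, name: str,
                threads_dir: str = "assets/threads",  # hypothetical storage location
                starred_dir: str = "assets/starred") -> Path:
    """Hypothetical: copy a fetched thread file into starred_dir under a new name."""
    source = Path(threads_dir) / f"{thread_id}.txt"  # assumed file naming scheme
    if not source.is_file():
        raise FileNotFoundError(f"thread {thread_id} has not been fetched: {source}")
    target_dir = Path(starred_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{name}{source.suffix}"
    shutil.copy2(source, target)  # copy (not move), so the fetched thread stays in place
    return target
```

Under these assumptions, `star_thread("263473351", "discussion")` would produce `assets/starred/discussion.txt`. Using `shutil.copy2` rather than a move matches the "copy" wording above and also preserves the file's timestamps.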
Installation
To install through pip:
pip install much
To create a conda environment with the dependencies installed:
conda env create -f environment.yml
Project details
Download files
Source Distribution
much-0.0.7.tar.gz (10.3 kB)
File details
Details for the file much-0.0.7.tar.gz.
File metadata
- Download URL: much-0.0.7.tar.gz
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | b20fce3387addb839dfe205dc4ee0f4aa2a76329304715adb81a07bb5b0694d1
MD5 | 5eb1559b4ca84ee5d260f117e607e6ab
BLAKE2b-256 | ecbe4d365446d8d1f0564e141fa9aa4b5f595450a15b3a8860ffd6fa7bcf82e0
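A downloaded archive can be checked against the SHA256 digest listed in the table using only the standard library. This is a generic sketch; `sha256_of` and `verify` are illustrative names, not part of much.

```python
import hashlib

# SHA256 digest published for much-0.0.7.tar.gz
EXPECTED_SHA256 = "b20fce3387addb839dfe205dc4ee0f4aa2a76329304715adb81a07bb5b0694d1"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest matches the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Calling `verify("much-0.0.7.tar.gz")` from the download directory returns True only if the file matches the published digest. pip also offers a hash-checking mode (`--require-hashes`) that performs this kind of check at install time.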