much

A simple utility for crawling text from 2ch

Usage

The pull command requires two arguments: the URL of the web page to fetch and the path to the output file, whose extension (json or txt) determines the output format. For example:

python -m much pull https://2ch.hk/b/arch/2018-08-22/res/181770037.html assets/stories.txt
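The extension-based choice of output format described above can be illustrated with a short sketch. This is not the package's actual code: the function name output_format and the SUPPORTED_FORMATS mapping are hypothetical, and the only thing taken from the text is that the extension is either json or txt.

```python
from pathlib import Path

# Hypothetical mapping from file extension to output format name.
# Only the two extensions mentioned in the usage text are included.
SUPPORTED_FORMATS = {".json": "json", ".txt": "txt"}


def output_format(path: str) -> str:
    """Return the output format implied by the extension of path.

    Illustrative helper, not part of the much API.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported output extension {suffix!r}; expected .json or .txt"
        )
    return SUPPORTED_FORMATS[suffix]


if __name__ == "__main__":
    print(output_format("assets/stories.txt"))
    print(output_format("assets/stories.json"))
```

Rejecting unknown extensions early, before any page is fetched, is one reasonable design for a tool like this; whether much itself behaves that way is not stated here.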

To fetch the archived threads listed on the 17th page of the archive:

python -m much fetch 17

To list the top 10 fetched threads by size (the cumulative number of characters in messages longer than 100 characters):

python -m much top 10
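The size metric used for ranking can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: threads are assumed to be available as a dict mapping a thread id to a list of message strings, and the names thread_size and top_threads are hypothetical. "Longer than 100" is read here as strictly more than 100 characters.

```python
from typing import Dict, List, Tuple


def thread_size(messages: List[str], min_length: int = 100) -> int:
    """Sum the lengths of messages strictly longer than min_length characters."""
    return sum(len(message) for message in messages if len(message) > min_length)


def top_threads(
    threads: Dict[str, List[str]], n: int = 10, min_length: int = 100
) -> List[Tuple[str, int]]:
    """Return the n largest threads as (thread_id, size) pairs, largest first."""
    sizes = [
        (thread_id, thread_size(messages, min_length))
        for thread_id, messages in threads.items()
    ]
    # Sort by size in descending order and keep the first n entries.
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:n]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = {
        "a": ["x" * 150, "y" * 50],
        "b": ["z" * 101, "w" * 200],
    }
    for thread_id, size in top_threads(sample, n=2):
        print(thread_id, size)
```

With this metric, short replies do not contribute to a thread's size, so threads made up of long messages rank above threads with many one-line posts.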

To star a thread (copy it to the folder assets/starred under a given name):

python -m much star 263473351 discussion
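Starring amounts to copying a fetched thread file into assets/starred under a new name. The sketch below shows one way to do that with the standard library. Where much stores fetched threads and how it names the starred copy are not described here, so the source path is passed in by the caller, the file extension is assumed to be preserved, and the function name star_thread is hypothetical.

```python
import shutil
from pathlib import Path


def star_thread(source: str, name: str, starred_dir: str = "assets/starred") -> Path:
    """Copy the file at source into starred_dir under name, keeping its extension.

    Illustrative helper, not part of the much API.
    """
    src = Path(source)
    target_dir = Path(starred_dir)
    # Create assets/starred (and any missing parents) on first use.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{name}{src.suffix}"
    # copy2 copies the file contents and also tries to preserve metadata.
    shutil.copy2(src, target)
    return target
```

For example, star_thread("threads/263473351.txt", "discussion") would create assets/starred/discussion.txt, where threads/263473351.txt is an assumed location for the fetched thread.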

Installation

To install through pip:

pip install much

To create a conda environment with the dependencies installed:

conda env create -f environment.yml
