xsget

Console tools to download online novels and convert them to text files.

Installation

Stable version from PyPI:

python3 -m pip install xsget

Latest development version from GitHub:

python3 -m pip install -e "git+https://github.com/kianmeng/xsget.git#egg=xsget"

xsget

$ xsget -h
usage: xsget [-l CSS_PATH] [-p URL_PARAM] [-g [FILENAME]] [-c [FILENAME]] [-r]
             [-t] [-d] [-h] [-v]
             URL

crawl and download online novel as html files

positional arguments:
  URL            set url of the index page to crawl

optional arguments:
  -l CSS_PATH    set css path of the link to a chapter (default: a)
  -p URL_PARAM   use url param key as filename (default: )
  -g [FILENAME]  generate config file from options (default: False)
  -c [FILENAME]  load config from file (default: False)
  -r             refresh the index page (default: False)
  -t             show extracted urls without crawling (default: False)
  -d             show debugging log (default: False)
  -h             show this help message and exit
  -v             show program's version number and exit

examples:
  xsget http://localhost
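
The `-p URL_PARAM` option above is described as using a URL parameter key as the file name. A minimal sketch of that idea follows, assuming the value of the named query parameter becomes the base name of the saved HTML file. The helper `filename_from_url` and its fallback to the last path segment are hypothetical and are not taken from the xsget source.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlparse


def filename_from_url(url: str, url_param: str = "") -> str:
    """Hypothetical helper: choose an .html file name for a chapter URL."""
    parsed = urlparse(url)
    if url_param:
        # parse_qs maps each query key to a list of values.
        values = parse_qs(parsed.query).get(url_param)
        if values:
            return f"{values[0]}.html"
    # Assumed fallback: last path segment without its suffix, or "index".
    stem = PurePosixPath(parsed.path).stem or "index"
    return f"{stem}.html"


if __name__ == "__main__":
    print(filename_from_url("http://localhost/read?id=42", "id"))
    print(filename_from_url("http://localhost/book/7.html"))
```

With this sketch, a chapter link such as `http://localhost/read?id=42` combined with `-p id` would map to a file named after the `id` value; the real tool may name files differently.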

xstxt

$ xstxt -h
usage: xstxt [-pt CSS_PATH] [-pb CSS_PATH] [-rh REGEX REGEX] [-rt REGEX REGEX]
             [-bt TITLE] [-ba AUTHOR] [-i GLOB_PATTERN] [-e GLOB_PATTERN]
             [-l TOTAL_FILES] [-o FILENAME] [-g [FILENAME]] [-c [FILENAME]]
             [-d] [-h] [-v]

massage and convert online novel from HTML files to text file

optional arguments:
  -pt CSS_PATH     set css path of chapter title (default: title)
  -pb CSS_PATH     set css path of chapter body (default: body)
  -rh REGEX REGEX  set regex to replace word or phrase in html file (default: [])
  -rt REGEX REGEX  set regex to replace word or phrase in txt file (default: [])
  -bt TITLE        set title of the novel (default: 不详)
  -ba AUTHOR       set author of the novel (default: 不详)
  -i GLOB_PATTERN  set glob pattern of html files to process (default: ['./*.html'])
  -e GLOB_PATTERN  set glob pattern of html files to exclude (default: [])
  -l TOTAL_FILES   set number of html files to process (default: 3)
  -o FILENAME      set output txt file name (default: book.txt)
  -g [FILENAME]    generate config file from options (default: False)
  -c [FILENAME]    set config file to use if found (default: False)
  -d               show debugging log (default: False)
  -h               show this help message and exit
  -v               show program's version number and exit

examples:
  xstxt -i *.html
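
The options above combine file selection (`-i`, `-e`, `-l`) with search-and-replace pairs (`-rh`, `-rt`). The sketch below illustrates how such options could behave, assuming that `-l` caps the number of matched files and that each `REGEX REGEX` pair is a pattern followed by its replacement. The functions `select_files` and `apply_replacements` are hypothetical names for illustration and do not come from the xstxt source.

```python
import re
from fnmatch import fnmatchcase


def select_files(names, include=("*.html",), exclude=(), limit=3):
    """Hypothetical: keep names matching an include glob and no exclude glob."""
    kept = [
        name
        for name in sorted(names)
        if any(fnmatchcase(name, pattern) for pattern in include)
        and not any(fnmatchcase(name, pattern) for pattern in exclude)
    ]
    # Assumption: the limit truncates the sorted list of matches.
    return kept[:limit]


def apply_replacements(text, pairs):
    """Hypothetical: apply each (pattern, replacement) pair in order."""
    for pattern, replacement in pairs:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    names = ["3.html", "1.html", "index.html", "notes.txt", "2.html"]
    print(select_files(names, exclude=["index.html"], limit=2))
    print(apply_replacements("Chapter  1<br>text", [(r"<br>", "\n"), (r" {2,}", " ")]))
```

When calling the real command from a shell, quoting the glob (for example `xstxt -i '*.html'`) keeps the shell from expanding it before xstxt sees the pattern.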

Copyright and License

Copyright (C) 2021,2022 Kian-Meng Ang

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.
