Skip to main content

Console tools to download online novel and convert to text file

Project description

xsget

Console tools to download online novel and convert to text file.

Installation

Stable version From PyPI using uv:

uv tool install xsget playwright
playwright install

Upgrade to latest stable version:

uv tool upgrade xsget

Latest development version from GitHub:

python3 -m pip install -e git+https://github.com/kianmeng/xsget.git
playwright install

xsget

xsget -h
usage: xsget [-q] [-e] [-d] [-h] [-V] [-l CSS_PATH] [-P URL_PARAM]
             [-g [FILENAME] | -c [FILENAME]] [-r] [-t] [-b] [-bs SESSION]
             [-bd DELAY] [-od OUTPUT_DIR]
             URL

xsget is a console app that crawl and download online novel.

website: https://github.com/kianmeng/xsget
changelog: https://github.com/kianmeng/xsget/blob/master/CHANGELOG.md
issues: https://github.com/kianmeng/xsget/issues

positional arguments:
  URL   set url of the index page to crawl

options:
  -q, --quiet
        suppress all logging
  -e, --env
        print environment information for bug reporting
  -d, --debug
        show debugging log and stacktrace
  -h, --help
        show this help message and exit
  -V, --version
        show program's version number and exit
  -l, --link-css-path CSS_PATH
        set css path of the link to a chapter (default: 'a')
  -P, --url-param-as-filename URL_PARAM
        use url param key as filename (default: '')
  -g, --generate-config-file [FILENAME]
        generate config file from options (default: 'xsget.toml')
  -c, --config-file [FILENAME]
        load config from file (default: 'xsget.toml')
  -r, --refresh
        refresh the index page
  -t, --test
        show extracted urls without crawling
  -b, --browser
        crawl by actual browser (default: 'False')
  -bs, --browser-session SESSION
        set the number of browser session (default: 2)
  -bd, --browser-delay DELAY
        set the second to wait for page to load in browser (default: 0)
  -od, --output-dir OUTPUT_DIR
        set default output folder (default: 'output')

examples:
  xsget http://localhost
  xsget http://localhost/page[1-100].html
  xsget -g -l "a" -p "id" http://localhost

xstxt

xstxt -h
usage: xstxt [-q] [-e] [-d] [-h] [-V] [-pt CSS_PATH] [-pb CSS_PATH]
             [-la LANGUAGE] [-ps SEPARATOR] [-rh REGEX REGEX]
             [-rt REGEX REGEX] [-bt TITLE] [-ba AUTHOR] [-ic INDENT_CHARS]
             [-fw] [-oi] [-ow] [-i GLOB_PATTERN] [-x GLOB_PATTERN]
             [-l TOTAL_FILES] [-w WIDTH] [-o FILENAME] [-od OUTPUT_DIR] [-y]
             [-p] [-g [FILENAME] | -c [FILENAME]] [-m]

xstxt is a console app that extract content from HTML to text file.

website: https://github.com/kianmeng/xsget
changelog: https://github.com/kianmeng/xsget/blob/master/CHANGELOG.md
issues: https://github.com/kianmeng/xsget/issues

options:
  -q, --quiet
        suppress all logging
  -e, --env
        print environment information for bug reporting
  -d, --debug
        show debugging log and stacktrace
  -h, --help
        show this help message and exit
  -V, --version
        show program's version number and exit
  -pt, --title-css-path CSS_PATH
        set css path of chapter title (default: 'title')
  -pb, --body-css-path CSS_PATH
        set css path of chapter body (default: 'body')
  -la, --language LANGUAGE
        language of the ebook (default: 'zh')
  -ps, --paragraph-separator SEPARATOR
        set paragraph separator (default: '\n\n')
  -rh, --html-replace REGEX REGEX
        set regex to replace word or pharase in html file
  -rt, --txt-replace REGEX REGEX
        set regex to replace word or pharase in txt file
  -bt, --book-title TITLE
        set title of the novel (default: '不详')
  -ba, --book-author AUTHOR
        set author of the novel (default: '不详')
  -ic, --indent-chars INDENT_CHARS
        set indent characters for a paragraph (default: '')
  -fw, --fullwidth
        convert ASCII character to from halfwidth to fullwidth (default: 'False')
  -oi, --output-individual-file
        convert each html file into own txt file
  -ow, --overwrite
        overwrite output file
  -i, --input GLOB_PATTERN
        set glob pattern of html files to process (default: '['./*.html']')
  -x, --exclude GLOB_PATTERN
        set glob pattern of html files to exclude (default: '[]')
  -l, --limit TOTAL_FILES
        set number of html files to process (default: '3')
  -w, --width WIDTH
        set the line width for wrapping (default: 0, 0 to disable)
  -o, --output FILENAME
        set output txt file name (default: 'book.txt')
  -od, --output-dir OUTPUT_DIR
        set output directory (default: 'output')
  -y, --yes
        yes to prompt
  -p, --purge
        remove extracted files specified by --output-folder option (default: 'False')
  -g, --generate-config-file [FILENAME]
        generate config file from options (default: 'xstxt.toml')
  -c, --config-file [FILENAME]
        load config from file (default: 'xstxt.toml')
  -m, --monitor
        monitor config file changes and re-run when needed

examples:
  xsget -g
  xstxt --input *.html
  xstxt --output-individual-file --input *.html
  xstxt --config --monitor

Copyright and License

Copyright (C) 2021,2022,2023,2024,2025,2026 Kian-Meng Ang

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xsget-0.1.83.tar.gz (41.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

xsget-0.1.83-py3-none-any.whl (33.0 kB view details)

Uploaded Python 3

File details

Details for the file xsget-0.1.83.tar.gz.

File metadata

  • Download URL: xsget-0.1.83.tar.gz
  • Upload date:
  • Size: 41.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.10 {"installer":{"name":"uv","version":"0.10.10","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"25.10","id":"questing","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for xsget-0.1.83.tar.gz
Algorithm Hash digest
SHA256 87935461f0590b9c39bb73153a9ec9c882f5331a4d3c5589394c0512c3427b77
MD5 d788f87722cf8d739788185cc9968150
BLAKE2b-256 4d53cf90ab8529748543e42e226a0f91b5db54595c9bb46a41f7612b5456a2e6

See more details on using hashes here.

File details

Details for the file xsget-0.1.83-py3-none-any.whl.

File metadata

  • Download URL: xsget-0.1.83-py3-none-any.whl
  • Upload date:
  • Size: 33.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.10 {"installer":{"name":"uv","version":"0.10.10","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"25.10","id":"questing","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for xsget-0.1.83-py3-none-any.whl
Algorithm Hash digest
SHA256 b69f8ddcf9d881e8cd544f388f369166ffb9394888f9bfcc787d5d33d26b4db6
MD5 80c67a25d2f70c3fa66e4e8bf797d768
BLAKE2b-256 3545a974bb2d56f6ede4255896d07bfd7e93981e39848ce61809a32a378de5fc

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page