A command-line tool to convert documents and web pages to text.

README for src2txt

Overview

src2txt is a command-line tool that converts documents and web pages into plain text. It accepts local files (including PDF and HTML files) and web URLs. It can process directories recursively, include hidden files, and respect .gitignore rules.
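To illustrate what converting HTML to plain text can involve, here is a minimal sketch using only the Python standard library. It is not taken from src2txt's source; the class `TextExtractor` and the function `html_to_text` are hypothetical names, and the cleaning rules (drop tags, skip script and style contents) are assumptions about what a reasonable cleaner might do.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents.

    Hypothetical helper for illustration; not part of src2txt.
    """

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `html_to_text("<h1>Title</h1><p>Hello</p>")` would yield the two text chunks on separate lines. A real converter would likely also handle entities, whitespace inside paragraphs, and character encodings with more care.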

Features

  • Convert files and URLs to text.
  • Support for multiple file types including text, HTML, and PDF.
  • Recursive processing of directories.
  • Options to include hidden files and to disregard .gitignore rules.
  • Verbose output, and a mode that lists files without processing them.
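The directory-handling features above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not src2txt's actual implementation: `collect_files` is a hypothetical helper, "hidden" is assumed to mean any path component starting with a dot, and .gitignore handling is left out entirely.

```python
from pathlib import Path


def collect_files(root, recursive=False, include_hidden=False):
    """Return a sorted list of files under root.

    Hypothetical helper for illustration; not src2txt's API.
    """
    root = Path(root)
    if root.is_file():
        return [root]
    # rglob walks the whole tree; glob("*") stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    files = []
    for path in candidates:
        if not path.is_file():
            continue
        # Assumption: any dot-prefixed component below root counts as hidden.
        relative_parts = path.relative_to(root).parts
        if not include_hidden and any(p.startswith(".") for p in relative_parts):
            continue
        files.append(path)
    return sorted(files)
```

Returning the list without reading any file corresponds roughly to a list-only mode; a converter would then open each returned path in turn.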

Installation

Using pipx

The recommended method to install src2txt is using pipx, which installs Python applications in isolated environments. To install src2txt using pipx, run:

pipx install src2txt

If pipx is not installed, you can install it via pip:

pip install pipx
pipx ensurepath

Usage

Invoke src2txt from the command line with the options you need. A basic example:

src2txt src_to_text --sources "path/to/file_or_directory" --recursive

Command Line Options

  • --sources: List of file paths or URLs.
  • --raw-html: Return raw HTML content without cleaning.
  • --recursive: Recursively process files in directories.
  • --include-hidden: Include hidden files in the processing.
  • --ignore-gitignore: Ignore .gitignore rules and include files normally ignored.
  • --verbose: Print verbose output.
  • --file-name: Print the file name or URL before the content.
  • --list: List files and URLs without processing content.
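When src2txt is called from a script, the options above can be assembled into an argument list. The sketch below is a hedged example: `build_command` and `run_src2txt` are hypothetical helpers, the `src_to_text` subcommand is copied from the basic example, and it is an assumption that `--recursive`, `--file-name`, and `--list` are plain switches and that the converted text is written to standard output. Only a single source is passed, since the exact syntax for several sources is not documented here.

```python
import subprocess


def build_command(source, recursive=False, file_name=False, list_only=False):
    """Assemble an argv list for src2txt from the documented flags.

    Hypothetical helper; assumes the boolean options are plain switches.
    """
    command = ["src2txt", "src_to_text", "--sources", source]
    if recursive:
        command.append("--recursive")
    if file_name:
        command.append("--file-name")
    if list_only:
        command.append("--list")
    return command


def run_src2txt(source, **options):
    """Run src2txt and return whatever it prints to standard output."""
    result = subprocess.run(
        build_command(source, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. Running `src2txt --help` remains the authoritative way to confirm how each option is parsed.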

For more detailed usage and options, refer to the help provided by the tool:

src2txt --help

Contributing

Contributions to src2txt are welcome.

License

This project is licensed under the MIT License - see the LICENSE file for details.
