A command-line tool to convert documents and web pages to text.
README for src2txt
Overview
src2txt is a command-line tool designed to convert various types of documents and web pages into plain text. It supports processing local files (including PDFs and HTML files) and web URLs. The tool provides options for recursive directory processing, inclusion of hidden files, and can respect .gitignore rules.
Features
- Convert files and URLs to text.
- Support for multiple file types including text, HTML, and PDF.
- Recursive processing of directories.
- Options to include hidden files and ignore .gitignore rules.
- Verbose output and listing files without processing.
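To make the recursive and hidden-file options concrete, here is a minimal Python sketch of how such file discovery could work. It is an illustration only, not src2txt's actual implementation: the function name iter_files and its parameters are hypothetical, names starting with "." are assumed to be "hidden", and .gitignore handling is left out entirely.

```python
from pathlib import Path
from typing import Iterator


def iter_files(root: Path, recursive: bool = False,
               include_hidden: bool = False) -> Iterator[Path]:
    """Yield files under root (hypothetical helper, not part of src2txt)."""
    if root.is_file():
        # A single file was given directly: nothing to walk.
        yield root
        return
    for entry in sorted(root.iterdir()):
        # Assumption: a leading dot marks a hidden file or directory.
        if entry.name.startswith(".") and not include_hidden:
            continue
        if entry.is_dir():
            # Subdirectories are only entered when recursion is requested.
            if recursive:
                yield from iter_files(entry, recursive, include_hidden)
        elif entry.is_file():
            yield entry
```

Calling iter_files(Path("docs"), recursive=True) would walk every non-hidden file below docs, which roughly mirrors what the recursive and hidden-file options describe.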
Installation
Using pipx
The recommended method to install src2txt is using pipx, which installs Python applications in isolated environments. To install src2txt using pipx, run:
pipx install src2txt
If pipx is not installed, you can install it via pip:
pip install pipx
pipx ensurepath
Usage
To use src2txt, you can invoke it from the command line with various options. Here’s a basic example:
src2txt src_to_text --sources "path/to/file_or_directory" --recursive
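Because a source can be either a local path or a web URL, the tool has to tell the two apart somehow. The Python sketch below shows one plausible way to do that with the standard library. It is an assumption for illustration, not taken from src2txt's code, and the helper name is_url is hypothetical.

```python
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """Return True for http(s) URLs, False for anything else (hypothetical helper)."""
    parsed = urlparse(source)
    # Require both a web scheme and a host, so that plain paths
    # (including Windows paths like C:\docs) are treated as files.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Under this sketch, "https://example.com/page" counts as a URL, while "path/to/file_or_directory" is treated as a local path.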
Command Line Options
- --sources: List of file paths or URLs.
- --raw-html: Return raw HTML content without cleaning.
- --recursive: Recursively process files in directories.
- --include-hidden: Include hidden files in the processing.
- --ignore-gitignore: Ignore .gitignore rules and include files normally ignored.
- --verbose: Print verbose output.
- --file-name: Print the file name or URL before the content.
- --list: List files and URLs without processing content.
For more detailed usage and options, refer to the help provided by the tool:
src2txt --help
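The --raw-html option implies that HTML is normally "cleaned" before it is returned as text. The README does not say how, so the following Python sketch only illustrates the general idea using the standard library's html.parser. The class TextExtractor, the function html_to_text, and its raw_html parameter are hypothetical names, and src2txt may well clean HTML differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks = []      # text fragments in document order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def html_to_text(html: str, raw_html: bool = False) -> str:
    """Return html unchanged if raw_html is set, else its visible text."""
    if raw_html:
        return html
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

With raw_html=True the input passes through untouched, which is the behavior the --raw-html description suggests. Otherwise tags, scripts, and styles are dropped and each text fragment ends up on its own line.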
Contributing
Contributions to src2txt are welcome.
License
This project is licensed under the MIT License - see the LICENSE file for details.