Extractify
Extractify is a command-line tool for converting documents in several formats (.pdf, .doc, .docx, .xlsx, .txt) to plain text. It works with both local directories and S3 buckets. For a local directory, it creates a 'txt' subdirectory inside the input directory and saves each converted file there under the same name with a .txt extension (for example, report.pdf becomes txt/report.txt). For an S3 bucket, it saves the plain text files in a 'txt' folder under the specified prefix.
Installation
Install Extractify using pip:
pip install extractify
Usage
Locally
To use Extractify with a local directory, run the following command:
extractify <input_dir>
Replace <input_dir> with the path to the directory containing the documents you want to convert.
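To make the local output layout concrete, here is a minimal sketch of how input files could be mapped to their output paths. It is not Extractify's actual code: the function name plan_local_outputs is hypothetical, and it assumes that only the top level of the input directory is scanned (so the 'txt' subdirectory is never treated as input) and that extensions are matched case-insensitively.

```python
from pathlib import Path

# Hypothetical sketch of the local layout described above; not Extractify's code.
SUPPORTED = {".pdf", ".doc", ".docx", ".xlsx", ".txt"}

def plan_local_outputs(input_dir):
    """Map each supported file in input_dir to input_dir/txt/<stem>.txt.

    Assumption: only the top level of input_dir is scanned, so the
    'txt' subdirectory itself is never picked up as input.
    """
    root = Path(input_dir)
    out_dir = root / "txt"
    plan = []
    for path in sorted(root.iterdir()):
        # Skip subdirectories (including txt/) and unsupported extensions.
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            plan.append((path, out_dir / (path.stem + ".txt")))
    return plan
```

Because only the extension is replaced, two inputs that differ only in extension (say report.pdf and report.docx) would map to the same txt/report.txt under this naming scheme.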
In S3
To use Extractify with an S3 bucket, run the following command:
extractify s3://bucket-name/prefix
Replace bucket-name and prefix with the appropriate values for your S3 bucket.
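The s3://bucket-name/prefix address splits into a bucket and a prefix. The sketch below shows one way to do that with the standard library, plus the output key that the 'txt' folder convention implies. Both function names are hypothetical, and output_key assumes the object's directory part is dropped so that only its file name determines the output name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket-name/prefix' into ('bucket-name', 'prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Strip leading/trailing slashes so 'reports/2023/' becomes 'reports/2023'.
    return parsed.netloc, parsed.path.strip("/")

def output_key(prefix, key):
    """Map an object key to '<prefix>/txt/<stem>.txt' (assumed layout)."""
    stem = PurePosixPath(key).stem
    return f"{prefix}/txt/{stem}.txt" if prefix else f"txt/{stem}.txt"
```

For example, split_s3_uri("s3://my-bucket/reports/2023/") gives ("my-bucket", "reports/2023"), and output_key("reports/2023", "reports/2023/q1.docx") gives "reports/2023/txt/q1.txt".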
Skip PDF files
To skip PDF files during conversion, add the --omit-pdf flag:
extractify <input_dir or s3_bucket_address> --omit-pdf
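Since argparse is one of the listed dependencies, the command line above can be modeled with it. This is a hypothetical reconstruction for illustration: only the single positional input and the --omit-pdf flag come from this README, while build_parser and the help texts are assumptions.

```python
import argparse

def build_parser():
    # Hypothetical reconstruction; only the positional input and
    # --omit-pdf are taken from the README.
    parser = argparse.ArgumentParser(
        prog="extractify", description="Convert documents to plain text."
    )
    parser.add_argument("input", help="local directory or s3://bucket-name/prefix")
    # store_true makes args.omit_pdf False unless the flag is given.
    parser.add_argument("--omit-pdf", action="store_true", help="skip .pdf files")
    return parser
```

Calling build_parser().parse_args(["s3://bucket-name/prefix", "--omit-pdf"]) yields args.input == "s3://bucket-name/prefix" and args.omit_pdf == True; argparse converts the dash in --omit-pdf to an underscore in the attribute name.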
Output
For a local directory, Extractify creates a 'txt' subdirectory within the input directory and saves the plain text files there. For an S3 bucket, it saves them in a 'txt' folder under the specified prefix.
Supported Formats
Extractify currently supports the following document formats:
- .pdf
- .doc
- .docx
- .xlsx
- .txt
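A minimal sketch of how the supported formats and the --omit-pdf option could combine into a single per-file check. The function name should_convert is hypothetical, and matching extensions case-insensitively (so Report.PDF counts as a PDF) is an assumption, not documented behavior.

```python
from pathlib import PurePath

SUPPORTED_EXTENSIONS = (".pdf", ".doc", ".docx", ".xlsx", ".txt")

def should_convert(filename, omit_pdf=False):
    """Return True if filename has a supported extension.

    Assumes case-insensitive matching; when omit_pdf is set,
    .pdf files are skipped as --omit-pdf describes.
    """
    ext = PurePath(filename).suffix.lower()
    if omit_pdf and ext == ".pdf":
        return False
    return ext in SUPPORTED_EXTENSIONS
```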
Dependencies
Extractify requires the following Python libraries:
- tika
- openpyxl
- argparse
- tqdm
- boto3
Download files
File details
Details for the file extractify-0.0.4.tar.gz.
File metadata
- Download URL: extractify-0.0.4.tar.gz
- Upload date:
- Size: 3.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 48b00169fdd5b3283c922fbd10ca159607e53bf781c3c8c81c24aa588abb958f |
| MD5 | 017ae2a8da2adf5e3d10ef7ee9efd0c5 |
| BLAKE2b-256 | c0b01ff6c2bbd210c4695b7e9e7b70969ec755cd0d71abc9bb603c461d59e714 |
File details
Details for the file extractify-0.0.4-py3-none-any.whl.
File metadata
- Download URL: extractify-0.0.4-py3-none-any.whl
- Upload date:
- Size: 3.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf098df29e7934f40b040d3e3c3e46f69de0754c7b87a8e2a382f2f3bad7d0b3 |
| MD5 | 906dc8a4bebc8ee996909f73b4776966 |
| BLAKE2b-256 | 6f9a838639b3f285b2d3d90ab7924f1bf16faebecf1aded17ed6de9376e651f0 |