A simple pipeline for processing documents
docpipe
A simple document pipeline mechanism that makes it easier to process and clean up Word and other document types.
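To illustrate the general idea of a document pipeline, here is a minimal sketch of text passing through an ordered list of cleanup stages. All names in it (`Pipeline`, `Stage`, `strip_trailing_whitespace`, `collapse_blank_lines`) are hypothetical and chosen for illustration; this is not docpipe's actual API, which this README does not describe.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type for illustration: a stage takes text and returns cleaned text.
Stage = Callable[[str], str]


@dataclass
class Pipeline:
    """Runs a list of stages in order, feeding each stage's output to the next."""
    stages: List[Stage] = field(default_factory=list)

    def run(self, text: str) -> str:
        for stage in self.stages:
            text = stage(text)
        return text


def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


def collapse_blank_lines(text: str) -> str:
    """Replace runs of consecutive blank lines with a single blank line."""
    out: List[str] = []
    for line in text.split("\n"):
        if line == "" and out and out[-1] == "":
            continue
        out.append(line)
    return "\n".join(out)


cleanup = Pipeline([strip_trailing_whitespace, collapse_blank_lines])
```

Calling `cleanup.run(text)` applies the stages in list order, so whitespace is stripped before blank lines are collapsed. Keeping each stage a plain function makes it easy to test stages one at a time and to reorder them.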
Other dependencies
Some functionality depends on non-Python tools that must be installed separately:
- soffice (LibreOffice or OpenOffice) for handling DOC and DOCX files
- pdftotext (poppler-utils) for extracting text from PDFs
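Before running code that relies on these tools, it can help to check that they are on the `PATH`. The following is a minimal sketch using the standard library's `shutil.which`; the helper name `missing_tools` and the `REQUIRED_TOOLS` mapping are assumptions made for this example and are not part of docpipe.

```python
import shutil
from typing import Dict, List

# Executable names taken from the dependency list above; the descriptions
# are only used for the printed message.
REQUIRED_TOOLS: Dict[str, str] = {
    "soffice": "DOC and DOCX handling",
    "pdftotext": "PDF text extraction (poppler-utils)",
}


def missing_tools(tools: Dict[str, str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names of tools that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    for name in missing_tools():
        print(f"missing: {name} ({REQUIRED_TOOLS[name]})")
```

An empty list from `missing_tools()` means both executables were found. On some systems the office binary is installed under a different name or outside the `PATH`, so a "missing" result for `soffice` may only mean the `PATH` needs adjusting.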
Local development
- Clone this repo
- Set up a virtual environment
- Install dependencies:
pip install -e '.[test]'
- Run tests:
nosetests
License
Copyright 2022 Laws.Africa.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see https://www.gnu.org/licenses/.
Hashes for docpipe-0.0.2.post1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a82d43df8b9bf8484fec836ffb3b86ebcf0e33d32febf8c2ecda472139a6b3d1
MD5 | 988162c71e469b5fdd0bc0661fdcf65c
BLAKE2b-256 | d51774385b88257252c9cc66178d9fb7fea86e2648ef20e23ca4afa1ac30d214