A data mining tool for extracting text from files
Project description
A data mining tool for extracting text from files.
Meta
Author: Jonathan S. Prieto C.
Email: prieto.jona@gmail.com
Notes: Have feedback? Please send me an email.
Free software: BSD license
Requirements
This software is available thanks to other open source projects. The following list names some of them:
PySide (GUI lib)
Tesseract OCR
Xpdf
lxml (doc files)
scandir (fast folder traversal)
docx
Installation
pip install atxt
Check the dependencies to avoid surprises:
aTXT --check
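What `aTXT --check` verifies internally is not described here. As a rough, independent sketch of the same idea, the snippet below looks for the external programs and Python modules named in the requirements list. The function name `check_dependencies` is hypothetical, and the executable names `tesseract` and `pdftotext` (the text extractor shipped with Xpdf) are assumptions about what needs to be on the PATH.

```python
import importlib.util
import shutil


def check_dependencies(executables, modules):
    """Return a dict mapping each name to True if it was found.

    Hypothetical helper, not part of aTXT: executables are looked up
    on the PATH, modules are looked up without being imported.
    """
    report = {}
    for exe in executables:
        report[exe] = shutil.which(exe) is not None
    for mod in modules:
        # Top-level module names only; find_spec returns None when absent.
        report[mod] = importlib.util.find_spec(mod) is not None
    return report


if __name__ == "__main__":
    report = check_dependencies(
        executables=["tesseract", "pdftotext"],  # assumed program names
        modules=["PySide", "lxml", "scandir", "docx"],
    )
    for name, found in sorted(report.items()):
        print("%-10s %s" % (name, "ok" if found else "MISSING"))
```

Running the script prints one line per dependency, so a missing OCR engine or PDF extractor shows up before any conversion is attempted.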
Show the command-line help:
aTXT -h
Usage
In every case, you can run aTXT by its package name or, more simply, as 2txt:
2txt -h
You can use the graphical interface (if you have installed PySide):
aTXT -i
You should see something like this: (screenshot of the aTXT window)
Note: aTXT will always generate one output file for each input file path.
Examples:
$ 2txt prueba.html
$ 2txt prueba.html -o
$ 2txt --file ~/Documents/prueba.html
$ 2txt --file ~/Documents/prueba.html --to ~/htmls
Search for all convertible files up to a depth of 2 under ~:
$ 2txt ~ -d 2
$ 2txt --path ~ -d 2 --format 'txt,html'
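To illustrate what a depth-limited search like this involves, here is a minimal sketch using the standard library's `os.scandir`. It is not aTXT's implementation: the function name `find_files` is hypothetical, and the convention that the starting folder counts as depth 0 is an assumption that may differ from how aTXT interprets `-d`.

```python
import os


def find_files(root, max_depth, extensions):
    """Yield files under root whose extension is in extensions.

    Hypothetical helper, not part of aTXT. Assumption: root itself is
    depth 0, so max_depth=2 descends at most two folder levels below it.
    """
    wanted = {"." + ext.strip().lower().lstrip(".") for ext in extensions}

    def walk(folder, depth):
        try:
            entries = list(os.scandir(folder))
        except OSError:
            return  # unreadable folder: skip it
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                if depth < max_depth:
                    yield from walk(entry.path, depth + 1)
            elif entry.is_file():
                if os.path.splitext(entry.name)[1].lower() in wanted:
                    yield entry.path

    yield from walk(os.path.expanduser(root), 0)


if __name__ == "__main__":
    # Mirrors the second command above: depth 2, formats txt and html.
    for path in find_files("~", 2, "txt,html".split(",")):
        print(path)
```

Stopping the recursion at `max_depth` rather than filtering results afterwards means deep folder trees are never opened at all, which is where most of the time goes on a large home directory.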
Problems, Bugs?
Please feel free to report any issue or problem with the installation on the Issues page: http://github.com/d555/python-atxt/issues