A data mining tool for extracting text from files
Project description
aTXT
A Data Mining Tool For Extracting Text From Files.
Meta
Author: Jonathan S. Prieto C.
Email: prieto.jona@gmail.com
Notes: Have feedback? Please send me an email.
Free software: BSD license
Requirements
This software exists thanks to other open source projects. The following list names some of them:
PySide (GUI lib)
Tesseract OCR
Xpdf
lxml (doc files)
scandir (fast folder traversal)
docx
Installation
pip install atxt
Check the dependencies to avoid surprises:
aTXT --check
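What aTXT --check verifies internally is not described here. As a rough, independent sketch of the same idea, the snippet below looks for external executables on the PATH with the standard library. The executable names (tesseract, pdftotext, antiword) and the helper find_missing_tools are assumptions made for illustration, not part of aTXT.

```python
import shutil

# Hypothetical mapping: dependency name -> executable expected on PATH.
# These executable names are assumptions, not taken from aTXT's own check.
EXTERNAL_TOOLS = {
    "Tesseract OCR": "tesseract",
    "Xpdf": "pdftotext",
    "antiword": "antiword",
}


def find_missing_tools(tools=EXTERNAL_TOOLS):
    """Return the names of dependencies whose executable is not on PATH."""
    return [name for name, exe in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result only means the executables were found, not that their versions are compatible.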
Show the command-line help:
aTXT -h
Usage
In every case, you can call aTXT by its package name or, more simply, as 2txt:
2txt -h
You can use the graphical interface (if you have installed PySide):
aTXT -i
You should see something like this:
Note: aTXT always generates an output file for each file path.
Examples:
$ 2txt prueba.html
$ 2txt prueba.html -o
$ 2txt --file ~/Documents/prueba.html
$ 2txt --file ~/Documents/prueba.html --to ~/htmls
Search for all convertible files up to a depth of 2 under ~:
$ 2txt ~ -d 2
$ 2txt --path ~ -d 2 --format 'txt,html'
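To illustrate what a depth-limited search such as -d 2 with --format 'txt,html' could look like, here is a minimal sketch built on os.scandir from the Python 3 standard library. It is not aTXT's implementation: the function name walk_limited and the convention that depth 0 means the starting folder itself are assumptions.

```python
import os


def walk_limited(root, max_depth, extensions):
    """Yield files under root, at most max_depth folder levels below it,
    whose extension (without the dot, lowercase) is in extensions.

    Assumption: depth 0 is root itself, so max_depth=2 visits root,
    its subfolders, and their subfolders.
    """
    stack = [(root, 0)]
    while stack:
        folder, depth = stack.pop()
        try:
            entries = list(os.scandir(folder))
        except OSError:
            # Unreadable folders are skipped rather than aborting the walk.
            continue
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                if depth < max_depth:
                    stack.append((entry.path, depth + 1))
            elif entry.is_file():
                ext = os.path.splitext(entry.name)[1].lstrip(".").lower()
                if ext in extensions:
                    yield entry.path


if __name__ == "__main__":
    for path in walk_limited(os.path.expanduser("~"), 2, {"txt", "html"}):
        print(path)
```

An explicit stack is used instead of recursion so the depth of each folder travels with it and very deep trees cannot hit the recursion limit.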
Problems, Bugs?
Please feel free to report any issue or problem with the installation on the Issues page: http://github.com/d555/python-atxt/issues
Changelog
1.0.5.3 (2015-07-03)
fix bugs suggested by landscape.io
1.0.5.2 (2015-07-02)
This version is mostly about Windows support:
support for .doc on Windows with antiword
rewrite the directory-walking method, based on a modified version of scandir
fix a bug on Windows when performing a depth-limited search
fix a bug in workers/run_path.py that caused this method to be called twice
fix bugs with encoding and the arguments passed to subprocess
fix a bug with logging and terminal colors on Windows
1.0.5.1 (2015-06-30)
fix some bugs in the GUI
1.0.5 (2015-03-01)
First release on PyPI.
File details
Details for the file aTXT-1.0.5.3.tar.gz.
File metadata
- Download URL: aTXT-1.0.5.3.tar.gz
- Upload date:
- Size: 3.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9fe8f17ad3c2679750cfce1a273ad36911fcc490560cdb3854a17848562508c4
MD5 | 5cf11f1ad0f67853b56d377b6e9abaf0
BLAKE2b-256 | 8a0958b66d815beccb0d55f88ec57c2d2733a6e060518d17ab76588987831ba6
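The digests above can be used to check a downloaded archive before installing it. The sketch below compares a file's SHA256 with the published value using hashlib. The helper names sha256_of and matches_published_hash are illustrative, and the local file path is assumed to be wherever you saved aTXT-1.0.5.3.tar.gz.

```python
import hashlib

# SHA256 digest published for aTXT-1.0.5.3.tar.gz.
EXPECTED_SHA256 = "9fe8f17ad3c2679750cfce1a273ad36911fcc490560cdb3854a17848562508c4"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 equals the expected digest."""
    return sha256_of(path) == expected.lower()
```

For example, matches_published_hash("aTXT-1.0.5.3.tar.gz") should return True for an intact download saved under that name in the current directory.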