Wrapper for the Trombone project
pytrombone
Python wrapper for the Trombone project
https://github.com/voyanttools/trombone
Installation
$ pip install pytrombone
Usage
Examples
Suppose a directory named './data/' contains a set of PDF files, and we want to calculate the SMOG index of each one.
Making sure that Trombone works
from pytrombone import Trombone, Cache, filepaths_loader
# This downloads the Trombone jar into the /tmp/ directory of your machine.
# Note that the jar is likely to be deleted on reboot, in which case it will be downloaded again.
trombone = Trombone()
# To get the version
version = trombone.get_version()
print(version)
Calculating the SMOG index of 2 files
# To run Trombone on specific files, use the run method.
# Trombone parameters are given as a list of 2-element tuples:
# the first element is the parameter name, and the second is its value.
# Trombone handles these 2 files concurrently, so passing many files
# in one call performs better than looping over them one at a time.
output, error = trombone.run([
    ('tool', 'corpus.DocumentSMOGIndex'),  # Choose the tool you want to use
    ('file', './data/example1.pdf'),
    ('file', './data/example2.pdf'),
    ('storage', 'file'),  # Optional: lets Trombone cache pre-processed files (useful if you run several tools on the same files)
])
output  # the successful output of Trombone, as a string
error  # the error output of Trombone, as a string
# The output is JSON-formatted; serialize_output converts it to a dictionary:
output = trombone.serialize_output(output)
# output now holds your results in the form of a dictionary
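The parameter format described above (a list of 2-element tuples, with 'file' repeated once per document) can be assembled programmatically. The following is a minimal sketch; `build_params` is a hypothetical helper written for illustration and is not part of pytrombone.

```python
from typing import List, Optional, Tuple

Param = Tuple[str, str]


def build_params(tool: str, filepaths: List[str],
                 storage: Optional[str] = None) -> List[Param]:
    """Build a parameter list of (name, value) tuples (hypothetical helper)."""
    params: List[Param] = [("tool", tool)]
    # The 'file' parameter is repeated once per document.
    params.extend(("file", path) for path in filepaths)
    # 'storage' is optional, so it is only added when requested.
    if storage is not None:
        params.append(("storage", storage))
    return params


params = build_params(
    "corpus.DocumentSMOGIndex",
    ["./data/example1.pdf", "./data/example2.pdf"],
    storage="file",
)
print(params)
```

The resulting list has the same shape as the one passed to `trombone.run` in the example above, so it could be passed in its place.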
Calculating the SMOG index in batches
import json
import os

# First set up the cache file. It lets you re-run your code after a problem
# without having to restart from the beginning.
cache = Cache('./cache.db')
# Then load the file paths in batches. pytrombone has a function to do that.
# Note that every file marked as processed will be ignored.
# Also note that the Cache uses the filename of the file as reference.
for filepaths in filepaths_loader('./data/*.pdf', 100, cache):
    # Make tuples to fit the specification of Trombone parameters
    files = [('file', filepath) for filepath in filepaths]
    output, error = trombone.run([
        ('tool', 'corpus.DocumentSMOGIndex'),  # Choose the tool you want to use
        ('storage', 'file'),  # Optional: lets Trombone cache pre-processed files (useful if you run several tools on the same files)
    ] + files)
    try:
        # If serialization fails, it is because Trombone failed to perform the analysis.
        # The failed files are marked as failed in the cache and re-run on the next run.
        # You may want to inspect the "error" variable for more information.
        output = trombone.serialize_output(output)
    except json.JSONDecodeError:
        filenames = [os.path.basename(filepath) for filepath in filepaths]
        cache.mark_as_failed(filenames)
        continue
    output  # now holds your results in the form of a dictionary
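To illustrate the batching idea used above (glob a pattern, skip files already processed by filename, and yield fixed-size groups), here is a minimal standalone sketch. It is not pytrombone's implementation: `iter_batches` and the plain `processed` set are assumptions standing in for `filepaths_loader` and `Cache`.

```python
import glob
import os
import tempfile
from typing import Iterator, List, Set


def iter_batches(pattern: str, batch_size: int,
                 processed: Set[str]) -> Iterator[List[str]]:
    """Yield lists of at most batch_size paths, skipping processed filenames."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for path in sorted(glob.glob(pattern)):
        # Files are identified by filename, mirroring the note above.
        if os.path.basename(path) in processed:
            continue
        batch.append(path)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


# Demo on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.pdf", "c.pdf", "d.pdf"):
        open(os.path.join(tmp, name), "w").close()
    batches = [
        [os.path.basename(p) for p in batch]
        for batch in iter_batches(os.path.join(tmp, "*.pdf"), 2, {"b.pdf"})
    ]

print(batches)  # [['a.pdf', 'c.pdf'], ['d.pdf']]
```

Because skipped files never enter a batch, re-running after a failure only revisits files that are not yet recorded as processed.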