Wrapper for the Trombone project

pytrombone

Python wrapper for the Trombone project

https://github.com/voyanttools/trombone

Installation

$ pip install pytrombone

Usage

Examples

Suppose we have a collection of PDFs in a directory named './data/' and we want to calculate the SMOG readability index for each of them.
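For reference, SMOG ("Simple Measure of Gobbledygook") estimates the years of schooling needed to understand a text from the number of polysyllabic words (three or more syllables) and the number of sentences. The sketch below only shows McLaughlin's published formula applied to counts you already have. How Trombone's `corpus.DocumentSMOGIndex` counts syllables and splits sentences is not described here, so its results may differ.

```python
import math


def smog_grade(polysyllable_count: int, sentence_count: int) -> float:
    """McLaughlin's SMOG grade computed from raw counts.

    polysyllable_count: words with three or more syllables in the sample.
    sentence_count: number of sentences in the sample.
    """
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    # Scale the polysyllable count to a 30-sentence sample, then apply the formula.
    return 1.0430 * math.sqrt(polysyllable_count * (30 / sentence_count)) + 3.1291
```

For example, a sample with 30 polysyllabic words in 30 sentences gives `smog_grade(30, 30)`, roughly 8.84.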

Making sure that Trombone works

from pytrombone import Trombone, Cache, filepaths_loader

# This downloads the Trombone jar into the /tmp/ directory of your machine.
# Note that the jar is likely to be deleted on reboot and will then need to be downloaded again.
trombone = Trombone()

# To get the version
version = trombone.get_version()
print(version)
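Trombone is distributed as a jar, so running it presumably requires a Java runtime. If `trombone.get_version()` fails, a quick pre-flight check such as the following can tell you whether Java is installed at all. This is a standard-library sketch that assumes the runtime is the `java` executable found on your PATH; `java_version` is a hypothetical helper, not part of pytrombone.

```python
import shutil
import subprocess
from typing import Optional


def java_version() -> Optional[str]:
    """Return the banner printed by `java -version`, or None if no java executable is on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` traditionally writes its banner to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return result.stderr.strip() or result.stdout.strip()
```

A `None` result means no Java runtime was found, which would be the first thing to fix before retrying `Trombone()`.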

Calculating the SMOG index of 2 files

# To run Trombone, use the run method.
# Trombone parameters are given as a list of 2-element tuples:
# the first element is the parameter name and the second is its value.
# Trombone processes the files concurrently, so it is faster to pass many files
# in a single call than to loop over them one at a time.
output, error = trombone.run([
    ('tool', 'corpus.DocumentSMOGIndex'),  # Choose the tool you want to use
    ('file', './data/example1.pdf'),
    ('file', './data/example2.pdf'),
    ('storage', 'file'),  # Optional: lets Trombone cache pre-processed files (useful if you run several tools on the same files)
])
output  # Trombone's output when the run succeeds, as a string
error  # Trombone's error output when the run fails, as a string

# The output is in JSON format; serialize_output parses it into a dictionary:
output = trombone.serialize_output(output)
# output now holds your results as a dictionary
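Parameters are passed as a list of tuples rather than a dict because the 'file' parameter is repeated once per input file, and a dict cannot hold duplicate keys. The sketch below shows one way to assemble that list. `build_params` is a hypothetical helper, not part of pytrombone; it also assumes parameter order does not matter to Trombone, which the two examples in this README (one with 'storage' last, one with it before the files) suggest.

```python
from typing import Iterable, List, Optional, Tuple

# One Trombone parameter: (name, value)
Param = Tuple[str, str]


def build_params(tool: str, filepaths: Iterable[str], storage: Optional[str] = None) -> List[Param]:
    """Hypothetical helper: build a Trombone parameter list for one tool and many files."""
    params: List[Param] = [("tool", tool)]
    if storage is not None:
        params.append(("storage", storage))
    # Repeat the 'file' parameter once per path; this is why a dict would not work.
    params.extend(("file", path) for path in filepaths)
    return params
```

With this helper, the two-file call could be written as `output, error = trombone.run(build_params('corpus.DocumentSMOGIndex', ['./data/example1.pdf', './data/example2.pdf'], storage='file'))`.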

Calculating the SMOG index in batches

import json
import os

# Assumes `trombone`, `Cache` and `filepaths_loader` from the first example.

# First, set up the cache file. It lets you re-run your code after a problem
# without having to start over from the beginning.
cache = Cache('./cache.db')

# Then load the filepaths in batches (here, 100 files per batch) with pytrombone's filepaths_loader.
# Files already marked as processed in the cache are skipped.
# The cache identifies files by their filename, not their full path.
for filepaths in filepaths_loader('./data/*.pdf', 100, cache):
    # Build ('file', path) tuples to match the Trombone parameter format
    files = [('file', filepath) for filepath in filepaths]

    output, error = trombone.run([
        ('tool', 'corpus.DocumentSMOGIndex'),  # Choose the tool you want to use
        ('storage', 'file'),  # Optional: lets Trombone cache pre-processed files (useful if you run several tools on the same files)
    ] + files)

    try:
        # If parsing the output fails, Trombone failed to perform the analysis.
        # The files are then marked as failed in the cache and re-run on the next run.
        # Inspect the `error` variable for more information.
        output = trombone.serialize_output(output)
    except json.JSONDecodeError:
        filenames = [os.path.basename(filepath) for filepath in filepaths]
        cache.mark_as_failed(filenames)
        continue

    output  # now holds your results as a dictionary
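pytrombone's `Cache` and `filepaths_loader` are not documented beyond the calls above, so the following is only a hypothetical illustration of the resumable-batching idea they describe, not their actual implementation. `SimpleCache`, `mark_as_processed`, `is_processed` and `batched_filepaths` are names invented for this sketch. It stores one status per filename in SQLite, skips files marked as processed, and leaves failed files eligible for the next run.

```python
import glob
import os
import sqlite3
from typing import Iterator, List


class SimpleCache:
    """Hypothetical stand-in for a resumable cache mapping filename -> status."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, status TEXT NOT NULL)"
        )

    def _mark(self, filenames: List[str], status: str) -> None:
        # INSERT OR REPLACE lets a previously failed file later be marked as processed.
        with self.conn:  # commits the transaction on success
            self.conn.executemany(
                "INSERT OR REPLACE INTO files (name, status) VALUES (?, ?)",
                [(name, status) for name in filenames],
            )

    def mark_as_processed(self, filenames: List[str]) -> None:
        self._mark(filenames, "processed")

    def mark_as_failed(self, filenames: List[str]) -> None:
        self._mark(filenames, "failed")

    def is_processed(self, filename: str) -> bool:
        row = self.conn.execute(
            "SELECT status FROM files WHERE name = ?", (filename,)
        ).fetchone()
        return row is not None and row[0] == "processed"


def batched_filepaths(pattern: str, batch_size: int, cache: SimpleCache) -> Iterator[List[str]]:
    """Yield batches of paths matching `pattern`, skipping files already processed."""
    # Files are keyed by basename, mirroring how the README's cache uses filenames.
    pending = [
        path
        for path in sorted(glob.glob(pattern))
        if not cache.is_processed(os.path.basename(path))
    ]
    for start in range(0, len(pending), batch_size):
        yield pending[start:start + batch_size]
```

In this sketch, calling `cache.mark_as_processed(...)` after a batch parses successfully is what lets a restarted run resume where it left off; failed files stay pending and are retried.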

Download files

Source Distribution

pytrombone-0.1.3.tar.gz (5.3 kB)

Built Distribution

pytrombone-0.1.3-py3-none-any.whl (5.7 kB)

File details

Details for the file pytrombone-0.1.3.tar.gz.

File metadata

  • Download URL: pytrombone-0.1.3.tar.gz
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.10.4 Linux/5.17.5-200.fc35.x86_64

File hashes

Hashes for pytrombone-0.1.3.tar.gz
  • SHA256: 7333b296322a7379a4adecbc4d354d987764bc72a6e014568c02d5a3a5d7ecb5
  • MD5: a1918d5221c3c4851278cad324b70a99
  • BLAKE2b-256: 2a154491e6f04a647f5cbba9d20f521f8cdfe893c05f89be724c1582fa60750e

File details

Details for the file pytrombone-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: pytrombone-0.1.3-py3-none-any.whl
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.10.4 Linux/5.17.5-200.fc35.x86_64

File hashes

Hashes for pytrombone-0.1.3-py3-none-any.whl
  • SHA256: e98c901d5f6a15c9c1dc12abeca5b10a0ca854e5dee031b0db5e42a883d583e3
  • MD5: 20c19dda5de145074962bffb4a6ed10b
  • BLAKE2b-256: 7c89a1e78655697794d8fc3632c33df55a7e73bd059fdf58cbd92641360226e5
