A minimal client for the grobid-quantities service.

Project description

Python client to query the Grobid Quantities service API. For more information about Grobid Quantities, please check the Grobid Quantities documentation.

Installation

The client can be installed using pip:

pip install grobid-quantities-client

Usage

Process Text / PDF

from grobid_quantities.quantities import QuantitiesClient
client = QuantitiesClient(apiBase=server_url)

To process raw text:

client.process_text(
    "I lost two minutes"
)

To process a PDF:

client.process_pdf(pdfFile)

To parse measurements:

client.parse_measures({"from": "10", "to": "20", "unit": "km"})

The response is a tuple in which the first element is the status code and the second element is the response body as a dictionary. Here is an example:

 (
     200,
     {
       "runtime": 123,
       "measurements": [
         {
           "type": "value",
           "quantity": {
             "type": "time",
             "rawValue": "two",
             "rawUnit": {
               "name": "minutes",
               "type": "time",
               "system": "non SI",
               "offsetStart": 11,
               "offsetEnd": 18
             },
             "parsedValue": {
               "numeric": 2,
               "structure": {
                 "type": "ALPHABETIC",
                 "formatted": "two"
               },
               "parsed": "two"
             },
             "normalizedQuantity": 120,
             "normalizedUnit": {
               "name": "s",
               "type": "time",
               "system": "SI base"
             },
             "offsetStart": 7,
             "offsetEnd": 11
           }
         }
       ]
     }
 )
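
As a minimal sketch of how such a response might be consumed, the snippet below unpacks the (status code, body) tuple and collects the raw and normalized values of each measurement. The helper name summarize_measurements and the hard-coded sample_response are illustrative assumptions and not part of the client; the sample is a trimmed copy of the response shown above, used in place of a live call.

```python
def summarize_measurements(response):
    """Return (raw value, raw unit, normalized quantity, normalized unit) tuples.

    `response` is assumed to be the (status_code, body) tuple described above.
    This helper is hypothetical and not part of the client.
    """
    status, body = response
    if status != 200:
        raise RuntimeError(f"Service returned status {status}")

    summary = []
    for measurement in body.get("measurements", []):
        # The example above stores the value under "quantity";
        # entries without that key are skipped in this sketch.
        quantity = measurement.get("quantity")
        if quantity is None:
            continue
        summary.append((
            quantity.get("rawValue"),
            quantity.get("rawUnit", {}).get("name"),
            quantity.get("normalizedQuantity"),
            quantity.get("normalizedUnit", {}).get("name"),
        ))
    return summary


# Trimmed copy of the example response, used here instead of a live call.
sample_response = (
    200,
    {
        "runtime": 123,
        "measurements": [
            {
                "type": "value",
                "quantity": {
                    "type": "time",
                    "rawValue": "two",
                    "rawUnit": {"name": "minutes", "type": "time"},
                    "normalizedQuantity": 120,
                    "normalizedUnit": {"name": "s", "type": "time"},
                },
            }
        ],
    },
)

print(summarize_measurements(sample_response))
# prints [('two', 'minutes', 120, 's')]
```

Checking the status code before touching the body keeps error responses from being mistaken for empty results.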

Batch processing

Batch processing is implemented in the class QuantitiesBatch. The class can be instantiated by passing the service URL to the constructor; otherwise the default URL is used.

To run the processing, the method process requires the input directory, a callback, and the number of threads/processes. A ready-made implementation is available in script/batchSample.py.

To run it:
  • under this work branch, prepare two folders: input, which contains the input PDF files to be processed, and output, which collects the processing results
  • we recommend creating a new virtualenv, activating it, and installing all the requirements in this virtual environment using $ pip install -r /path/of/grobid-quantities-python-client/source/requirements.txt
  • (temporarily, until this branch is merged) install the entity-fishing multithread branch in editable mode (pip install -e /path/of/client-python/source)
  • run it with python runFile.py input output 5
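
The batch workflow described above (an input directory, a callback, and a number of threads) can be sketched with the standard library alone. This is not the implementation of QuantitiesBatch: the names process_directory, process_file, and callback are hypothetical, and process_file stands in for whatever call sends one PDF to the service.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def process_directory(input_dir, process_file, callback, n_threads=5):
    """Apply `process_file` to every PDF in `input_dir`, then hand each result to `callback`.

    All names here are hypothetical; this is not the QuantitiesBatch API.
    Returns the number of PDF files that were submitted.
    """
    pdf_paths = sorted(Path(input_dir).glob("*.pdf"))
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Map each submitted future back to the file it processes.
        futures = {pool.submit(process_file, path): path for path in pdf_paths}
        for future in as_completed(futures):
            # future.result() re-raises any exception raised in the worker.
            callback(futures[future], future.result())
    return len(pdf_paths)
```

In practice process_file would wrap the call that sends one PDF to the service, and callback would write each result into the output folder. The callback runs in the calling thread, so it does not need its own locking.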

