A minimal client for the grobid-quantities service.

Project description

License: Apache

Python client to query the Grobid Quantities service API. For more information about Grobid Quantities, see the Grobid Quantities documentation.

Installation

The client can be installed using pip:

pip install grobid-quantities-client

Usage

Process Text / PDF

from grobid_quantities.quantities import QuantitiesClient
client = QuantitiesClient(apiBase=server_url)

To process raw text:

client.process_text(
    "I lost two minutes"
)

To process a PDF:

client.process_pdf(pdfFile)

To parse measurements:

client.parse_measures({"from": "10", "to": "20", "unit": "km"})

The response is a tuple where the first element is the status code and the second element is the response body as a dictionary. Here is an example:

 (
     200,
     {
       "runtime": 123,
       "measurements": [
         {
           "type": "value",
           "quantity": {
             "type": "time",
             "rawValue": "two",
             "rawUnit": {
               "name": "minutes",
               "type": "time",
               "system": "non SI",
               "offsetStart": 11,
               "offsetEnd": 18
             },
             "parsedValue": {
               "numeric": 2,
               "structure": {
                 "type": "ALPHABETIC",
                 "formatted": "two"
               },
               "parsed": "two"
             },
             "normalizedQuantity": 120,
             "normalizedUnit": {
               "name": "s",
               "type": "time",
               "system": "SI base"
             },
             "offsetStart": 7,
             "offsetEnd": 11
           }
         }
       ]
     }
)
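
Because each call returns a (status, body) tuple, a caller would usually check the status before reading the body. The sketch below runs on a trimmed copy of the sample response above rather than on a live service. summarize_measurements is a hypothetical helper, not part of the client, and it only handles the "value" measurement shape shown in the example.

```python
def summarize_measurements(result):
    """Return (raw value, raw unit, normalized quantity, normalized unit) tuples."""
    status, body = result
    if status != 200:
        raise RuntimeError(f"request failed with status {status}")

    rows = []
    for measurement in body.get("measurements", []):
        quantity = measurement.get("quantity")
        if quantity is None:
            # Assumption: measurements without a "quantity" key are skipped
            continue
        rows.append((
            quantity["rawValue"],
            quantity["rawUnit"]["name"],
            quantity.get("normalizedQuantity"),
            quantity.get("normalizedUnit", {}).get("name"),
        ))
    return rows


# Trimmed copy of the sample response shown above
sample = (
    200,
    {
        "runtime": 123,
        "measurements": [
            {
                "type": "value",
                "quantity": {
                    "type": "time",
                    "rawValue": "two",
                    "rawUnit": {"name": "minutes", "type": "time"},
                    "normalizedQuantity": 120,
                    "normalizedUnit": {"name": "s", "type": "time"},
                },
            }
        ],
    },
)

print(summarize_measurements(sample))  # [('two', 'minutes', 120, 's')]
```

The same function could be applied to the tuple returned by client.process_text or client.process_pdf, assuming those responses follow the structure of the example.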

Batch processing

Batch processing is implemented in the class QuantitiesBatch. The class can be instantiated by passing the grobid-quantities service URL to the constructor; otherwise the default one is used.

To run the processing, the method process requires the input directory, a callback and the number of threads/processes. A ready-made implementation is available in script/batchSample.py.

To run it:
  • in this working branch, prepare two folders: input, which contains the PDF files to be processed, and output, which collects the processing results
  • we recommend creating a new virtualenv, activating it and installing all the requirements in this virtual environment using $ pip install -r /path/of/grobid-quantities-python-client/source/requirements.txt
  • (temporarily, until this branch is merged) install the entity-fishing multithread branch in editable mode (pip install -e /path/of/client-python/source)
  • run it with python runFile.py input output 5
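
The pattern described in this section (an input directory, a callback and a number of workers) can be sketched with the standard library alone. This is not the QuantitiesBatch implementation: process_directory, process_file and on_result are hypothetical names, and process_file stands in for a call such as client.process_pdf.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def process_directory(input_dir, process_file, on_result, n_workers=5):
    """Apply process_file to every PDF in input_dir and hand each result to on_result.

    Returns the number of PDF files that were submitted.
    """
    pdf_paths = sorted(Path(input_dir).glob("*.pdf"))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Map each future back to the file it was created for
        futures = {pool.submit(process_file, path): path for path in pdf_paths}
        for future in as_completed(futures):
            # The callback runs in the calling thread, one result at a time;
            # future.result() re-raises any exception from process_file
            on_result(futures[future], future.result())
    return len(pdf_paths)
```

A callback could, for instance, write each result as JSON into the output folder. Threads suit this workload under the assumption that most of the time is spent waiting for the service to respond.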

Download files

Files for grobid-quantities-client, version 0.2.1
Filename, size: grobid-quantities-client-0.2.1.tar.gz (6.0 kB); file type: Source
