Invenio module to interact with Grobid API for metadata extraction.
Invenio module to interact with Grobid API for metadata extraction from PDFs.
Free software: GPLv2 license
Documentation: https://invenio-grobid.readthedocs.org.
This is an experimental developer preview release.
Features
This module provides an interface for uploading PDFs to a Grobid instance and lets you submit the extracted metadata to a configurable callback.
NOTE: This package assumes you have set up a local Grobid REST service. For setup details, read the official Grobid documentation.
Installation
pip install invenio-grobid
Note that you also need a running Grobid REST service.
Configuration
Add the invenio_grobid package to the PACKAGES setting in your overlay/config.py so that the Invenio application loader picks it up.
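As a rough illustration, the relevant part of overlay/config.py might look like the sketch below. Only the invenio_grobid entry comes from this README; the other entries (my_overlay.base and invenio.modules.*) are placeholders standing in for whatever your overlay already lists, and the ordering shown is an assumption.

```python
# overlay/config.py (sketch, not a complete configuration)

PACKAGES = [
    "my_overlay.base",    # hypothetical: your overlay's own packages
    "invenio_grobid",     # the package described in this README
    "invenio.modules.*",  # assumption: the stock Invenio modules entry
]
```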
Configure the URL to your Grobid REST service with GROBID_HOST.
inveniomanage config set GROBID_HOST 'http://localhost:8080'
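Before uploading PDFs, it can help to confirm that the Grobid service answers at the configured URL. The sketch below is not part of invenio-grobid. It assumes a Grobid release that exposes a GET /api/isalive health check returning the text "true"; older releases may use a different path, so check the Grobid documentation for your version. The function name grobid_is_alive and the opener parameter are made up for this example.

```python
from urllib.request import urlopen

# Same value as the GROBID_HOST setting configured above.
GROBID_HOST = "http://localhost:8080"


def grobid_is_alive(host=GROBID_HOST, timeout=5, opener=urlopen):
    """Return True if <host>/api/isalive answers with the text 'true'.

    The /api/isalive path is an assumption about the Grobid version in use.
    `opener` defaults to urllib's urlopen and can be swapped out for testing.
    """
    url = host.rstrip("/") + "/api/isalive"
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8")
    except OSError:
        # URLError and timeouts are OSError subclasses, so this covers
        # refused connections, DNS failures, HTTP errors, and timeouts.
        return False
    return body.strip().lower() == "true"


if __name__ == "__main__":
    print("Grobid reachable:", grobid_is_alive())
```

If the check fails, verify that the service is running and that GROBID_HOST points at it before trying the uploader.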
To change the handler that receives the metadata after extraction, set GROBID_RESULT_HANDLER.
inveniomanage config set GROBID_RESULT_HANDLER 'my_overlay.grobid:upload_handler'
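The value has the form module.path:callable, so the example above points at a function named upload_handler in my_overlay/grobid.py. A minimal sketch of such a function follows. This README does not document what the module passes to the handler, so the single metadata argument (assumed here to be a JSON-serializable dict) and the return value are assumptions; check the invenio-grobid source for the real call signature before relying on this.

```python
# my_overlay/grobid.py (hypothetical module matching the setting above)
import json
import logging

logger = logging.getLogger(__name__)


def upload_handler(metadata):
    """Receive extracted metadata and hand it on to your own workflow.

    Assumption: `metadata` is a JSON-serializable dict. This sketch only
    logs the fields it received and returns them serialized as JSON.
    """
    logger.info("Received Grobid metadata with fields: %s", sorted(metadata))
    # Replace this with your own logic, for example creating a record.
    return json.dumps(metadata, sort_keys=True)
```

Keeping the handler in your overlay rather than in the module lets you change what happens to the metadata without touching invenio-grobid itself.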
Usage
By default, the uploader interface is available under the /grobid endpoint, for example http://localhost:4000/grobid
Choose a PDF to extract metadata from and hit Upload.
Wait for the extraction to finish; the metadata is then displayed.
Click the Submit button to push the metadata to your GROBID_RESULT_HANDLER.
Special thanks to Joseph Boyd (@jcboyd) and Gilles Louppe (@glouppe) for Grobid support.
Happy hacking and thanks for flying Invenio Grobid.
Changes
Version 0.1.0 (released 2015-10-09)
Initial public release.