Invenio module to interact with Grobid API for metadata extraction from PDF.
- Free software: GPLv2 license
- Documentation: https://invenio-grobid.readthedocs.org.
This is an experimental developer preview release.
This module provides an interface for uploading PDFs to a Grobid instance and for submitting the extracted metadata to a configurable callback.
NOTE: This package assumes you have set up a local Grobid REST service. For details on setting one up, read the official Grobid documentation.
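For readers curious about what the upload step involves, here is a minimal sketch of a header-extraction request to a Grobid REST service, using only the Python standard library. It assumes the PDF is posted as a multipart form field named `input` to `/api/processHeaderDocument`, which is how recent Grobid releases document that service. Older releases may use a different path, and this is not necessarily how invenio_grobid builds its own request. The function name `build_header_request` is hypothetical.

```python
import urllib.request
import uuid


def build_header_request(host, pdf_bytes, filename="paper.pdf",
                         path="/api/processHeaderDocument"):
    """Build (but do not send) a multipart POST for Grobid's header service.

    Assumption: `path` matches recent Grobid releases; older releases may
    omit the /api prefix, so adjust it for your installation.
    """
    boundary = uuid.uuid4().hex
    # Grobid expects the PDF in a multipart form field named "input".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="input"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    # Strip a trailing slash so "http://host:8080/" and "http://host:8080" both work.
    url = host.rstrip("/") + path
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` should return TEI XML on success. Building and sending are kept separate so the request can be inspected or tested without a running Grobid service.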
pip install invenio-grobid
Note that you also need a running Grobid REST service.
Add the invenio_grobid package to the PACKAGES setting in your overlay's config.py so that the Invenio application loader picks it up.
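Here is a sketch of the resulting setting in an overlay's config.py. Only the `invenio_grobid` entry comes from this README. The other entries are hypothetical placeholders for whatever your overlay already lists, and their order here is not meant to imply any loading precedence.

```python
# overlay/config.py (sketch)
PACKAGES = [
    "my_overlay.base",     # hypothetical: your overlay's own package
    "invenio_grobid",      # the entry this README asks you to add
    "invenio.modules.*",   # placeholder for the Invenio entries you already have
]
```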
Configure the URL of your Grobid REST service with GROBID_HOST.
inveniomanage config set GROBID_HOST 'http://localhost:8080'
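Before uploading anything, it can help to confirm that GROBID_HOST actually points at a live service. The helper below is a hypothetical sketch and is not part of invenio_grobid. It assumes the service exposes a health check at `/api/isalive` that answers with the body `true`, as recent Grobid releases do. Older releases may differ, so the path is a parameter.

```python
import http.client
import urllib.request

# Same value as set with inveniomanage above.
GROBID_HOST = "http://localhost:8080"


def grobid_is_alive(host=GROBID_HOST, path="/api/isalive", timeout=5):
    """Return True if the Grobid service at `host` answers its health check.

    Assumption: `path` follows recent Grobid releases; adjust it if your
    installation uses a different health-check endpoint.
    """
    url = host.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().strip().lower() == b"true"
    except (OSError, http.client.HTTPException):
        # Covers refused connections, DNS failures, timeouts and HTTP error
        # statuses (urllib's URLError and HTTPError are OSError subclasses).
        return False
```

For example, `grobid_is_alive()` returns `False` rather than raising when nothing is listening on the configured host, so it is safe to call from a startup check.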
To replace the default handler that receives the extracted metadata, set GROBID_RESULT_HANDLER to the import path of your own function.
inveniomanage config set GROBID_RESULT_HANDLER 'my_overlay.grobid:upload_handler'
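The value above uses the common "package.module:function" import-string notation. The sketch below shows how such a string can be resolved to a callable with `importlib` (Werkzeug's `import_string` accepts the same colon form). Whether invenio_grobid resolves the setting exactly this way is an assumption. The handler's signature is also an assumption: `upload_handler` is shown receiving a dict of metadata, so check what invenio_grobid actually passes before relying on it.

```python
import importlib


def resolve_handler(path):
    """Turn a "package.module:function" string into the callable it names."""
    module_name, _, attr = path.partition(":")
    if not attr:
        raise ValueError(f"expected 'module:function', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# my_overlay/grobid.py (hypothetical handler). The single dict argument is
# an assumption about what invenio_grobid passes to the handler.
def upload_handler(metadata):
    title = metadata.get("title", "<untitled>")
    print(f"Received metadata for: {title}")
    return metadata
```

With this layout, `resolve_handler("my_overlay.grobid:upload_handler")` would return the function above, provided `my_overlay` is importable.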
The uploader interface is available at the /grobid endpoint by default, e.g. http://localhost:4000/grobid.
- Choose a PDF to extract metadata from and click Upload.
- Wait for the extracted metadata to be displayed.
- Click the Submit button to push the metadata to your GROBID_RESULT_HANDLER.
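Grobid returns its results as TEI XML. The following sketch illustrates how a title and author names, the kind of metadata shown after upload, can be pulled out of a TEI header with the standard library's ElementTree. The element paths follow the TEI namespace and the `titleStmt`/`sourceDesc` layout Grobid uses for header output. This is an illustration, not invenio_grobid's own parser, and `parse_tei_header` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def parse_tei_header(tei_xml):
    """Extract the title and author names from a Grobid TEI header document."""
    root = ET.fromstring(tei_xml)
    # First <title> inside <titleStmt>; "" if missing or empty.
    title = root.findtext(".//tei:titleStmt/tei:title", default="",
                          namespaces=TEI_NS).strip()
    authors = []
    source = root.find(".//tei:sourceDesc", TEI_NS)
    if source is not None:
        for pers in source.findall(".//tei:author/tei:persName", TEI_NS):
            # Join all forenames (first, middle) followed by the surname.
            forenames = [f.text for f in pers.findall("tei:forename", TEI_NS)
                         if f.text]
            surname = pers.findtext("tei:surname", default="",
                                    namespaces=TEI_NS)
            authors.append(" ".join(forenames + [surname]).strip())
    return {"title": title, "authors": authors}
```

Feeding it the response body from a header-extraction request yields a plain dict such as `{"title": ..., "authors": [...]}`, which is easy to display or to pass on to a result handler.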
Special thanks to Joseph Boyd (@jcboyd) and Gilles Louppe (@glouppe) for Grobid support.
Happy hacking and thanks for flying Invenio Grobid.
Version 0.1.0 (released 2015-10-09)
- Initial public release.