Knowledge base construction system for richly formatted data.
Project description
Fonduer is a framework for building knowledge base construction (KBC) applications from richy formatted data and is implemented as a library on top of a modified version of Snorkel.
Note that Fonduer is still actively under development, so feedback and contributions are welcome. Let us know in the `Issues <https://github.com/HazyResearch/fonduer/issues>`__ section or feel free to submit your contributions as a pull request.
Reference
`Fonduer: Knowledge Base Construction from Richly Formatted Data <https://arxiv.org/abs/1703.05028>`__
@article{wu2017fonduer, title={Fonduer: Knowledge Base Construction from Richly Formatted Data}, author={Wu, Sen and Hsiao, Luke and Cheng, Xiao and Hancock, Braden and Rekatsinas, Theodoros and Levis, Philip and R{\'e}, Christopher}, journal={arXiv preprint arXiv:1703.05028}, year={2017} }
Installation
Dependencies
We use a few applications that you’ll need to install and be sure are on your PATH.
For OS X using homebrew:
brew install poppler
brew install postgresql
On Debian-based distros:
sudo apt-get install poppler-utils
sudo apt-get install postgresql
For the Python dependencies, we recommend using a virtualenv. Once you have cloned the repository, change directories to the root of the repository and run
virtualenv -p python3 .venv
Once the virtual environment is created, activate it by running
source .venv/bin/activate
Any Python libraries installed will now be contained within this virtual environment. To deactivate the environment, simply run deactivate.
Fonduer adds some additional python packages to the default Snorkel installation which can be installed using pip:
pip install -r python-package-requirement.txt
Running
After installing Fonduer, and the additional python dependencies, just run:
./run.sh
which will finish installing the external libraries we use.
Learning how to use Fonduer
The `Fonduer tutorials <https://github.com/hazyresearch/fonduer/tree/master/tutorials>`__ cover the Fonduer workflow, showing how to extract relations from hardware datasheets and scientific literature.
The tutorials are available in the following directory:
tutorials/
For Developers
Testing
You can run unit tests locally by running
source ./set_env.sh pytest tests -rsXx
FAQs
How do I connect to PostgreSQL? I’m getting “fe_sendauth no password supplied”.
There are four main ways to deal with entering passwords when you connect to your PostgreSQL database:
Set the PGPASSWORD environment variable PGPASSWORD=<pass> psql -h <host> -U <user>
Using a .pgpass file to store the password.
Setting the users to trust authentication in the pg_hba.conf file. This makes local development easy, but probably isn’t suitable for multiuser environments. You can find your hba file location by running psql, then querying SHOW hba_file;
Put the username and password in the connection URI: postgres://user:pw@localhost:5432/...
I’m getting a CalledProcessError for command ‘pdftotext -f 1 -l 1 -bbox-layout’?
Are you using Ubuntu 14.04 (or older)? Fonduer requires poppler-utils to be version ``0.36.0` or greater <https://poppler.freedesktop.org/releases.html>`__. Otherwise, the -bbox-layout option is not available for pdftotext.
If you must use Ubuntu 14.04, you can install manually. As an example, to install 0.53.0:
sudo apt-get install build-essential checkinstall
wget poppler.freedesktop.org/poppler-0.53.0.tar.xz
tar -xf ./poppler-0.53.0.tar.xz
cd poppler-0.53.0
./configure
make
sudo checkinstall
We highly recommend using at least Ubuntu 16.04 though, as we haven’t done testing on 14.04 or older.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.