Knowledge base construction system for richly formatted data.

Fonduer is a framework for building knowledge base construction (KBC) applications from richly formatted data. It is implemented as a library on top of a modified version of Snorkel.

Note that Fonduer is still actively under development, so feedback and contributions are welcome. Let us know in the `Issues <https://github.com/HazyResearch/fonduer/issues>`__ section or feel free to submit your contributions as a pull request.

Reference

`Fonduer: Knowledge Base Construction from Richly Formatted Data <https://arxiv.org/abs/1703.05028>`__

@article{wu2017fonduer,
  title={Fonduer: Knowledge Base Construction from Richly Formatted Data},
  author={Wu, Sen and Hsiao, Luke and Cheng, Xiao and Hancock, Braden and Rekatsinas, Theodoros and Levis, Philip and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:1703.05028},
  year={2017}
}

Installation

Dependencies

Fonduer relies on a few external applications (poppler and PostgreSQL) that you’ll need to install and make available on your PATH.

For OS X using homebrew:

brew install poppler
brew install postgresql

On Debian-based distros:

sudo apt-get install poppler-utils
sudo apt-get install postgresql
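To confirm that the external tools are actually on your PATH after installing them, a small check along the following lines may help. This is only a sketch: the function name ``missing_tools`` is hypothetical, and it assumes that the packages above provide executables named ``pdftotext`` and ``psql``.

```python
import shutil


def missing_tools(tools=("pdftotext", "psql")):
    """Return the names from `tools` that cannot be found on PATH.

    The default executable names are assumptions about what the poppler
    and PostgreSQL packages install on your system.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

An empty list means every tool was found; any name in the returned list still needs to be installed or added to PATH.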

For the Python dependencies, we recommend using a virtualenv. Once you have cloned the repository, change directories to the root of the repository and run

virtualenv -p python3 .venv

Once the virtual environment is created, activate it by running

source .venv/bin/activate

Any Python libraries installed will now be contained within this virtual environment. To deactivate the environment, simply run deactivate.

Fonduer adds some Python packages to the default Snorkel installation, which can be installed using pip:

pip install -r python-package-requirement.txt

Running

After installing Fonduer and the additional Python dependencies, just run:

./run.sh

which will finish installing the external libraries we use.

Learning how to use Fonduer

The `Fonduer tutorials <https://github.com/hazyresearch/fonduer/tree/master/tutorials>`__ cover the Fonduer workflow, showing how to extract relations from hardware datasheets and scientific literature.

The tutorials are available in the following directory:

tutorials/

For Developers

Testing

You can run unit tests locally by running

source ./set_env.sh
pytest tests -rsXx

FAQs

How do I connect to PostgreSQL? I’m getting “fe_sendauth no password supplied”.

There are four main ways to deal with entering passwords when you connect to your PostgreSQL database:

  1. Set the PGPASSWORD environment variable: PGPASSWORD=<pass> psql -h <host> -U <user>

  2. Use a .pgpass file to store the password.

  3. Set the users to trust authentication in the pg_hba.conf file. This makes local development easy, but it is probably not suitable for multiuser environments. You can find the location of your hba file by running psql and then querying SHOW hba_file;

  4. Put the username and password in the connection URI: postgres://user:pw@localhost:5432/...
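For the .pgpass and connection URI options, passwords containing special characters need escaping. The sketch below shows one way to do that in Python. The helper names ``build_postgres_uri`` and ``pgpass_line`` and the database name ``mydb`` are hypothetical and not part of Fonduer. It relies on the documented .pgpass line format (``hostname:port:database:username:password``, with ``:`` and ``\`` escaped by a backslash) and on percent-encoding for URIs.

```python
from urllib.parse import quote


def build_postgres_uri(user, password, host="localhost", port=5432, database=""):
    """Build a postgres:// URI with percent-encoded credentials.

    safe="" makes quote() also encode '/', '@' and ':' so that such
    characters in a password cannot be mistaken for URI delimiters.
    """
    credentials = quote(user, safe="") + ":" + quote(password, safe="")
    return "postgres://{}@{}:{}/{}".format(credentials, host, port, database)


def pgpass_line(host, port, database, user, password):
    """Format one .pgpass entry, escaping backslashes and colons."""

    def escape(field):
        # Backslashes are escaped first so the ones added for ':' stay intact.
        return str(field).replace("\\", "\\\\").replace(":", "\\:")

    return ":".join(escape(f) for f in (host, port, database, user, password))


if __name__ == "__main__":
    # "mydb" is a placeholder database name used only for illustration.
    print(build_postgres_uri("user", "p@ss/word", database="mydb"))
    print(pgpass_line("localhost", 5432, "mydb", "user", "pa:ss"))
```

If you use a .pgpass file, note that PostgreSQL ignores it unless its permissions are restricted to the owner (for example, chmod 0600 ~/.pgpass).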

Why am I getting a CalledProcessError for command ‘pdftotext -f 1 -l 1 -bbox-layout’?

Are you using Ubuntu 14.04 (or older)? Fonduer requires poppler-utils version `0.36.0 or greater <https://poppler.freedesktop.org/releases.html>`__. Otherwise, the -bbox-layout option is not available for pdftotext.
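To check whether your installed pdftotext is new enough, you can compare its reported version against 0.36.0. The following is a minimal sketch, not part of Fonduer: the function names are hypothetical, and it assumes that ``pdftotext -v`` prints a banner containing ``version X.Y.Z``.

```python
import re
import shutil
import subprocess

# Minimum poppler version that provides the -bbox-layout option.
MIN_POPPLER = (0, 36, 0)


def parse_poppler_version(banner):
    """Extract (major, minor, patch) from a version banner, or None."""
    match = re.search(r"version\s+(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        return None
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def supports_bbox_layout(banner):
    """Return True if the banner reports poppler 0.36.0 or greater."""
    version = parse_poppler_version(banner)
    return version is not None and version >= MIN_POPPLER


def installed_pdftotext_banner():
    """Return the output of `pdftotext -v`, or "" if it is not on PATH."""
    if shutil.which("pdftotext") is None:
        return ""
    result = subprocess.run(
        ["pdftotext", "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # The banner may be written to either stream, so both are combined.
    return result.stdout + result.stderr


if __name__ == "__main__":
    print(supports_bbox_layout(installed_pdftotext_banner()))
```

A result of False means either that pdftotext was not found or that its version is older than 0.36.0.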

If you must use Ubuntu 14.04, you can build and install a newer poppler manually. As an example, to install 0.53.0:

sudo apt-get install build-essential checkinstall
wget poppler.freedesktop.org/poppler-0.53.0.tar.xz
tar -xf ./poppler-0.53.0.tar.xz
cd poppler-0.53.0
./configure
make
sudo checkinstall

We highly recommend using at least Ubuntu 16.04, though, as we have not tested Fonduer on 14.04 or older.
