
Knowledge base construction system for richly formatted data.


Fonduer
=======

|GitHub license| |GitHub stars| |PyPI| |PyPI - Python Version| |GitHub
issues| |Travis| |Coveralls github|

``Fonduer`` is a framework for building knowledge base construction
(KBC) applications from *richly formatted data* and is implemented as a
library on top of a modified version of
`Snorkel <https://hazyresearch.github.io/snorkel/>`__.

*Note that Fonduer is still actively under development, so feedback and
contributions are welcome. Let us know in the
`Issues <https://github.com/HazyResearch/fonduer/issues>`__ section or
feel free to submit your contributions as a pull request.*

Reference
---------

`Fonduer: Knowledge Base Construction from Richly Formatted
Data <https://arxiv.org/abs/1703.05028>`__

::

    @article{wu2017fonduer,
      title={Fonduer: Knowledge Base Construction from Richly Formatted Data},
      author={Wu, Sen and Hsiao, Luke and Cheng, Xiao and Hancock, Braden and Rekatsinas, Theodoros and Levis, Philip and R{\'e}, Christopher},
      journal={arXiv preprint arXiv:1703.05028},
      year={2017}
    }

Installation
------------

Dependencies
~~~~~~~~~~~~

``Fonduer`` relies on a few external applications, which must be
installed and available on your ``PATH``.

On OS X, using `Homebrew <https://brew.sh>`__:

.. code:: bash

    brew install poppler
    brew install postgresql

On Debian-based distros:

.. code:: bash

    sudo apt-get install poppler-utils
    sudo apt-get install postgresql
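
Before continuing, it can help to confirm that both tools are visible on
your ``PATH``. The following is a minimal sketch, not part of ``Fonduer``
itself: ``check_tools`` is a hypothetical helper name, and it assumes that
``pdftotext`` (from poppler) and ``psql`` (from PostgreSQL) are the
executables of interest.

.. code:: bash

    # Hypothetical helper: report which of the named commands are missing
    # from PATH. Returns non-zero if at least one is missing.
    check_tools() {
        local tool missing=0
        for tool in "$@"; do
            if command -v "$tool" >/dev/null 2>&1; then
                echo "found:   $tool"
            else
                echo "missing: $tool"
                missing=1
            fi
        done
        return "$missing"
    }

    check_tools pdftotext psql || echo "Install the missing tools before continuing."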

For the Python dependencies, we recommend using a
`virtualenv <https://virtualenv.pypa.io/en/stable/>`__. Once you have
cloned the repository, change directories to the root of the repository
and run

.. code:: bash

    virtualenv -p python3 .venv

Once the virtual environment is created, activate it by running

.. code:: bash

    source .venv/bin/activate

Any Python libraries installed will now be contained within this virtual
environment. To deactivate the environment, simply run ``deactivate``.
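
To confirm that the environment is active before installing anything, you
can check the ``VIRTUAL_ENV`` variable, which the ``activate`` script
sets. A minimal sketch (``in_virtualenv`` is a hypothetical helper name,
not something ``Fonduer`` provides):

.. code:: bash

    # Hypothetical helper: succeeds only when a virtual environment is active.
    in_virtualenv() {
        [ -n "${VIRTUAL_ENV:-}" ]
    }

    if in_virtualenv; then
        echo "Active virtual environment: $VIRTUAL_ENV"
    else
        echo "No virtual environment is active; run: source .venv/bin/activate"
    fi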

``Fonduer`` adds some additional Python packages to the default Snorkel
installation, which can be installed using ``pip``:

.. code:: bash

    pip install -r python-package-requirement.txt

Running
-------

After installing ``Fonduer`` and the additional Python dependencies,
run:

::

    ./run.sh

This script finishes installing the external libraries that ``Fonduer``
uses.

Learning how to use ``Fonduer``
-------------------------------

The `Fonduer
tutorials <https://github.com/hazyresearch/fonduer/tree/master/tutorials>`__
cover the ``Fonduer`` workflow, showing how to extract relations from
hardware datasheets and scientific literature.

The tutorials are available in the following directory:

::

    tutorials/

For Developers
--------------

Testing
~~~~~~~

You can run unit tests locally by running

::

    source ./set_env.sh
    pytest tests -rsXx

FAQs
----

How do I connect to PostgreSQL? I'm getting "fe\_sendauth no password
supplied".

There are `four main
ways <https://dba.stackexchange.com/questions/14740/how-to-use-psql-with-no-password-prompt>`__
to deal with entering passwords when you connect to your PostgreSQL
database:

1. Set the ``PGPASSWORD`` environment variable, for example
   ``PGPASSWORD=<pass> psql -h <host> -U <user>``.
2. Use a `.pgpass file to store the
   password <http://www.postgresql.org/docs/current/static/libpq-pgpass.html>`__.
3. Set the users to `trust
   authentication <https://www.postgresql.org/docs/current/static/auth-methods.html#AUTH-TRUST>`__
   in the ``pg_hba.conf`` file. This makes local development easy, but
   is probably not suitable for multiuser environments. You can find the
   location of your ``pg_hba.conf`` file by running ``psql`` and then
   querying ``SHOW hba_file;``.
4. Put the username and password in the connection URI:
   ``postgres://user:pw@localhost:5432/...``
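
As a sketch of the ``.pgpass`` option: each line of the file has the form
``hostname:port:database:username:password``, and on Unix systems the
file is ignored unless group and world access are disallowed (for example,
mode ``0600``). The helper name ``add_pgpass_entry`` and every value in
the usage comment are placeholders for illustration, not part of
``Fonduer``.

.. code:: bash

    # Hypothetical helper: append one entry to a password file and
    # restrict its permissions so that libpq will accept it.
    # Usage: add_pgpass_entry FILE HOST PORT DATABASE USER PASSWORD
    # Note: values containing ':' or '\' must be escaped with '\' first;
    # this sketch does not do that for you.
    add_pgpass_entry() {
        local file="$1"
        printf '%s:%s:%s:%s:%s\n' "$2" "$3" "$4" "$5" "$6" >> "$file"
        chmod 600 "$file"
    }

    # Example with placeholder values (not run automatically):
    #   add_pgpass_entry "$HOME/.pgpass" localhost 5432 mydb myuser mypassword
    #   psql -h localhost -U myuser mydb

If you keep the file somewhere other than ``~/.pgpass``, point libpq at
it with the ``PGPASSFILE`` environment variable.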

I'm getting a CalledProcessError for command 'pdftotext -f 1 -l 1
-bbox-layout'?

Are you using Ubuntu 14.04 (or older)? Fonduer requires
``poppler-utils`` `version 0.36.0 or
greater <https://poppler.freedesktop.org/releases.html>`__. Otherwise,
the ``-bbox-layout`` option is not available for ``pdftotext``.
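
To check which version is installed, you can compare the output of
``pdftotext -v`` against ``0.36.0``. The following is a minimal sketch
under a few assumptions: ``version_at_least`` is a hypothetical helper,
it relies on ``sort -V`` (available in GNU coreutils), and it assumes the
version number is the third field of the line containing "version" in the
``pdftotext -v`` output.

.. code:: bash

    # Hypothetical helper: succeeds when version $1 is >= version $2.
    # It sorts the two versions and checks that $2 comes first.
    version_at_least() {
        [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
    }

    if command -v pdftotext >/dev/null 2>&1; then
        # pdftotext prints its version banner on stderr, so merge the streams.
        installed="$(pdftotext -v 2>&1 | awk '/version/ {print $3; exit}')"
        if [ -n "$installed" ] && version_at_least "$installed" 0.36.0; then
            echo "pdftotext $installed should support -bbox-layout"
        else
            echo "Could not confirm pdftotext >= 0.36.0 (found: ${installed:-unknown})"
        fi
    else
        echo "pdftotext is not on PATH"
    fi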

If you must use Ubuntu 14.04, you can `install
manually <https://poppler.freedesktop.org>`__. As an example, to install
``0.53.0``:

.. code:: bash

    sudo apt-get install build-essential checkinstall
    wget https://poppler.freedesktop.org/poppler-0.53.0.tar.xz
    tar -xf ./poppler-0.53.0.tar.xz
    cd poppler-0.53.0
    ./configure
    make
    sudo checkinstall

We strongly recommend using Ubuntu 16.04 or newer, though, as we have
not tested on 14.04 or older.

.. |GitHub license| image:: https://img.shields.io/github/license/HazyResearch/fonduer.svg
   :target: https://github.com/HazyResearch/fonduer/blob/master/LICENSE
.. |GitHub stars| image:: https://img.shields.io/github/stars/HazyResearch/fonduer.svg
   :target: https://github.com/HazyResearch/fonduer/stargazers
.. |PyPI| image:: https://img.shields.io/pypi/v/fonduer.svg
   :target: https://pypi.org/project/fonduer/
.. |PyPI - Python Version| image:: https://img.shields.io/pypi/pyversions/fonduer.svg
   :target: https://pypi.org/project/fonduer/
.. |GitHub issues| image:: https://img.shields.io/github/issues/HazyResearch/fonduer.svg
   :target: https://github.com/HazyResearch/fonduer/issues
.. |Travis| image:: https://img.shields.io/travis/HazyResearch/fonduer.svg
   :target: https://travis-ci.org/HazyResearch/fonduer
.. |Coveralls github| image:: https://img.shields.io/coveralls/github/HazyResearch/fonduer.svg
   :target: https://coveralls.io/github/HazyResearch/fonduer

