Knowledge base construction system for richly formatted data.
Fonduer
=======
|GitHub license| |GitHub stars| |GitHub issues| |Travis|
``Fonduer`` is a framework for building knowledge base construction
(KBC) applications from *richly formatted data* and is implemented as a
library on top of a modified version of
`Snorkel <https://hazyresearch.github.io/snorkel/>`__.
*Note that Fonduer is still actively under development, so feedback and
contributions are welcome. Let us know in the
`Issues <https://github.com/HazyResearch/fonduer/issues>`__ section or
feel free to submit your contributions as a pull request.*
Reference
---------
`Fonduer: Knowledge Base Construction from Richly Formatted
Data <https://arxiv.org/abs/1703.05028>`__

::

    @article{wu2017fonduer,
      title={Fonduer: Knowledge Base Construction from Richly Formatted Data},
      author={Wu, Sen and Hsiao, Luke and Cheng, Xiao and Hancock, Braden and Rekatsinas, Theodoros and Levis, Philip and R{\'e}, Christopher},
      journal={arXiv preprint arXiv:1703.05028},
      year={2017}
    }

Installation
------------
Dependencies
~~~~~~~~~~~~
``Fonduer`` relies on a few external applications. Install them and make
sure they are on your ``PATH``.
For OS X using `homebrew <https://brew.sh>`__:

.. code:: bash

    brew install poppler
    brew install postgresql

On Debian-based distros:

.. code:: bash

    sudo apt-get install poppler-utils
    sudo apt-get install postgresql

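
To confirm that the tools ended up on your ``PATH``, a quick check along
the following lines may help. This is only a sketch: ``check_tools`` is a
hypothetical helper written for this example and is not part of
``Fonduer``. It assumes that ``pdftotext`` (from poppler) and ``psql``
(from PostgreSQL) are the commands of interest.

.. code:: bash

    # Report whether each named command can be found on PATH.
    # Returns non-zero if at least one command is missing.
    check_tools() {
        missing=0
        for tool in "$@"; do
            if command -v "$tool" >/dev/null 2>&1; then
                echo "found: $tool"
            else
                echo "missing: $tool"
                missing=1
            fi
        done
        return "$missing"
    }

    check_tools pdftotext psql || echo "Install the missing tools before continuing."
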
For the Python dependencies, we recommend using a
`virtualenv <https://virtualenv.pypa.io/en/stable/>`__. Once you have
cloned the repository, change directories to the root of the repository
and run

.. code:: bash

    virtualenv -p python3 .venv

Once the virtual environment is created, activate it by running

.. code:: bash

    source .venv/bin/activate

Any Python libraries installed will now be contained within this virtual
environment. To deactivate the environment, simply run ``deactivate``.
``Fonduer`` adds some Python packages to the default Snorkel
installation. Install them using ``pip``:

.. code:: bash

    pip install -r python-package-requirement.txt

Running
-------
After installing ``Fonduer`` and the additional Python dependencies, run:

::

    ./run.sh

This finishes installing the external libraries that ``Fonduer`` uses.
Learning how to use ``Fonduer``
-------------------------------
The `Fonduer
tutorials <https://github.com/hazyresearch/fonduer/tree/master/tutorials>`__
cover the ``Fonduer`` workflow, showing how to extract relations from
hardware datasheets and scientific literature.
The tutorials are available in the following directory:

::

    tutorials/

For Developers
--------------
Testing
~~~~~~~
You can run the unit tests locally by running

::

    source ./set_env.sh
    pytest tests -rsXx

FAQs
----
How do I connect to PostgreSQL? I'm getting "fe\_sendauth no password
supplied".
There are `four main
ways <https://dba.stackexchange.com/questions/14740/how-to-use-psql-with-no-password-prompt>`__
to deal with entering passwords when you connect to your PostgreSQL
database:

1. Set the ``PGPASSWORD`` environment variable:
   ``PGPASSWORD=<pass> psql -h <host> -U <user>``
2. Use a `.pgpass file to store the
   password <http://www.postgresql.org/docs/current/static/libpq-pgpass.html>`__.
3. Set the users to `trust
   authentication <https://www.postgresql.org/docs/current/static/auth-methods.html#AUTH-TRUST>`__
   in the ``pg_hba.conf`` file. This makes local development easy, but
   probably isn't suitable for multiuser environments. You can find your
   hba file location by running ``psql``, then querying
   ``SHOW hba_file;``.
4. Put the username and password in the connection URI:
   ``postgres://user:pw@localhost:5432/...``

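
As an illustration of the ``.pgpass`` option, the sketch below appends
one entry to a password file and tightens its permissions. The helper
name ``add_pgpass_entry`` is hypothetical and exists only for this
example. Each line of the file uses the format
``hostname:port:database:username:password``.

.. code:: bash

    # Append one entry to a .pgpass-style file and restrict its permissions.
    # $1: path to the password file
    # $2: entry in the form hostname:port:database:username:password
    add_pgpass_entry() {
        pgpass_file=$1
        entry=$2
        printf '%s\n' "$entry" >> "$pgpass_file"
        # libpq ignores the file if group or world can access it.
        chmod 0600 "$pgpass_file"
    }

    # Example call (placeholders, not real credentials; "*" matches any database):
    # add_pgpass_entry "$HOME/.pgpass" 'localhost:5432:*:<user>:<pass>'

The example call is left commented out so that pasting the snippet does
not modify your real ``~/.pgpass`` by accident.
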
Why am I getting a CalledProcessError for command 'pdftotext -f 1 -l 1
-bbox-layout'?
Are you using Ubuntu 14.04 (or older)? ``Fonduer`` requires
``poppler-utils`` `version 0.36.0 or
greater <https://poppler.freedesktop.org/releases.html>`__. In older
versions, the ``-bbox-layout`` option is not available for ``pdftotext``.
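
One way to check the installed version against this requirement is
sketched below. It assumes that ``pdftotext -v`` prints a banner
containing a dotted version number (poppler writes it to stderr) and
that your ``sort`` supports the ``-V`` version-sort flag, as GNU
coreutils does. ``version_ge`` is a hypothetical helper for this
example.

.. code:: bash

    # Succeed when version $1 is greater than or equal to version $2.
    version_ge() {
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    }

    if command -v pdftotext >/dev/null 2>&1; then
        # Pull the first x.y.z-looking token out of the version banner.
        installed=$(pdftotext -v 2>&1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
        if version_ge "$installed" 0.36.0; then
            echo "poppler $installed meets the 0.36.0 requirement"
        else
            echo "poppler $installed is older than 0.36.0"
        fi
    else
        echo "pdftotext not found on PATH"
    fi
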
If you must use Ubuntu 14.04, you can `install
manually <https://poppler.freedesktop.org>`__. As an example, to install
``0.53.0``:

.. code:: bash

    sudo apt-get install build-essential checkinstall
    wget https://poppler.freedesktop.org/poppler-0.53.0.tar.xz
    tar -xf ./poppler-0.53.0.tar.xz
    cd poppler-0.53.0
    ./configure
    make
    sudo checkinstall

We highly recommend using at least Ubuntu 16.04 though, as we haven't
done testing on 14.04 or older.

.. |GitHub license| image:: https://img.shields.io/github/license/HazyResearch/fonduer.svg
   :target: https://github.com/HazyResearch/fonduer/blob/master/LICENSE
.. |GitHub stars| image:: https://img.shields.io/github/stars/HazyResearch/fonduer.svg
   :target: https://github.com/HazyResearch/fonduer/stargazers
.. |GitHub issues| image:: https://img.shields.io/github/issues/HazyResearch/fonduer.svg
   :target: https://github.com/HazyResearch/fonduer/issues
.. |Travis| image:: https://img.shields.io/travis/HazyResearch/fonduer.svg
   :target: https://travis-ci.org/HazyResearch/fonduer