Transform unstructured document collections to structured Linked Data

Project description

Ferenda is a Python library and framework for transforming unstructured document collections into structured Linked Data. It helps with downloading documents, parsing them to add explicit semantic structure and RDF-based metadata, finding relationships between documents, and publishing the results, including through a REST-based HTTP API.

Quick start

This example uses ferenda’s project framework to download the 50 latest RFCs and W3C standards, parse them into structured, RDF-enabled XHTML documents, load all RDF metadata into a triplestore, and generate a web site of static HTML5 files that is usable offline:

pip install ferenda
ferenda-setup myproject
cd myproject
./ferenda-build.py ferenda.sources.tech.RFC enable
./ferenda-build.py ferenda.sources.tech.W3Standards enable
./ferenda-build.py all all --downloadmax=50 --staticsite --fulltextindex=False
open data/index.html
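
The build commands above can also be driven from a script. The following is a minimal sketch that only assembles and runs the same command lines with Python's standard `subprocess` module; it does not use ferenda's own Python API. The helper names `build_commands` and `run_all` are hypothetical and not part of ferenda, and the sketch assumes a POSIX environment in which `pip install ferenda` and `ferenda-setup myproject` have already been run.

```python
import subprocess

# Source classes enabled in the quick start above.
SOURCES = ["ferenda.sources.tech.RFC", "ferenda.sources.tech.W3Standards"]


def build_commands(sources=SOURCES, downloadmax=50, script="./ferenda-build.py"):
    """Return the quick-start command lines as argv lists (hypothetical helper)."""
    # One "enable" command per document source.
    commands = [[script, source, "enable"] for source in sources]
    # One final command that runs all actions for all enabled sources.
    commands.append([
        script, "all", "all",
        f"--downloadmax={downloadmax}",
        "--staticsite",
        "--fulltextindex=False",
    ])
    return commands


def run_all(commands, cwd="myproject"):
    """Run each command inside the project directory, stopping at the first failure."""
    for argv in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Print the commands only; call run_all(build_commands()) to execute them.
    for argv in build_commands():
        print(" ".join(argv))
```

Keeping command construction separate from execution makes it possible to inspect or log the command lines, or to lower `downloadmax` for a quick trial run, before anything is downloaded.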

The same functionality can also be accessed through a Python API if you want to use ferenda as part of a larger system. It’s also possible to use only the parts of ferenda that you need (e.g. only the downloading and parsing features).

More information

See http://ferenda.readthedocs.org/ for in-depth documentation.

Download files

Source Distribution

ferenda-0.3.0.tar.gz (835.8 kB)

Uploaded source

Built Distribution

ferenda-0.3.0-py2.py3-none-any.whl (842.6 kB)

Uploaded py2 py3
