Transform unstructured document collections to structured Linked Data

Project description

Ferenda is a Python library and framework for transforming unstructured document collections into structured Linked Data. It helps with downloading documents, parsing them to add explicit semantic structure and RDF-based metadata, finding relationships between documents, and publishing the results.

Quick start

This example uses ferenda’s project framework to download the 50 latest RFCs and W3C standards, parse them into structured, RDF-enabled XHTML documents, load all RDF metadata into a triplestore, and generate a web site of static HTML5 files that can be used offline:

pip install ferenda
ferenda-setup myproject
cd myproject
./ferenda-build.py ferenda.sources.tech.RFC enable
./ferenda-build.py ferenda.sources.tech.W3Standards enable
./ferenda-build.py all all --downloadmax=50 --staticsite --fulltextindex=False
open data/index.html

The same functionality is also available through a Python API, if you want to use ferenda as part of a larger system. It’s also possible to use only the parts of ferenda that you need (e.g. only the downloading and parsing features).
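
As a rough illustration of using only the downloading and parsing features from Python, the sketch below drives a single document repository without the ferenda-build.py tool. It is not taken from this page: the method names download(), parse(basefile) and store.list_basefiles_for("parse"), and the datadir keyword, are assumptions based on ferenda's documentation and should be checked against the installed version. The helper names download_and_parse and run_rfc_example are hypothetical.

```python
def download_and_parse(repo, limit=None):
    """Download documents for one repository, then parse what was downloaded.

    `repo` is any object that exposes download(), parse(basefile) and
    store.list_basefiles_for(action). These names are assumed from
    ferenda's documentation, not confirmed by this page.
    """
    repo.download()
    parsed = []
    for basefile in repo.store.list_basefiles_for("parse"):
        # Stop early once the optional limit has been reached.
        if limit is not None and len(parsed) >= limit:
            break
        repo.parse(basefile)
        parsed.append(basefile)
    return parsed


def run_rfc_example(datadir="data"):
    """Hypothetical entry point: download and parse RFCs into `datadir`."""
    # Imported lazily so this module loads even when ferenda is not installed.
    from ferenda.sources.tech import RFC

    repo = RFC(datadir=datadir)  # the `datadir` keyword is an assumption
    return download_and_parse(repo)
```

Passing the repository in as an argument keeps the loop independent of any one source, so the same helper could be tried with ferenda.sources.tech.W3Standards or with a stand-in object in tests. Note that run_rfc_example() performs real network downloads when called.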

More information

See http://ferenda.readthedocs.org/ for in-depth documentation.

Release history

0.3.0
0.2.0
0.1.7 (this version)
0.1.6.1
0.1.6

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename (size)                                  File type  Python version  Upload date
ferenda-0.1.7-py2.py3-none-any.whl (575.1 kB)    Wheel      2.7             Apr 22, 2014
ferenda-0.1.7.tar.gz (571.2 kB)                  Source     None            Apr 22, 2014
