Transform unstructured document collections to structured Linked Data
Project description
Ferenda is a Python library and framework for transforming unstructured document collections into structured Linked Data. It helps with downloading documents, parsing them to add explicit semantic structure and RDF-based metadata, finding relationships between documents, and publishing the results.
Quick start
This example uses ferenda’s project framework to download the 50 latest RFCs and W3C standards, parse them into structured, RDF-enabled XHTML documents, load all RDF metadata into a triplestore, and generate a web site of static HTML5 files that is usable offline:
    pip install ferenda
    ferenda-setup myproject
    cd myproject
    ./ferenda-build.py ferenda.sources.tech.RFC enable
    ./ferenda-build.py ferenda.sources.tech.W3Standards enable
    ./ferenda-build.py all all --downloadmax=50 --staticsite --fulltextindex=False
    open data/index.html
The same functionality can also be accessed through a Python API if you want to use ferenda as part of a larger system. It’s also possible to use only the parts of ferenda that you need (e.g. only the downloading and parsing features).
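If the quick-start steps need to run from a larger Python program, one low-risk option is to shell out to the generated `ferenda-build.py` script. The sketch below is not ferenda's Python API. It only rebuilds the command lines shown in the quick start. The helper names `quickstart_args` and `run_quickstart` are hypothetical, and it assumes a project directory already created with `ferenda-setup`, whose `ferenda-build.py` can run under the current interpreter.

```python
import subprocess
import sys

# Repository classes enabled in the quick start above.
REPOS = ["ferenda.sources.tech.RFC", "ferenda.sources.tech.W3Standards"]


def quickstart_args(repos=REPOS, downloadmax=50):
    """Return the argument lists passed to ferenda-build.py in the quick start."""
    # One "enable" call per repository class.
    calls = [[repo, "enable"] for repo in repos]
    # Followed by a single "all all" run with the quick-start options.
    calls.append([
        "all", "all",
        f"--downloadmax={downloadmax}",
        "--staticsite",
        "--fulltextindex=False",
    ])
    return calls


def run_quickstart(project_dir, dry_run=True):
    """Run (or, with dry_run=True, only print) each ferenda-build.py call."""
    for args in quickstart_args():
        cmd = [sys.executable, "ferenda-build.py", *args]
        if dry_run:
            print(" ".join(cmd))
        else:
            # Assumption: project_dir was created by `ferenda-setup`.
            subprocess.run(cmd, cwd=project_dir, check=True)
```

Calling `run_quickstart("myproject")` only prints the commands. Pass `dry_run=False` to execute them inside the project directory.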
More information
See http://ferenda.readthedocs.org/ for in-depth documentation.
Copyright and license
Most of the code is written by Staffan Malmgren and licensed under the 2-clause BSD license.
Some bundled code and other creative works are written by other authors, included in accordance with their respective licenses:
cssmin by Zachary Voase, BSD
rdflib-sqlite by Graham Higgins, BSD
patch by Anatoly Techtonik, MIT
Grit XSLT stylesheets by Niklas Lindström, BSD
httpheader by Deron Meranda, LGPL
normalize.css, MIT
jQuery, MIT
modernizr, MIT
respond.js, MIT/GPL
Gentleface wireframe toolbar icons, CC-BY-NC
Download files
Hashes for ferenda-0.1.7-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 2ad42a443f13db432726fb1e7836157aab3981436b3a96606108fb103a2b9cb6 |
MD5 | 4615f955be72f467f2583d7d9d98596f |
BLAKE2b-256 | c2add7ba4edf73c58ebd0d9c060016ad995b171dea7d9a016af87188e7313b65 |
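A downloaded wheel can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch. The helper names `sha256_of` and `verify_sha256` are hypothetical, and the file path passed in is assumed to be wherever the wheel was saved.

```python
import hashlib

# SHA256 digest published for ferenda-0.1.7-py2.py3-none-any.whl (from the table above).
EXPECTED_SHA256 = "2ad42a443f13db432726fb1e7836157aab3981436b3a96606108fb103a2b9cb6"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_sha256("ferenda-0.1.7-py2.py3-none-any.whl")` returns `True` only if the local file matches the published digest.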