Converts instances from a CSV file into RDF.
Description
The entityrdfizer project is designed to convert entities of any domain, together with their data and metadata, into RDF.
It requires the entities and their data to be provided as inputs in an ABox CSV template that is filled in with data. A set of ABox CSV template files is provided at the following URL:
https://github.com/cambridge-cares/TheWorldAvatar/tree/master/JPS_Ontology/KBTemplates/ABox
Installation
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Virtual environment setup
It is highly recommended to use a virtual environment for the entityrdfizer installation. The virtual environment can be created as follows:
(Windows)
$ python -m venv entityrdfizer_venv
$ entityrdfizer_venv\Scripts\activate.bat
(entityrdfizer_venv) $
(Linux)
$ python3 -m venv entityrdfizer_venv
$ source entityrdfizer_venv/bin/activate
(entityrdfizer_venv) $
The above commands will create and activate the virtual environment entityrdfizer_venv in the current directory.
Installation via pip
To install entityrdfizer, simply run the following command:
(entityrdfizer_venv) $ pip install entityrdfizer
Installation from the version-controlled source (for developers)
This type of installation is intended for developers only. To install entityrdfizer directly from its repository, first clone the TheWorldAvatar project. Then navigate to the TheWorldAvatar\EntityRDFizer directory and execute the following commands:
# build and install
(entityrdfizer_venv) $ pip install .
# or build for in-place development
(entityrdfizer_venv) $ pip install -e .
Alternatively, use the provided install_rdfizer.sh convenience script, which can create the virtual environment and install entityrdfizer in one go:
# create the environment and install the project
$ install_rdfizer.sh -v -i
# create the environment and install the project for in-place development
$ install_rdfizer.sh -v -i -e
Note that installing the project for in-place development (setting the -e flag) also installs the Python packages required for development and testing. To test the code, run either of the following commands:
(entityrdfizer_venv) $ pytest
# or
(entityrdfizer_venv) $ pytest tests
How to use
Usage:
    csv2rdf <csvFileOrDirPath> --csvType=<type> [--outDir=<dir>] [--csvTbox=<tbox>]
Options:
    --csvType=<type>  Type of the csv file.
                      Choose one of abox/tbox [default: abox]
    --outDir=<dir>    Output directory path
    --csvTbox=<tbox>  TBox in csv format to validate the input ABox csv file (for ABox writer only)
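When csv2rdf is called from a script, the argument list can be assembled programmatically. The following is a minimal sketch that uses only the options listed above; the helper name build_csv2rdf_command and the file names in the comments are hypothetical and are not part of entityrdfizer.

```python
from typing import List, Optional


def build_csv2rdf_command(
    csv_path: str,
    csv_type: str = "abox",
    out_dir: Optional[str] = None,
    csv_tbox: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for the csv2rdf command line tool.

    Hypothetical helper: it only mirrors the documented options and
    does not run anything itself.
    """
    if csv_type not in ("abox", "tbox"):
        raise ValueError("csvType must be 'abox' or 'tbox'")
    # --csvTbox is documented as applying to the ABox writer only.
    if csv_tbox is not None and csv_type != "abox":
        raise ValueError("--csvTbox is only meaningful with --csvType=abox")

    command = ["csv2rdf", csv_path, f"--csvType={csv_type}"]
    if out_dir is not None:
        command.append(f"--outDir={out_dir}")
    if csv_tbox is not None:
        command.append(f"--csvTbox={csv_tbox}")
    return command


# Example (not executed here; requires entityrdfizer to be installed):
#   import subprocess
#   subprocess.run(build_csv2rdf_command("my_abox.csv", out_dir="out"), check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.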
csv file format for ABox
The input csv file must have at least 6 columns: A,B,C,D,E,F. Extra columns are ignored.
The file specified for the --csvTbox parameter should follow the format of the examples in EntityRDFizer/tests/test_tboxes/ontocompchem/
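The six-column requirement can be checked before a file is passed to the converter. The sketch below is an illustration only, not entityrdfizer's own validation: the function name check_columns is hypothetical, and it simply reports rows with fewer than six columns and drops any columns beyond F.

```python
import csv
import io
from typing import List, Tuple


def check_columns(csv_text: str, min_columns: int = 6) -> Tuple[List[List[str]], List[int]]:
    """Return (rows truncated to columns A-F, line numbers of rows that are too short)."""
    rows: List[List[str]] = []
    short_lines: List[int] = []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) < min_columns:
            short_lines.append(line_no)
        # Extra columns are ignored, so keep only the first six.
        rows.append(row[:min_columns])
    return rows, short_lines


# Hypothetical input: the first row has seven columns, the second only three.
sample = "a,b,c,d,e,f,extra\nx,y,z\n"
rows, short_lines = check_columns(sample)
print(rows[0])        # first row without the seventh column
print(short_lines)    # line numbers that need fixing
```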
Each row in the csv file contains one of the following:
Ontology description containing prefixes for the TBox and the ABox.
For ABox prefix:
- Col A: ABox file name (actually not used, but col A cannot be empty)
- Col B: "Ontology"
- Col C: http://www.theworldavatar.com/kb/ontospecies for ABox (To be changed accordingly)
- Col D: "base"
- Col E,F are not used.
For TBox prefix:
- Col A: not used (Col A cannot be empty)
- Col B: "Ontology"
- Col C: http://www.theworldavatar.com/ontology/ontospecies/OntoSpecies.owl (To be changed accordingly)
- Col D: "http://www.w3.org/2002/07/owl#imports"
- Col E,F are not used.
The ontology prefix in Col C must end with SLASH (/) or HASH (#). The full path of entities will be http://www.theworldavatar.com/ontology/ontospecies/ClassName or http://www.theworldavatar.com/ontology/ontospecies/OntoSpecies.owl#ClassName, respectively.
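The way a prefix and a short name combine into a full path can be sketched as plain string concatenation guarded by the slash/hash rule. This is an illustration under that assumption, not the tool's internal code; the function name full_iri is hypothetical.

```python
def full_iri(prefix: str, name: str) -> str:
    """Join an ontology prefix and a short entity name into a full path.

    Hypothetical helper: it rejects prefixes that do not end with '/' or '#',
    mirroring the rule stated for Col C.
    """
    if not prefix.endswith(("/", "#")):
        raise ValueError(f"prefix must end with '/' or '#': {prefix!r}")
    return prefix + name


# Slash-terminated prefix
print(full_iri("http://www.theworldavatar.com/ontology/ontospecies/", "ClassName"))
# Hash-terminated prefix
print(full_iri("http://www.theworldavatar.com/ontology/ontospecies/OntoSpecies.owl#", "ClassName"))
```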
Definition of an instance of class
The name of the instance can be either a full path or relative to the base ontology.
- Col A: short class name for the ontology defined in the TBox, or a full IRI of the class name for a class from an external ontology
- Col B: "Instance"
- Col C: The new instance name. A full IRI of the instance may be given instead of a name relative to the ontology defined in base.
- Col D,E,F must be empty.
Relation between two class instances
- Col A: Subject. An instance name defined earlier in this file, or a full IRI of the instance
- Col B: "Instance"
- Col C: Object. The instance defined before this point or a full IRI of the instance
- Col D: Predicate. Relative name or full IRI of the predicate in the triple: Col A predicate Col C.
- Col E,F are not used. If the instances in Col A and Col C are given as relative paths, they must be defined before this line.
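The ordering rule for relation rows can be checked with a single pass over the rows. The sketch below makes several assumptions that are not taken from entityrdfizer: an instance definition is recognised as an "Instance" row with an empty Col D, a relation as an "Instance" row with a non-empty Col D, and a full IRI as any name starting with http:// or https://. The function names and the sample names (Species, Formula, hasFormula) are hypothetical.

```python
from typing import List, Tuple


def is_full_iri(name: str) -> bool:
    # Assumption: full IRIs are recognised by their scheme.
    return name.startswith(("http://", "https://"))


def undefined_references(rows: List[List[str]]) -> List[Tuple[int, str]]:
    """Return (line number, name) for relative names used before they are defined."""
    defined = set()
    problems = []
    for line_no, row in enumerate(rows, start=1):
        col_a, col_b, col_c, col_d = row[0], row[1], row[2], row[3]
        if col_b != "Instance":
            continue
        if col_d == "":
            # Instance definition: Col C holds the new instance name.
            defined.add(col_c)
        else:
            # Relation: Col A is the subject, Col C is the object.
            for name in (col_a, col_c):
                if not is_full_iri(name) and name not in defined:
                    problems.append((line_no, name))
    return problems


rows = [
    ["Species", "Instance", "Species_1", "", "", ""],
    ["Species_1", "Instance", "Formula_1", "hasFormula", "", ""],  # Formula_1 used too early
    ["Formula", "Instance", "Formula_1", "", "", ""],
]
print(undefined_references(rows))
```

Moving the definition of Formula_1 above the relation row would make the list of problems empty.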
Assign data value to an instance
The data type of the value can be a full path or one of the predefined shortcuts: 'string', 'integer', 'float', 'double', 'decimal', 'datetime', 'boolean'. For the predefined data types it is possible to add the "xsd:" prefix, for example 'xsd:string'.
- Col A: Full http:// address of the relation
- Col B: "Data Property"
- Col C: instance to assign the value
- Col D is not used
- Col E: value to be assigned
- Col F: data type of the value.
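A data property row and the handling of data type shortcuts can be sketched as follows. Two points are assumptions rather than documented behaviour of entityrdfizer: that the shortcuts correspond to the XML Schema datatypes of the same name (with 'datetime' mapped to xsd:dateTime), and that matching is case-sensitive. The function names, the property name hasCharge and the instance name Species_1 are hypothetical.

```python
from typing import List

XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema#"
SHORTCUTS = {"string", "integer", "float", "double", "decimal", "datetime", "boolean"}


def resolve_datatype(data_type: str) -> str:
    """Expand a shortcut such as 'string' or 'xsd:string' to a full XSD IRI.

    Assumption: shortcuts map to the XML Schema datatype of the same name.
    Anything that is not a known shortcut is returned unchanged.
    """
    name = data_type[len("xsd:"):] if data_type.startswith("xsd:") else data_type
    if name in SHORTCUTS:
        # XML Schema spells this datatype 'dateTime'.
        return XSD_NAMESPACE + ("dateTime" if name == "datetime" else name)
    return data_type


def data_property_row(relation_iri: str, instance: str, value, data_type: str) -> List[str]:
    """Build columns A-F for a data property row; Col D stays empty."""
    return [relation_iri, "Data Property", instance, "", str(value), data_type]


row = data_property_row(
    "http://www.theworldavatar.com/ontology/ontospecies/OntoSpecies.owl#hasCharge",
    "Species_1",
    0,
    "integer",
)
print(row)
print(resolve_datatype(row[5]))
```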
Authors
Feroz Farazi (msff2@cam.ac.uk), 17 May 2021