Software for validating CSV documents that store citation data and bibliographic metadata according to the OpenCitations Data Model.
oc_validator
oc_validator is a Python (≥3.9) library to validate CSV documents storing citation data and bibliographic metadata. To be processed by the validator, the tables must be built as either CITS-CSV or META-CSV tables, defined in two specification documents[^1][^2].
[^1]: Massari, Arcangelo, and Ivan Heibi. 2022. ‘How to Structure Citations Data and Bibliographic Metadata in the OpenCitations Accepted Format’. https://doi.org/10.48550/arXiv.2206.03971.
[^2]: Massari, Arcangelo. 2022. ‘How to Produce Well-Formed CSV Files for OpenCitations’. https://doi.org/10.5281/zenodo.6597141.
Installation
The library can be installed with pip:
pip install oc_validator
Usage
The validation process can be executed from the CLI by running the following command:
python -m oc_validator.main -i <input csv file path> -o <output dir path> [-m] [-s]
Required Parameters
-i, --input: The path to the CSV file to validate.
-o, --output: The path to the directory where the output JSON file and .txt file will be stored.
Optional Parameters
-m, --use-meta: Uses the OC Meta endpoint instead of external APIs to check whether an ID exists (that is, whether it is registered in OpenCitations Meta). This option speeds up the whole process, since querying Meta is faster than querying external APIs, but the results might not be the most up to date.
-s, --no-id-existence: Skips the check for ID existence altogether, so that neither the Meta endpoint nor any external API is queried during validation. This makes execution much faster, but does not guarantee that all the submitted IDs actually refer to real-world entities.
Example Usage from CLI
To validate a CSV file and write the results to a specified directory (with the optional parameters left at their defaults, i.e. checking for the existence of IDs by querying external APIs):
python -m oc_validator.main -i path/to/input.csv -o path/to/output_dir
To use the OC Meta endpoint instead of external APIs to verify the existence of the IDs:
python -m oc_validator.main -i path/to/input.csv -o path/to/output_dir -m
To skip all ID existence verification:
python -m oc_validator.main -i path/to/input.csv -o path/to/output_dir -s
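When the CLI has to be driven from another script (for instance, to validate several files in a row), the commands above can be assembled programmatically. The following is a minimal sketch, not part of oc_validator: the helper name build_validator_command and its keyword arguments are hypothetical, and only the flags documented above (-i, -o, -m, -s) are used.

```python
import sys
from typing import List


def build_validator_command(
    input_csv: str,
    output_dir: str,
    use_meta: bool = False,
    no_id_existence: bool = False,
) -> List[str]:
    """Assemble the argument list for `python -m oc_validator.main`.

    Hypothetical helper: it only mirrors the CLI flags documented above.
    """
    cmd = [sys.executable, "-m", "oc_validator.main", "-i", input_csv, "-o", output_dir]
    if use_meta:
        cmd.append("-m")  # check ID existence through the OC Meta endpoint
    if no_id_existence:
        cmd.append("-s")  # skip ID existence checks altogether
    return cmd


# Show the command that would be run for a Meta-based validation.
print(build_validator_command("path/to/input.csv", "path/to/output_dir", use_meta=True))

# To actually run it (requires oc_validator to be installed in this environment):
#   import subprocess
#   subprocess.run(build_validator_command("path/to/input.csv", "path/to/output_dir"))
```

Using sys.executable rather than a bare "python" keeps the call inside the interpreter (and virtual environment) that is running the script.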
Programmatic Usage
To run the validation from Python, instantiate the Validator class, passing the path to the input document and the path to the directory where the output should be stored. Calling the validate() method on the Validator instance executes the validation process.
The process automatically detects which of the two table types has been passed as input (provided that the header of the input CSV document is formatted correctly for at least one of them). The whole document is always processed: if it is invalid or contains anomalies, the errors and warnings are reported in detail in a JSON file and summarized in a .txt file, both of which are created automatically in the output directory. validate() also returns a list of dictionaries corresponding to the JSON validation report (empty if the document is valid).
from oc_validator.main import Validator
# Basic validation
v = Validator('path/to/table.csv', 'output/directory')
v.validate()
# Validation with Meta endpoint checking for ID existence
v = Validator('path/to/table.csv', 'output/directory', use_meta_endpoint=True)
v.validate()
# Validation skipping all ID existence checks
v = Validator('path/to/table.csv', 'output/directory', verify_id_existence=False)
v.validate()
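Since validate() returns the report as a list of dictionaries (empty if the document is valid), the result can be inspected directly in Python. The sketch below is illustrative only: the helper names is_valid and summarize_report are hypothetical, and so is the "error_type" key and the sample entries, because the exact fields of each report entry are not described here. Check the generated JSON file for the real field names.

```python
from collections import Counter
from typing import Any, Dict, List


def is_valid(report: List[Dict[str, Any]]) -> bool:
    """An empty report means the document passed validation."""
    return len(report) == 0


def summarize_report(report: List[Dict[str, Any]], key: str = "error_type") -> Counter:
    """Count report entries by the value stored under `key`.

    The default key name is an assumption; entries lacking it are
    counted under "unknown".
    """
    return Counter(str(entry.get(key, "unknown")) for entry in report)


# Hypothetical report, standing in for the value returned by v.validate().
sample_report = [
    {"error_type": "error"},
    {"error_type": "warning"},
    {"error_type": "error"},
    {},
]

print(is_valid(sample_report))
print(summarize_report(sample_report))
```

In real use, sample_report would be replaced by the return value of v.validate().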