Command line tools, Golang package and Python module for working with the EPrints 3.x REST API

eprinttools

eprinttools is a collection of command line tools and a package written in Go, together with a set of command line utilities written in Python 3, for working with EPrints 3.x EPrint XML and the EPrints REST API. The long-term plan is for this to become a pure Python project.

This project also hosts demonstration code to replicate a public facing version of an EPrints repository outside of EPrints. Think of it as the public views and landing pages.

Go base code

The command line programs

  • eputil is a command line utility for retrieving (e.g. harvesting) JSON and XML from the EPrints REST API
    • uses minimal configuration because it does less!
    • it supersedes the ep command
  • epfmt is a command line utility to pretty print EPrints XML and convert to/from JSON
    • in the process of pretty printing it also validates the EPrints XML against the eprinttools Go package definitions
  • doi2eprintxml is a command line program for turning metadata harvested from CrossRef and DataCite into an EPrint XML document based on one or more supplied DOIs
  • eprintxml2json is a command line program for taking EPrint XML and turning it into JSON
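
As a rough illustration of the kind of conversion epfmt and eprintxml2json perform, the Python sketch below turns a tiny EPrint XML fragment into JSON using only the standard library. The sample record, the namespace URI and the eprint_xml_to_json helper are assumptions made for illustration. The Go tools handle the full EPrint XML schema, and the JSON they emit may be laid out differently.

```python
import json
import xml.etree.ElementTree as ET

# Assumed EPrints 3.x XML namespace; check it against your repository's output.
EP_NS = "http://eprints.org/ep2/data/2.0"

# Made-up record used only to exercise the sketch.
SAMPLE_XML = f"""<eprints xmlns="{EP_NS}">
  <eprint id="https://example.org/id/eprint/123">
    <eprintid>123</eprintid>
    <title>An Example Title</title>
    <type>article</type>
  </eprint>
</eprints>"""


def eprint_xml_to_json(xml_text):
    """Convert the simple (text only) child elements of each eprint to JSON."""
    root = ET.fromstring(xml_text)
    records = []
    for eprint in root.findall(f"{{{EP_NS}}}eprint"):
        record = {}
        for child in eprint:
            # Strip the "{namespace}" prefix ElementTree adds to tag names.
            name = child.tag.split("}", 1)[-1]
            # Nested elements (e.g. creators) are skipped in this sketch.
            if len(child) == 0 and child.text:
                record[name] = child.text.strip()
        records.append(record)
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    print(eprint_xml_to_json(SAMPLE_XML))
```

Nested fields such as creator lists are deliberately left out to keep the example short.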

The first two utilities can be configured from the environment or from command line options. Command line options override the environment settings. For details on running either command, invoke the tool with the '-help' option.
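
The precedence rule (environment first, command line options on top) can be sketched in Python as follows. This is not how the Go tools are implemented; the EPRINT_URL variable name, the --url option and the resolve_base_url helper are hypothetical and only illustrate the pattern.

```python
import argparse

# Hypothetical environment variable name, used for illustration only.
ENV_VAR = "EPRINT_URL"


def resolve_base_url(argv, environ):
    """Return the base URL, preferring the command line over the environment."""
    parser = argparse.ArgumentParser(prog="example")
    # The environment supplies the default; an explicit option replaces it.
    parser.add_argument("--url", default=environ.get(ENV_VAR, ""))
    args = parser.parse_args(argv)
    return args.url


if __name__ == "__main__":
    env = {ENV_VAR: "https://env.example.org"}
    # No option given: the environment value is used.
    print(resolve_base_url([], env))
    # Option given: it overrides the environment value.
    print(resolve_base_url(["--url", "https://cli.example.org"], env))
```

Passing the argument list and the environment in explicitly keeps the function easy to test without touching the real process environment.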

Python base code

Python Modules

eprints3x

This Python module wraps the eputil Go command in Python. It makes it trivial to implement harvesting of an EPrints repository using the EPrints REST API.
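
Wrapping a command line tool from Python generally comes down to running it as a subprocess and capturing its output. The sketch below shows that pattern with a hypothetical run_tool helper. It is not the eprints3x implementation, and it uses the Python interpreter as a stand-in command so that it runs without eputil installed.

```python
import subprocess
import sys


def run_tool(command):
    """Run a command and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. A real call would pass
    # the eputil executable and its arguments instead; run "eputil -help"
    # to see the options it accepts.
    print(run_tool([sys.executable, "-c", "print('harvested')"]), end="")
```

Using check=True means a failing tool surfaces as an exception rather than as silently empty output.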

eprintviews

This Python module uses py_dataset and the harvested content to generate an htdocs directory similar to the URL layout of EPrints. It features classes for working with:

  • Views
  • Users (needed to attribute names to the userid fields in EPrint XML harvested from the REST API)
  • Subjects (loads the subjects text file from an EPrints archive and generates the path to label mapping used when rendering views into an htdocs directory)
  • Aggregator (does the heavy lifting of processing a dataset collection of harvested EPrint XML and generating the views as JSON documents in the htdocs directory)
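
To make the Subjects idea concrete, here is a minimal sketch of building a path to label mapping from a subjects file. It assumes colon separated lines of the form id:label:parent:depositable with a single parent per subject, and it defines a path as the chain of ancestor ids joined by "/". The sample data and both helper functions are hypothetical; the eprintviews Subjects class may work differently.

```python
# Made-up subjects data in the assumed id:label:parent:depositable layout.
SAMPLE_SUBJECTS = """subjects:Subject Areas:ROOT:0
A:General Works:subjects:1
AC:Collections:A:1
"""


def load_subjects(text):
    """Parse subject lines into {subject_id: (label, parent_id)}."""
    subjects = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(":")
        if len(parts) < 3:
            continue  # Skip lines that do not match the assumed layout.
        subject_id, label, parent_id = parts[0], parts[1], parts[2]
        subjects[subject_id] = (label, parent_id)
    return subjects


def path_to_label(subjects):
    """Map each subject's path (ancestor ids joined by '/') to its label."""
    mapping = {}
    for subject_id, (label, _parent) in subjects.items():
        chain = []
        current = subject_id
        # Walk up until an unknown parent (e.g. ROOT); stop on cycles too.
        while current in subjects and current not in chain:
            chain.append(current)
            current = subjects[current][1]
        mapping["/".join(reversed(chain))] = label
    return mapping


if __name__ == "__main__":
    for path, label in path_to_label(load_subjects(SAMPLE_SUBJECTS)).items():
        print(path, "->", label)
```

Labels that themselves contain a colon would need a more careful parser than the simple split used here.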

Command line tools

harvester_full.py, harvester_recent.py

These two Python programs use the eprints3x module to implement harvesters of EPrint XML and any related digital objects (e.g. PDFs, images) into a dataset collection.
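
The difference between a full and a recent harvest can be thought of as a filter on modification time. The sketch below selects the records changed within the last few days. It is an illustration only: the select_recent helper, the record layout and the timestamp format are assumptions, and the actual harvesters may choose records in another way.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout for the 'lastmod' field.
LASTMOD_FORMAT = "%Y-%m-%d %H:%M:%S"


def select_recent(records, days, now):
    """Return ids of records whose 'lastmod' falls within the last `days` days."""
    cutoff = now - timedelta(days=days)
    recent = []
    for record in records:
        modified = datetime.strptime(record["lastmod"], LASTMOD_FORMAT)
        if modified >= cutoff:
            recent.append(record["eprintid"])
    return recent


if __name__ == "__main__":
    sample = [
        {"eprintid": 1, "lastmod": "2020-08-30 12:00:00"},
        {"eprintid": 2, "lastmod": "2019-01-01 00:00:00"},
    ]
    # A full harvest would take every id; a recent harvest takes a subset.
    print(select_recent(sample, days=7, now=datetime(2020, 9, 1)))
```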

genviews.py

This Python program processes a dataset collection and renders an htdocs tree, populating it with JSON documents and key lists. This skeleton of metadata and directory structure can then be processed into a rendered website mirroring the content of an EPrints repository. This program relies on the eprintviews module.
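
A minimal sketch of the "key lists in an htdocs tree" idea is shown below: records are grouped by year and one JSON key list is written per year. The view/year layout, the record fields and the write_year_views helper are assumptions chosen for illustration and are not taken from genviews.py.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def write_year_views(records, htdocs):
    """Write one JSON key list per year under htdocs/view/year/."""
    by_year = defaultdict(list)
    for record in records:
        by_year[record["year"]].append(record["eprintid"])
    view_dir = Path(htdocs) / "view" / "year"
    view_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for year, keys in sorted(by_year.items()):
        path = view_dir / f"{year}.json"
        path.write_text(json.dumps(sorted(keys)), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    sample = [
        {"eprintid": 2, "year": 2019},
        {"eprintid": 3, "year": 2020},
        {"eprintid": 1, "year": 2020},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_year_views(sample, tmp):
            print(path.name, path.read_text(encoding="utf-8"))
```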

indexer.py

This Python program indexes the contents of our replicated EPrints site by creating scheme.json files alongside the index.json files that represent the landing pages for the replicated repository. These can then be easily ingested into Lunr.js or Elasticsearch. Currently the proof of concept targets Lunr.js. This program relies on the eprintviews module.
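
The indexing step can be pictured as a walk over the htdocs tree that writes a small search document next to each landing page. In the sketch below, the chosen fields, the use of the parent directory name as the document id and the write_scheme_files helper are all assumptions; the real indexer.py may select and name fields differently.

```python
import json
import tempfile
from pathlib import Path


def write_scheme_files(htdocs, fields=("title", "abstract")):
    """Write scheme.json beside every index.json; return how many were written."""
    count = 0
    # Collect the paths first so files written below cannot affect the walk.
    for index_path in sorted(Path(htdocs).rglob("index.json")):
        record = json.loads(index_path.read_text(encoding="utf-8"))
        # Keep only the fields worth searching on.
        scheme = {name: record.get(name, "") for name in fields}
        # Search engines need a reference per document; the directory
        # name is assumed to be the record id here.
        scheme["id"] = index_path.parent.name
        target = index_path.parent / "scheme.json"
        target.write_text(json.dumps(scheme), encoding="utf-8")
        count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "123"
        page.mkdir()
        (page / "index.json").write_text(
            json.dumps({"title": "T", "abstract": "A", "pages": 10}),
            encoding="utf-8",
        )
        print(write_scheme_files(tmp))
        print((page / "scheme.json").read_text(encoding="utf-8"))
```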

mk_website.py

This Python program creates the HTML pages from Markdown documents in the static folder (e.g. home page and major landing pages) as well as the individual views and abstracts from the JSON documents created by genviews.py. The final result is a static website ready to serve to the public. This program relies on the eprintviews module.
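
Rendering a landing page from a JSON document can be as simple as filling a template with escaped values. The sketch below covers only that JSON to HTML step (Markdown conversion is left out), and the template, the field names and the render_landing_page helper are hypothetical rather than taken from mk_website.py.

```python
import html
from string import Template

# Hypothetical page template; a real site would carry navigation, styles, etc.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<p>$abstract</p>
</body>
</html>
""")


def render_landing_page(record):
    """Return an HTML page for one record, escaping all inserted text."""
    return PAGE.substitute(
        title=html.escape(record.get("title", "")),
        abstract=html.escape(record.get("abstract", "")),
    )


if __name__ == "__main__":
    print(render_landing_page({"title": "A & B", "abstract": "An example."}))
```

Escaping every value before substitution keeps characters such as "&" and "<" in harvested metadata from breaking the generated page.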

publisher.py

This Python program copies (syncs) the content to an AWS S3 bucket via the AWS command line tools.
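
One way to drive the AWS command line tools from Python is to build an "aws s3 sync" command and run it as a subprocess. The sketch below assumes the aws tool is installed and configured; the bucket name and both helper functions are hypothetical, and publisher.py may invoke the tools differently.

```python
import shutil
import subprocess


def build_sync_command(htdocs, bucket, delete=False):
    """Return the argument list for syncing a local directory to a bucket."""
    command = ["aws", "s3", "sync", htdocs, f"s3://{bucket}/"]
    if delete:
        # Also remove remote files that no longer exist locally.
        command.append("--delete")
    return command


def publish(htdocs, bucket):
    """Run the sync; requires the aws tool on PATH and valid credentials."""
    if shutil.which("aws") is None:
        raise RuntimeError("the aws command line tool was not found on PATH")
    subprocess.run(build_sync_command(htdocs, bucket), check=True)


if __name__ == "__main__":
    # Only print the command here so the sketch runs without AWS access.
    print(" ".join(build_sync_command("htdocs", "example-bucket")))
```

Separating command construction from execution makes the command easy to inspect or log before anything is uploaded.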

Related GitHub projects

py_dataset

This Python module provides access to dataset collections which we use as intermediate storage for JSON documents and related attachments.

AMES

The eprinttools command line programs have been made available to Python via the AMES project. This includes support for both reading from and writing to EPrints repository systems.
