
A flexible and lightweight Python interface to the re3data.org database


py3data


py3data is a Python library for the re3data registry. Re3data is a global registry of research data repositories from different academic disciplines. It includes repositories that enable permanent storage of and access to data sets for researchers, funding bodies, publishers, and scholarly institutions. Re3data offers an open and free REST API. py3data is a lightweight and thin Python interface to the beta version of this API.

The following features of re3data are currently supported by py3data:

  • Get single repositories
  • Filter and query repositories

Key features

  • Pipe operations - py3data can handle multiple operations in a sequence. This allows the developer to write understandable queries. For examples, see code snippets.
  • JSON support - Re3data doesn't offer a JSON implementation of the REST API. py3data parses the XML responses of the REST API and exposes them as dict-like Python objects.
  • Schema fixes - The re3data schema is slightly hard to parse in Python directly. py3data makes it very easy to parse the API and solves these issues.
  • Permissive license - Re3data data is CC0 licensed. py3data is published under the MIT license.
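
To illustrate the JSON support point above, here is a minimal sketch of turning an XML record into a plain dict using only the standard library. The XML snippet, its element names, and the `element_to_dict` helper are made up for illustration; they are not the re3data schema or py3data's actual parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, loosely shaped like a registry record (not the real schema).
SAMPLE_XML = """
<repository>
  <id>r3d-example</id>
  <name>Example Repository</name>
  <subject>2 Life Sciences</subject>
  <subject>3 Natural Sciences</subject>
</repository>
"""


def element_to_dict(element):
    """Convert an element's children to a dict; repeated tags become lists."""
    result = {}
    for child in element:
        # Leaf elements carry text; nested elements are converted recursively.
        value = element_to_dict(child) if len(child) else (child.text or "").strip()
        if child.tag in result:
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


record = element_to_dict(ET.fromstring(SAMPLE_XML.strip()))
print(record["name"])
print(record["subject"])
```

Repeated elements such as `subject` end up as a list, which is the kind of shape that is convenient to work with from Python.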

Installation

py3data requires Python 3.8 or later.

pip install py3data

Getting started

from py3data import Repositories

Get single repository

Get a single repository by its re3data identifier:

Repositories()["r3d100011986"]

The result is a Repository object, which is very similar to a dictionary. Find the available fields with .keys().

For example, get the subjects of a repository:

Repositories()["r3d100011986"]["subjects"]
[{'subjectScheme': 'DFG', 'subjectName': '2 Life Sciences'},
 {'subjectScheme': 'DFG', 'subjectName': '202 Plant Sciences'},
 {'subjectScheme': 'DFG',
  'subjectName': '20202 Plant Ecology and Ecosystem Analysis'},
 {'subjectScheme': 'DFG',
  'subjectName': '20203 Inter-organismic Interactions of Plants'},
 {'subjectScheme': 'DFG', 'subjectName': '203 Zoology'},
 {'subjectScheme': 'DFG',
  'subjectName': '20303 Animal Ecology, Biodiversity and Ecosystem Research'},
 {'subjectScheme': 'DFG', 'subjectName': '21 Biology'},
 {'subjectScheme': 'DFG', 'subjectName': '3 Natural Sciences'},
 {'subjectScheme': 'DFG',
  'subjectName': '313 Atmospheric Science and Oceanography'},
 {'subjectScheme': 'DFG', 'subjectName': '318 Water Research'},
 {'subjectScheme': 'DFG',
  'subjectName': '31801 Hydrogeology, Hydrology, Limnology, Urban Water Management, Water Chemistry, Integrated Water Resources Management'},
 {'subjectScheme': 'DFG',
  'subjectName': '34 Geosciences (including Geography)'}]
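
Because the `subjects` field is a list of plain dictionaries, ordinary Python is enough to post-process it. The sketch below copies a few entries from the output above into a local list, so it runs without a network call. The `top_level_subjects` helper is a hypothetical name, and treating a single-digit code as a top-level DFG subject is an assumption based on the output shown.

```python
# A few entries copied from the output above.
subjects = [
    {"subjectScheme": "DFG", "subjectName": "2 Life Sciences"},
    {"subjectScheme": "DFG", "subjectName": "202 Plant Sciences"},
    {"subjectScheme": "DFG", "subjectName": "21 Biology"},
    {"subjectScheme": "DFG", "subjectName": "3 Natural Sciences"},
    {"subjectScheme": "DFG", "subjectName": "318 Water Research"},
]


def top_level_subjects(subjects):
    """Return subject names whose leading numeric code has exactly one digit."""
    names = []
    for subject in subjects:
        # Split "202 Plant Sciences" into the code "202" and the remaining label.
        code, _, _label = subject["subjectName"].partition(" ")
        if code.isdigit() and len(code) == 1:
            names.append(subject["subjectName"])
    return names


print(top_level_subjects(subjects))
# ['2 Life Sciences', '3 Natural Sciences']
```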

Get lists of repositories

It is possible to get lists of results from re3data. However, keep in mind that lists consist of Repository objects with very little metadata (id, name, doi, link).

Get all repositories:

Repositories().get()

For lists of repositories, you can also count the number of records found instead of returning the results. This also works for search queries and filters.

Repositories().count()
# 3137

Filter and query records

Re3data makes use of filters and queries. Filters can be used to slice the structured metadata of re3data and queries can be used to search for specific terms or phrases. Both filters and queries can be used in one request.

An overview of all the filters can be found under "Beta" in the REST API documentation. It can be hard to find the correct values sometimes. In that case, look for values in other single Repository requests, the Metadata Schema, or the website.

(
  Repositories()
    .filter(countries="CAN")
    .filter(subjects=["2 Life Sciences", "3 Natural Sciences"])
    .filter(pidSystems="DOI")
    .query("University")
    .get()
)

which is identical to:

(
  Repositories()
    .filter(
      countries="CAN",
      subjects=["2 Life Sciences", "3 Natural Sciences"],
      pidSystems="DOI",
    )
    .query("University")
    .get()
)
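
One way to see why the two forms are equivalent is a small builder in which every `.filter()` call merges its keyword arguments into the same set of parameters. This is a minimal sketch of the chaining pattern only: the `QueryBuilder` class and its `params` method are hypothetical and do not reflect how py3data is implemented or how it encodes requests.

```python
class QueryBuilder:
    """Toy chainable builder; not py3data's actual class."""

    def __init__(self):
        self._filters = {}
        self._query = None

    def filter(self, **kwargs):
        # Merge new filters into the existing ones and return self to allow chaining.
        self._filters.update(kwargs)
        return self

    def query(self, text):
        self._query = text
        return self

    def params(self):
        # Collect everything that would be sent with a request.
        params = dict(self._filters)
        if self._query is not None:
            params["query"] = self._query
        return params


chained = (
    QueryBuilder()
    .filter(countries="CAN")
    .filter(pidSystems="DOI")
    .query("University")
    .params()
)
combined = (
    QueryBuilder()
    .filter(countries="CAN", pidSystems="DOI")
    .query("University")
    .params()
)
print(chained == combined)
# True
```

Returning `self` from each method is what allows the calls to be written as one readable sequence.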

Code snippets

A list of examples for the re3data.org dataset.

Get repositories running Dataverse software

(
  Repositories()
    .filter(software="Dataverse")
    .get()
)

Get repositories matching the word "climate" that use DOI identifiers

(
  Repositories()
    .filter(pidSystems="DOI")
    .query("climate")
    .get()
)

License

MIT

Contact

This library is a community contribution. The authors of this Python library aren't affiliated with re3data.

Feel free to reach out with questions, remarks, and suggestions. The issue tracker is a good starting point. You can also email me at jonathandebruinos@gmail.com.

