Functionality to retrieve CLDF datasets deposited on Zenodo

cldfzenodo

cldfzenodo provides programmatic access to CLDF data deposited on Zenodo.

NOTE: The Zenodo upgrade of October 13, 2023 changed many parts of the system. As a result, cldfzenodo versions before 2.0 can no longer be used. cldfzenodo 2.0 and later is meant to be backwards compatible, i.e. it provides the same Python API as cldfzenodo 1.x, but may issue deprecation warnings.

Install

pip install cldfzenodo

pycldf dataset resolver

cldfzenodo registers (upon installation) a pycldf dataset resolver for dataset locators of the form https://doi.org/10.5281/zenodo.[0-9]+ and https://zenodo.org/record/[0-9]+. Thus, after installation you should be able to retrieve pycldf.Dataset instances by running:

>>> from pycldf.ext.discovery import get_dataset
>>> import pathlib
>>> pathlib.Path('wacl').mkdir()
>>> ds = get_dataset('https://doi.org/10.5281/zenodo.7322688', pathlib.Path('wacl'))
>>> ds.properties['dc:title']
'World Atlas of Classifier Languages'
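
The two locator forms above can be checked with plain regular expressions before handing a string to get_dataset. The following is a minimal sketch, not the resolver that cldfzenodo registers: the helper name looks_like_zenodo_locator is hypothetical, and the patterns are simply copied from the description above.

```python
import re

# Patterns copied from the locator forms described above; the resolver
# inside cldfzenodo may be implemented differently.
LOCATOR_PATTERNS = [
    re.compile(r"https://doi\.org/10\.5281/zenodo\.[0-9]+"),
    re.compile(r"https://zenodo\.org/record/[0-9]+"),
]


def looks_like_zenodo_locator(locator: str) -> bool:
    """Return True if the locator has one of the two documented forms."""
    # fullmatch rejects strings with extra leading or trailing characters.
    return any(p.fullmatch(locator) for p in LOCATOR_PATTERNS)


print(looks_like_zenodo_locator("https://doi.org/10.5281/zenodo.7322688"))
print(looks_like_zenodo_locator("https://zenodo.org/record/7322688"))
print(looks_like_zenodo_locator("https://example.org/dataset/1"))
```

A check like this only validates the shape of the locator; it says nothing about whether the record exists or contains CLDF data.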

CLI

cldfzenodo provides a subcommand to be run from cldfbench. To make use of this command, you have to install cldfbench, which can be done via

pip install cldfzenodo[cli]

Then you can download CLDF datasets from Zenodo, using the DOI for identification. For example,

cldfbench zenodo.download 10.5281/zenodo.4683137 --directory wals-2020.1/

will download WALS Online as a CLDF dataset into wals-2020.1:

$ tree wals-2020.1/
wals-2020.1/
├── areas.csv
├── chapters.csv
├── codes.csv
├── contributors.csv
├── countries.csv
├── examples.csv
├── language_names.csv
├── languages.csv
├── parameters.csv
├── sources.bib
├── StructureDataset-metadata.json
└── values.csv

0 directories, 12 files
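
When several datasets have to be fetched, the same command can be driven from a Python script. The sketch below only wraps the command line shown above with the standard library's subprocess module; the helper names build_download_command and download_with_cli are hypothetical and not part of cldfzenodo or cldfbench.

```python
import subprocess


def build_download_command(doi, directory):
    """Return the argument list for `cldfbench zenodo.download` (hypothetical helper)."""
    return ["cldfbench", "zenodo.download", doi, "--directory", str(directory)]


def download_with_cli(doi, directory):
    """Run the command; requires cldfbench and cldfzenodo[cli] to be installed."""
    # check=True raises CalledProcessError if cldfbench exits with an error.
    subprocess.run(build_download_command(doi, directory), check=True)


# Building the command has no side effects, so it is safe to inspect first.
print(build_download_command("10.5281/zenodo.4683137", "wals-2020.1/"))
```

Passing the arguments as a list avoids shell quoting issues; calling download_with_cli("10.5281/zenodo.4683137", "wals-2020.1/") would then perform the actual download.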

API

Metadata and data of (potential) CLDF datasets deposited on Zenodo are accessed via cldfzenodo.Record objects. Such objects can be obtained in various ways:

  • Via DOI:
    >>> from cldfzenodo import API
    >>> rec = API.get_record(doi='10.5281/zenodo.4762034')
    >>> rec.title
    'glottolog/glottolog: Glottolog database 4.4 as CLDF'
    
  • Via concept DOI and version tag:
    >>> from cldfzenodo import API
    >>> rec = API.get_record(conceptdoi='10.5281/zenodo.3260727', version='4.5')
    >>> rec.title
    'glottolog/glottolog: Glottolog database 4.5 as CLDF'
    
  • From deposits grouped into a Zenodo community:
    >>> from cldfzenodo import API
    >>> for rec in API.iter_records(community='dictionaria'):
    ...     print(rec.title)
    ...     break
    ...     
    dictionaria/iquito: Iquito dictionary
    
  • From search results using keywords:
    >>> from cldfzenodo import API
    >>> for rec in API.iter_records(keyword='cldf:Wordlist'):
    ...     print(rec.title)
    ...     break
    ...     
    CLDF dataset accompanying Zariquiey et al.'s "Evolution of Body-Part Terminology in Pano" from 2022
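
The examples above stop after the first record with break. A slightly more general pattern, sketched below, caps the number of records taken from iter_records with itertools.islice. The function first_titles is a hypothetical helper; it receives the API object as an argument and relies only on the iter_records(community=...) call and the title attribute shown above.

```python
from itertools import islice


def first_titles(api, community, limit=3):
    """Return the titles of at most `limit` records of a Zenodo community.

    `api` is expected to behave like `cldfzenodo.API`, i.e. to offer
    `iter_records(community=...)` yielding objects with a `title` attribute.
    """
    # islice stops consuming the iterator after `limit` records.
    return [rec.title for rec in islice(api.iter_records(community=community), limit)]
```

With cldfzenodo installed and network access available, this could be called as first_titles(API, 'dictionaria') after from cldfzenodo import API.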
    

cldfzenodo.Record objects provide sufficient metadata to allow identification and data access:

>>> from cldfzenodo import API
>>> print(API.get_record(doi='10.5281/zenodo.4762034').bibtex)
@misc{zenodo-4762034,
  author    = {Hammarström, Harald and Forkel, Robert and Haspelmath, Martin and Bank, Sebastian},
  title     = {glottolog/glottolog: Glottolog database 4.4 as CLDF},
  keywords  = {cldf:StructureDataset, linguistics},
  publisher = {Zenodo},
  year      = {2021},
  doi       = {10.5281/zenodo.4762034},
  url       = {https://doi.org/10.5281/zenodo.4762034},
  copyright = {Creative Commons Attribution 4.0}
}
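
Because each record exposes its citation via the bibtex attribute, a bibliography for several deposits can be assembled in a few lines. This is a minimal sketch under the assumption that bibtex is a string, as the printed output above suggests; write_bibliography is a hypothetical helper, not part of cldfzenodo.

```python
from pathlib import Path


def write_bibliography(records, path):
    """Write the BibTeX entries of all records to one file and return their count."""
    entries = [rec.bibtex.strip() for rec in records]
    # Separate entries with a blank line, as is customary in .bib files.
    Path(path).write_text("\n\n".join(entries) + "\n", encoding="utf-8")
    return len(entries)
```

The records could come from API.get_record or API.iter_records, e.g. write_bibliography([API.get_record(doi='10.5281/zenodo.4762034')], 'datasets.bib').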

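One can download the full deposit (and access the CLDF datasets it contains, possibly more than one):

from cldfzenodo import API
from pycldf import iter_datasets

# Download the full deposit, then iterate over every CLDF dataset found in it.
API.get_record(doi='...').download('my_directory')
for cldf in iter_datasets('my_directory'):
    pass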

But often, only the "pure" CLDF data is of interest, and not the additional metadata and curation context, e.g. of cldfbench-curated datasets. To download just the CLDF data, use

cldf = API.get_record(doi='...').download_dataset('my_directory')
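
To fetch the pure CLDF data of several deposits, the call above can be wrapped in a loop. The sketch below is one possible arrangement, not a cldfzenodo feature: download_many and its directory naming scheme are hypothetical, and the API object is passed in as an argument. Whether download_dataset needs the target directory to exist beforehand is not stated here, so the sketch creates it as a precaution.

```python
from pathlib import Path


def download_many(api, dois, base_dir):
    """Download the pure CLDF data for each DOI into its own subdirectory.

    `api` is expected to behave like `cldfzenodo.API`. Returns a dict mapping
    each DOI to whatever `download_dataset` returned for it.
    """
    base = Path(base_dir)
    datasets = {}
    for doi in dois:
        # Hypothetical naming scheme: "10.5281/zenodo.123" -> "10.5281_zenodo.123".
        target = base / doi.replace("/", "_")
        # Assumption: creating the directory up front is needed or at least harmless.
        target.mkdir(parents=True, exist_ok=True)
        datasets[doi] = api.get_record(doi=doi).download_dataset(str(target))
    return datasets
```

Keeping one subdirectory per DOI prevents files of different datasets (e.g. two languages.csv) from overwriting each other.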
