
functionality to retrieve CLDF datasets deposited on Zenodo


cldfzenodo


cldfzenodo provides programmatic access to CLDF data deposited on Zenodo.

Install

pip install cldfzenodo

CLI

cldfzenodo provides a subcommand to be run from cldfbench. To use it, cldfbench must be installed, which can be done via the cli extra:

pip install cldfzenodo[cli]

Then you can download CLDF datasets from Zenodo, identifying them by DOI. For example,

cldfbench zenodo.download 10.5281/zenodo.4683137 --directory wals-2020.1/

will download WALS Online as a CLDF dataset into wals-2020.1:

$ tree wals-2020.1/
wals-2020.1/
├── areas.csv
├── chapters.csv
├── codes.csv
├── contributors.csv
├── countries.csv
├── examples.csv
├── language_names.csv
├── languages.csv
├── parameters.csv
├── sources.bib
├── StructureDataset-metadata.json
└── values.csv

0 directories, 12 files
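
To check what a download contains without assuming a particular file name, the metadata file can be located by its conventional `-metadata.json` suffix. The sketch below uses only the standard library. `find_cldf_metadata` is a hypothetical helper, not part of cldfzenodo or cldfbench, and it assumes the metadata declares its CLDF module under `dc:conformsTo`, as CLDF metadata normally does.

```python
import json
import pathlib


def find_cldf_metadata(directory):
    """Yield (path, module) for each *-metadata.json file below `directory`.

    Hypothetical helper for illustration; not part of cldfzenodo.
    """
    for path in sorted(pathlib.Path(directory).rglob('*-metadata.json')):
        with path.open(encoding='utf-8') as fp:
            md = json.load(fp)
        # "dc:conformsTo" holds a URL such as
        # "http://cldf.clld.org/v1.0/terms.rdf#StructureDataset";
        # the fragment after "#" names the CLDF module.
        conforms_to = md.get('dc:conformsTo', '') if isinstance(md, dict) else ''
        yield path, conforms_to.rpartition('#')[2]
```

Calling `list(find_cldf_metadata('wals-2020.1'))` on the directory shown above should then report `StructureDataset-metadata.json` together with its module name.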

API

Metadata and data of (potential) CLDF datasets deposited on Zenodo are accessed via cldfzenodo.Record objects. Such objects can be obtained in various ways:

  • Via DOI:
    import cldfzenodo
    rec = cldfzenodo.Record.from_doi('https://doi.org/10.5281/zenodo.4762034')
    
  • From deposits grouped into a Zenodo community (and obtained through OAI-PMH):
    import cldfzenodo.oai
    for rec in cldfzenodo.oai.iter_records('dictionaria'):
      print(rec)
    
  • From search results using keywords:
    import cldfzenodo
    for rec in cldfzenodo.search_wordlists():
      print(rec)
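
The examples in this document write the DOI both as a `https://doi.org/` URL and in bare form. For bookkeeping outside of cldfzenodo (say, naming download directories after the Zenodo record number), it can help to reduce both spellings to one. The following is a minimal sketch using only the standard library; `normalize_zenodo_doi` is a hypothetical helper and not part of the cldfzenodo API.

```python
import re

# Matches a Zenodo DOI at the end of the string, with or without a
# leading resolver URL such as "https://doi.org/".
_ZENODO_DOI = re.compile(r'10\.5281/zenodo\.(?P<id>\d+)$')


def normalize_zenodo_doi(doi_or_url):
    """Return (bare_doi, record_number) for a Zenodo DOI or DOI URL.

    Hypothetical helper for illustration; not part of cldfzenodo.
    """
    match = _ZENODO_DOI.search(doi_or_url.strip())
    if match is None:
        raise ValueError('not a Zenodo DOI: {!r}'.format(doi_or_url))
    return match.group(0), int(match.group('id'))


print(normalize_zenodo_doi('https://doi.org/10.5281/zenodo.4762034'))
# prints ('10.5281/zenodo.4762034', 4762034)
```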
    

cldfzenodo.Record objects provide sufficient metadata to allow identification and data access:

>>> from cldfzenodo import Record
>>> print(Record.from_doi('10.5281/zenodo.4762034').bibtex)
@misc{zenodo-4762034,
  author    = {Hammarström, Harald and Forkel, Robert and Haspelmath, Martin and Bank, Sebastian},
  title     = {glottolog/glottolog: Glottolog database 4.4 as CLDF},
  keywords  = {cldf:StructureDataset, linguistics},
  publisher = {Zenodo},
  year      = {2021},
  doi       = {10.5281/zenodo.4762034},
  url       = {https://doi.org/10.5281/zenodo.4762034},
  copyright = {Creative Commons Attribution 4.0}
}
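
In the record above, the CLDF module is announced through a keyword with a `cldf:` prefix. As a rough sketch of how such keywords could be used for identification, the hypothetical helper below pulls them out of a BibTeX string with a regular expression. It assumes the `keywords` field is formatted as in the example output; a `Record` object may well expose the same information more directly.

```python
import re


def cldf_modules_from_bibtex(bibtex):
    """Return the names of all "cldf:"-prefixed keywords in a BibTeX entry.

    Hypothetical helper for illustration; not part of cldfzenodo.
    """
    match = re.search(r'keywords\s*=\s*\{([^}]*)\}', bibtex)
    if match is None:
        return []
    keywords = [kw.strip() for kw in match.group(1).split(',')]
    # Keep only keywords like "cldf:StructureDataset" and drop the prefix.
    return [kw.split(':', 1)[1] for kw in keywords if kw.startswith('cldf:')]


entry = """@misc{zenodo-4762034,
  keywords  = {cldf:StructureDataset, linguistics},
  doi       = {10.5281/zenodo.4762034}
}"""
print(cldf_modules_from_bibtex(entry))
# prints ['StructureDataset']
```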

One can download the full deposit (and access the CLDF datasets it contains, possibly more than one):

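from cldfzenodo import Record
from pycldf import iter_datasets
record = Record.from_doi('10.5281/zenodo.4762034')  # e.g. the Glottolog deposit
record.download('my_directory')
# A deposit may contain more than one CLDF dataset.
for cldf in iter_datasets('my_directory'):
    print(cldf)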

Often, though, only the "pure" CLDF data is of interest, and not the additional metadata and curation context that comes with, e.g., cldfbench-curated datasets. To download just the CLDF dataset, use

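from pycldf import Dataset
# `record` is a cldfzenodo.Record, e.g. obtained via Record.from_doi(...)
mdpath = record.download_dataset('my_directory')  # path passed on to pycldf
cldf = Dataset.from_metadata(mdpath)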

