Functionality to retrieve CLDF datasets deposited on Zenodo
cldfzenodo
cldfzenodo provides programmatic access to CLDF data deposited on Zenodo.
NOTE: The Zenodo upgrade of October 13, 2023 introduced quite a few changes in various parts
of the system. As a result, cldfzenodo versions before 2.0 no longer work. cldfzenodo 2.x is meant
to be backwards compatible, i.e. it provides the same Python API as cldfzenodo 1.x, but may issue
deprecation warnings.
Install
pip install cldfzenodo
pycldf dataset resolver
cldfzenodo registers (upon installation) a pycldf dataset resolver
for dataset locators of the form https://doi.org/10.5281/zenodo.[0-9]+ and https://zenodo.org/record/[0-9]+.
Thus, after installation you should be able to retrieve pycldf.Dataset instances by running
>>> from pycldf.ext.discovery import get_dataset
>>> import pathlib
>>> pathlib.Path('wacl').mkdir()
>>> ds = get_dataset('https://doi.org/10.5281/zenodo.7322688', pathlib.Path('wacl'))
>>> ds.properties['dc:title']
'World Atlas of Classifier Languages'
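To check ahead of time whether a locator has one of the two forms named above, a small standard-library sketch may help. The function name `zenodo_record_id` is hypothetical and not part of cldfzenodo or pycldf, and the resolver that cldfzenodo registers may well accept or reject locators differently.

```python
import re

# The two locator forms named above; fullmatch rejects any trailing text.
LOCATOR_PATTERNS = [
    re.compile(r"https://doi\.org/10\.5281/zenodo\.([0-9]+)"),
    re.compile(r"https://zenodo\.org/record/([0-9]+)"),
]


def zenodo_record_id(locator):
    """Return the numeric record ID as a string, or None if no form matches.

    Hypothetical helper for illustration only, not part of cldfzenodo.
    """
    for pattern in LOCATOR_PATTERNS:
        match = pattern.fullmatch(locator)
        if match:
            return match.group(1)
    return None


print(zenodo_record_id("https://doi.org/10.5281/zenodo.7322688"))  # 7322688
print(zenodo_record_id("https://example.org/record/1"))  # None
```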
CLI
cldfzenodo provides a subcommand to be run from cldfbench.
To make use of this command, you have to install cldfbench, which can be done via
pip install cldfzenodo[cli]
Then you can download CLDF datasets from Zenodo, using the DOI for identification. E.g.
cldfbench zenodo.download 10.5281/zenodo.4683137 --directory wals-2020.1/
will download WALS Online as a CLDF dataset into wals-2020.1:
$ tree wals-2020.1/
wals-2020.1/
├── areas.csv
├── chapters.csv
├── codes.csv
├── contributors.csv
├── countries.csv
├── examples.csv
├── language_names.csv
├── languages.csv
├── parameters.csv
├── sources.bib
├── StructureDataset-metadata.json
└── values.csv
0 directories, 12 files
API
Metadata and data of (potential) CLDF datasets deposited on Zenodo are accessed via cldfzenodo.Record
objects. Such objects can be obtained in various ways:
- Via DOI:
>>> from cldfzenodo import API
>>> rec = API.get_record(doi='10.5281/zenodo.4762034')
>>> rec.title
'glottolog/glottolog: Glottolog database 4.4 as CLDF'
- Via concept DOI and version tag:
>>> from cldfzenodo import API
>>> rec = API.get_record(conceptdoi='10.5281/zenodo.3260727', version='4.5')
>>> rec.title
'glottolog/glottolog: Glottolog database 4.5 as CLDF'
- From deposits grouped into a Zenodo community:
>>> from cldfzenodo import API
>>> for rec in API.iter_records(community='dictionaria'):
...     print(rec.title)
...     break
...
dictionaria/iquito: Iquito dictionary
- From search results using keywords:
>>> from cldfzenodo import API
>>> for rec in API.iter_records(keyword='cldf:Wordlist'):
...     print(rec.title)
...     break
...
CLDF dataset accompanying Zariquiey et al.'s "Evolution of Body-Part Terminology in Pano" from 2022
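The community and keyword examples above stop after the first record with `break`. When the first few records are wanted, `itertools.islice` is one way to cap the iteration. The following is a minimal sketch: the helper `first_titles` is hypothetical, it only assumes that each record has a `title` attribute as shown above, and it uses stand-in objects so that it runs without network access.

```python
from itertools import islice
from types import SimpleNamespace


def first_titles(records, n):
    """Return the titles of at most n records from any iterable of records.

    Hypothetical helper for illustration only, not part of cldfzenodo.
    """
    return [rec.title for rec in islice(records, n)]


# Stand-in records so the sketch runs offline; with network access one could
# pass API.iter_records(keyword='cldf:Wordlist') instead.
fake_records = (SimpleNamespace(title="Dataset {}".format(i)) for i in range(1000))
print(first_titles(fake_records, 3))  # ['Dataset 0', 'Dataset 1', 'Dataset 2']
```

Because `islice` stops pulling from the iterator after `n` items, no further records are requested from the underlying source.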
cldfzenodo.Record objects provide sufficient metadata to allow identification and data access:
>>> from cldfzenodo import API
>>> print(API.get_record(doi='10.5281/zenodo.4762034').bibtex)
@misc{zenodo-4762034,
author = {Hammarström, Harald and Forkel, Robert and Haspelmath, Martin and Bank, Sebastian},
title = {glottolog/glottolog: Glottolog database 4.4 as CLDF},
keywords = {cldf:StructureDataset, linguistics},
publisher = {Zenodo},
year = {2021},
doi = {10.5281/zenodo.4762034},
url = {https://doi.org/10.5281/zenodo.4762034},
copyright = {Creative Commons Attribution 4.0}
}
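Since each record exposes a BibTeX entry, entries for several datasets can be collected into one bibliography file. The sketch below assumes that `bibtex` is a plain string, as the printed output above suggests; the helper `write_bibliography` is hypothetical and not part of cldfzenodo, and the records used here are offline stand-ins.

```python
import pathlib
import tempfile
from types import SimpleNamespace


def write_bibliography(records, path):
    """Write the BibTeX entries of the records to path, separated by blank lines.

    Hypothetical helper; assumes each record has a string attribute `bibtex`.
    Returns the number of entries written.
    """
    entries = [rec.bibtex.strip() for rec in records]
    pathlib.Path(path).write_text("\n\n".join(entries) + "\n", encoding="utf8")
    return len(entries)


# Stand-in records so the sketch runs offline; real records would come from
# API.get_record(...) or API.iter_records(...).
fake_records = [
    SimpleNamespace(bibtex="@misc{a,\n  title = {A}\n}\n"),
    SimpleNamespace(bibtex="@misc{b,\n  title = {B}\n}\n"),
]
with tempfile.TemporaryDirectory() as tmp:
    bib_path = pathlib.Path(tmp) / "datasets.bib"
    count = write_bibliography(fake_records, bib_path)
    print(count, bib_path.read_text(encoding="utf8").count("@misc"))  # 2 2
```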
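One can download the full deposit (and access - possibly multiple - CLDF datasets):
from pycldf import iter_datasets
from cldfzenodo import API

# Download the full deposit, then iterate over every CLDF dataset found in it.
API.get_record(doi='...').download('my_directory')
for cldf in iter_datasets('my_directory'):
    pass  # work with each pycldf.Dataset here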
But often, only the "pure" CLDF data is of interest, and not the additional metadata and curation context, e.g. of cldfbench-curated datasets. In that case, the dataset alone can be downloaded via
cldf = API.get_record(doi='...').download_dataset('my_directory')
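When a script is run repeatedly, it can be convenient to skip the download if the target directory already holds a dataset. The sketch below is one possible approach, not something cldfzenodo provides: the helper `download_dataset_once` is hypothetical, and it assumes that a downloaded dataset can be recognized by a `*-metadata.json` file, as in the WALS listing above, which need not hold for every CLDF dataset. A stand-in record keeps the example runnable offline.

```python
import pathlib
import tempfile


def download_dataset_once(record, directory):
    """Call record.download_dataset(directory) unless CLDF metadata is already there.

    Hypothetical helper. Returns whatever download_dataset returns, or None if
    the download was skipped.
    """
    directory = pathlib.Path(directory)
    # Assumption: an existing "*-metadata.json" file marks a previous download.
    if directory.is_dir() and any(directory.glob("*-metadata.json")):
        return None
    directory.mkdir(parents=True, exist_ok=True)
    return record.download_dataset(str(directory))


class FakeRecord:
    """Offline stand-in that mimics only the download_dataset call."""

    def __init__(self):
        self.calls = 0

    def download_dataset(self, directory):
        self.calls += 1
        pathlib.Path(directory, "Wordlist-metadata.json").write_text("{}", encoding="utf8")
        return directory


with tempfile.TemporaryDirectory() as tmp:
    rec = FakeRecord()
    target = pathlib.Path(tmp) / "my_directory"
    download_dataset_once(rec, target)  # performs the (fake) download
    download_dataset_once(rec, target)  # skipped, metadata file already present
    print(rec.calls)  # 1
```

With a real record, `rec` would be the object returned by `API.get_record(...)`.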