A Python library to read and write CLDF datasets
pycldf
A Python package to read and write CLDF datasets.
Writing CLDF
from pycldf import Wordlist, Source
dataset = Wordlist.in_dir('mydataset')
dataset.add_sources(Source('book', 'Meier2005', author='Hans Meier', year='2005', title='The Book'))
dataset.write(FormTable=[
    {
        'ID': '1',
        'Form': 'word',
        'Language_ID': 'abcd1234',
        'Parameter_ID': '1277',
        'Source': ['Meier2005[3-7]'],
    }])
results in
$ ls -1 mydataset/
forms.csv
sources.bib
Wordlist-metadata.json
mydataset/forms.csv
ID,Language_ID,Parameter_ID,Value,Segments,Comment,Source
1,abcd1234,1277,word,,,Meier2005[3-7]
mydataset/sources.bib
@book{Meier2005,
author = {Meier, Hans},
year = {2005},
title = {The Book}
}
mydataset/Wordlist-metadata.json
Advanced writing
To add predefined CLDF components to a dataset, use the add_component method:
from pycldf import StructureDataset, term_uri
dataset = StructureDataset.in_dir('mydataset')
dataset.add_component('ParameterTable')
dataset.write(
    ValueTable=[{'ID': '1', 'Language_ID': 'abc', 'Parameter_ID': '1', 'Value': 'x'}],
    ParameterTable=[{'ID': '1', 'Name': 'Grammatical Feature'}])
It is also possible to add generic tables:
dataset.add_table('contributors.csv', term_uri('id'), term_uri('name'))
which can also be linked to other tables:
dataset.add_columns('ParameterTable', 'Contributor_ID')
dataset.add_foreign_key('ParameterTable', 'Contributor_ID', 'contributors.csv', 'ID')
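What such a link implies for the data can be sketched in plain Python: every Contributor_ID in the parameter rows should match an ID in contributors.csv. This is a minimal illustration using only the standard library, not pycldf's own checking code; the sample rows and the helper name dangling_references are hypothetical.

```python
# Hypothetical rows, shaped like the ParameterTable and contributors.csv above.
parameters = [
    {"ID": "1", "Name": "Grammatical Feature", "Contributor_ID": "c1"},
    {"ID": "2", "Name": "Another Feature", "Contributor_ID": "c9"},
]
contributors = [{"ID": "c1", "Name": "Hans Meier"}]


def dangling_references(rows, column, target_rows, target_column="ID"):
    """Return the rows whose foreign key has no match in the target table."""
    valid_keys = {row[target_column] for row in target_rows}
    return [
        row for row in rows
        if row.get(column) not in (None, "") and row[column] not in valid_keys
    ]


# Parameter "2" points at contributor "c9", which is not in contributors.
broken = dangling_references(parameters, "Contributor_ID", contributors)
```

Rows with an empty Contributor_ID are skipped here, on the assumption that the foreign key column is optional.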
Addressing tables and columns
Tables in a dataset can be referenced using a Dataset's __getitem__ method, passing
- a full CLDF Ontology URI for the corresponding component,
- the local name of the component in the CLDF Ontology,
- the url of the table.
Columns in a dataset can be referenced using a Dataset's __getitem__ method, passing a tuple (<TABLE>, <COLUMN>), where <TABLE> specifies a table as explained above and <COLUMN> is
- a full CLDF Ontology URI used as propertyUrl of the column,
- the name property of the column.
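The lookup rules above can be modelled with a small toy example. This is a sketch in plain Python, not pycldf's implementation: the TABLES structure and the helpers find_table and find_column are hypothetical, and only the ontology URIs follow the CLDF term namespace.

```python
# Toy model of the addressing rules; NOT pycldf's implementation.
TERMS = "http://cldf.clld.org/v1.0/terms.rdf#"

# Hypothetical description of a single table and two of its columns.
TABLES = [
    {
        "url": "forms.csv",
        "component": TERMS + "FormTable",
        "columns": [
            {"name": "ID", "propertyUrl": TERMS + "id"},
            {"name": "Language_ID", "propertyUrl": TERMS + "languageReference"},
        ],
    },
]


def find_table(key):
    """Resolve a table by ontology URI, component local name, or url."""
    for table in TABLES:
        local_name = table["component"].rsplit("#", 1)[-1]
        if key in (table["component"], local_name, table["url"]):
            return table
    raise KeyError(key)


def find_column(table_key, column_key):
    """Resolve a column by its propertyUrl or by its name."""
    for column in find_table(table_key)["columns"]:
        if column_key in (column["propertyUrl"], column["name"]):
            return column
    raise KeyError((table_key, column_key))
```

In this model, find_table('FormTable'), find_table('forms.csv') and find_table(TERMS + 'FormTable') all return the same table, which mirrors the three ways of addressing a table listed above.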
Reading CLDF
>>> from pycldf.dataset import Wordlist
>>> dataset = Wordlist.from_metadata('mydataset/Wordlist-metadata.json')
>>> print(dataset)
<cldf:v1.0:Wordlist at mydataset>
>>> forms = list(dataset['FormTable'])
>>> forms[0]
OrderedDict([('ID', '1'), ('Language_ID', 'abcd1234'), ('Parameter_ID', '1277'), ('Value', 'word'), ('Segments', []), ('Comment', None), ('Source', ['Meier2005[3-7]'])])
>>> refs = list(dataset.sources.expand_refs(forms[0]['Source']))
>>> refs
[<Reference Meier2005[3-7]>]
>>> print(refs[0].source)
Meier, Hans. 2005. The Book.
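A reference such as Meier2005[3-7] combines a source key with a bracketed context, here a page range. The following sketch shows how such a string could be split using only the standard library. It assumes the shape key[context] seen in the example above; the pattern and the helper name split_ref are illustrative and are not part of pycldf.

```python
import re

# Assumed shape of a reference: a source key, optionally followed by a
# bracketed context such as a page range, e.g. "Meier2005[3-7]".
REF_PATTERN = re.compile(r"^(?P<key>[^\[\]]+)(?:\[(?P<context>[^\]]*)\])?$")


def split_ref(ref):
    """Split a reference string into (source key, context or None)."""
    match = REF_PATTERN.match(ref.strip())
    if match is None:
        raise ValueError("not a valid reference: {!r}".format(ref))
    return match.group("key"), match.group("context")


key, pages = split_ref("Meier2005[3-7]")
```

A reference without brackets yields None as its context, so callers can distinguish "no pages given" from an empty page range.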
Command line usage
Installing the pycldf package will also install a command line interface cldf, which provides some sub-commands to manage CLDF datasets.
Summary statistics
$ cldf stats mydataset/Wordlist-metadata.json
<cldf:v1.0:Wordlist at mydataset>
Path Type Rows
--------------------- ---------- ------
forms.csv Form Table 1
mydataset/sources.bib Sources 1
Validation
By default, data files are read in strict mode, i.e. invalid rows will result in an exception being raised. To validate a data file, it can be read in validating mode instead.
For example, the following output is generated
$ cldf validate mydataset/forms.csv
WARNING forms.csv: duplicate primary key: (u'1',)
WARNING forms.csv:4:Source missing source key: Mei2005
when reading the file
ID,Language_ID,Parameter_ID,Value,Segments,Comment,Source
1,abcd1234,1277,word,,,Meier2005[3-7]
1,stan1295,1277,hand,,,Meier2005[3-7]
2,stan1295,1277,hand,,,Mei2005[3-7]
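The two warnings can be reproduced with a short standard-library sketch that applies the same two checks to the file above: IDs must be unique, and every source key must be known. This is an illustration of the idea, not the code behind cldf validate; the function name validate and the use of ";" as a separator between multiple references are assumptions.

```python
import csv
import io

SAMPLE = """\
ID,Language_ID,Parameter_ID,Value,Segments,Comment,Source
1,abcd1234,1277,word,,,Meier2005[3-7]
1,stan1295,1277,hand,,,Meier2005[3-7]
2,stan1295,1277,hand,,,Mei2005[3-7]
"""


def validate(text, known_sources):
    """Collect (line number, message) pairs instead of raising on bad rows."""
    problems = []
    seen_ids = set()
    reader = csv.DictReader(io.StringIO(text))
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        if row["ID"] in seen_ids:
            problems.append((line_no, "duplicate primary key: " + row["ID"]))
        seen_ids.add(row["ID"])
        # Assumption: multiple references are separated by ";".
        for ref in filter(None, row["Source"].split(";")):
            key = ref.split("[", 1)[0].strip()
            if key not in known_sources:
                problems.append((line_no, "missing source key: " + key))
    return problems


problems = validate(SAMPLE, known_sources={"Meier2005"})
```

Collecting problems in a list rather than raising on the first one is what distinguishes this from strict reading: all rows are inspected and every issue is reported.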