pycldf
A python package to read and write CLDF datasets.
Writing CLDF
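```python
from pycldf import Wordlist, Source

# Create a Wordlist dataset that lives in the directory 'mydataset'.
dataset = Wordlist.in_dir('mydataset')
# Register a BibTeX source that rows can cite by its key.
dataset.add_sources(Source('book', 'Meier2005', author='Hans Meier', year='2005', title='The Book'))
# Write the FormTable rows; each Source entry is a key plus an optional page range.
dataset.write(FormTable=[
    {
        'ID': '1',
        'Form': 'word',
        'Language_ID': 'abcd1234',
        'Parameter_ID': '1277',
        'Source': ['Meier2005[3-7]'],
    }])
```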
results in

```
$ ls -1 mydataset/
forms.csv
sources.bib
Wordlist-metadata.json
```
mydataset/forms.csv

```
ID,Language_ID,Parameter_ID,Form,Segments,Comment,Source
1,abcd1234,1277,word,,,Meier2005[3-7]
```
mydataset/sources.bib

```
@book{Meier2005,
    author = {Meier, Hans},
    year = {2005},
    title = {The Book}
}
```
mydataset/Wordlist-metadata.json
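The forms.csv file is plain CSV, so list-valued fields such as Source have to be flattened into a single cell. The following stdlib-only sketch illustrates that flattening; it is not pycldf's writer. The helper `serialize_rows`, the column order, and the separators (';' for Source, a space for Segments) are assumptions made for illustration, so check the tableSchema in Wordlist-metadata.json for the authoritative values.

```python
import csv
import io

# Assumed default FormTable columns and list separators (hypothetical;
# the real values are declared in the dataset's metadata file).
COLUMNS = ['ID', 'Language_ID', 'Parameter_ID', 'Form', 'Segments', 'Comment', 'Source']
SEPARATORS = {'Segments': ' ', 'Source': ';'}


def serialize_rows(rows, columns=COLUMNS, separators=SEPARATORS):
    """Hypothetical helper: render row dicts as CSV text, joining list values."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(columns)
    for row in rows:
        cells = []
        for col in columns:
            value = row.get(col)
            if value is None:
                # Missing or null values become empty cells.
                cells.append('')
            elif isinstance(value, list):
                # List-valued columns are joined with their (assumed) separator.
                cells.append(separators.get(col, ';').join(value))
            else:
                cells.append(value)
        writer.writerow(cells)
    return out.getvalue()


print(serialize_rows([{
    'ID': '1', 'Form': 'word', 'Language_ID': 'abcd1234',
    'Parameter_ID': '1277', 'Source': ['Meier2005[3-7]'],
}]), end='')
```

Reading reverses this: a cell is split on the column's separator, which is why Source comes back as a list.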
Advanced writing
To add predefined CLDF components to a dataset, use the add_component method:
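```python
from pycldf import StructureDataset, term_uri

dataset = StructureDataset.in_dir('mydataset')
# Add the predefined CLDF ParameterTable component to the dataset.
dataset.add_component('ParameterTable')
# Write rows for the default ValueTable and the newly added ParameterTable.
dataset.write(
    ValueTable=[{'ID': '1', 'Language_ID': 'abc', 'Parameter_ID': '1', 'Value': 'x'}],
    ParameterTable=[{'ID': '1', 'Name': 'Grammatical Feature'}])
```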
It is also possible to add generic tables:
```python
dataset.add_table('contributors.csv', term_uri('id'), term_uri('name'))
```

which can also be linked to other tables:

```python
dataset.add_columns('ParameterTable', 'Contributor_ID')
dataset.add_foreign_key('ParameterTable', 'Contributor_ID', 'contributors.csv', 'ID')
```
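A foreign key like the one declared above means that every non-empty Contributor_ID in the ParameterTable is expected to match an ID in contributors.csv. The stdlib-only sketch below shows that referential check on plain lists of row dicts; `dangling_references` and the sample rows are hypothetical, and this is not how pycldf itself checks foreign keys.

```python
def dangling_references(rows, fk_column, target_rows, target_column='ID'):
    """Hypothetical helper: return foreign-key values with no matching target row."""
    known = {row[target_column] for row in target_rows}
    # Empty or None references are treated as "no link", not as errors.
    return sorted({row[fk_column] for row in rows
                   if row.get(fk_column) and row[fk_column] not in known})


parameters = [
    {'ID': '1', 'Name': 'Grammatical Feature', 'Contributor_ID': 'c1'},
    {'ID': '2', 'Name': 'Another Feature', 'Contributor_ID': 'c9'},
    {'ID': '3', 'Name': 'Unattributed Feature', 'Contributor_ID': None},
]
contributors = [{'ID': 'c1', 'Name': 'Hans Meier'}]
print(dangling_references(parameters, 'Contributor_ID', contributors))  # ['c9']
```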
Addressing tables and columns
Tables in a dataset can be referenced using a Dataset's `__getitem__` method,
passing
- a full CLDF Ontology URI for the corresponding component,
- the local name of the component in the CLDF Ontology,
- the `url` of the table.
Columns in a dataset can be referenced using a Dataset's `__getitem__` method,
passing a tuple `(<TABLE>, <COLUMN>)` where `<TABLE>` specifies a table as explained
above and `<COLUMN>` is
- a full CLDF Ontology URI used as `propertyUrl` of the column,
- the `name` property of the column.
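To make the lookup rules above concrete, here is a hypothetical, stdlib-only resolver over simplified table descriptions. It is not pycldf's implementation: the `TABLES` structure, `find_table` and `find_column` are invented for illustration, and the namespace `http://cldf.clld.org/v1.0/terms.rdf#` is assumed to be the CLDF 1.0 ontology prefix.

```python
CLDF_NS = 'http://cldf.clld.org/v1.0/terms.rdf#'

# Hypothetical, simplified table descriptions: each has a `url` and, for
# CLDF components, a `dc:conformsTo` ontology URI.
TABLES = [
    {'url': 'forms.csv', 'dc:conformsTo': CLDF_NS + 'FormTable',
     'columns': [
         {'name': 'ID', 'propertyUrl': CLDF_NS + 'id'},
         {'name': 'Form', 'propertyUrl': CLDF_NS + 'form'},
     ]},
    {'url': 'contributors.csv',
     'columns': [{'name': 'ID', 'propertyUrl': CLDF_NS + 'id'}]},
]


def find_table(tables, key):
    """Resolve a table by full ontology URI, local component name, or url."""
    for table in tables:
        uri = table.get('dc:conformsTo')
        # The local name is the fragment after '#' in the ontology URI.
        if uri and key in (uri, uri.split('#')[-1]):
            return table
    for table in tables:
        if table['url'] == key:
            return table
    raise KeyError(key)


def find_column(tables, table_key, column_key):
    """Resolve a column by full propertyUrl or by name within the resolved table."""
    table = find_table(tables, table_key)
    for column in table['columns']:
        if column_key in (column.get('propertyUrl'), column['name']):
            return column
    raise KeyError((table_key, column_key))


print(find_table(TABLES, 'FormTable')['url'])                        # forms.csv
print(find_column(TABLES, 'forms.csv', CLDF_NS + 'form')['name'])    # Form
```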
Reading CLDF
```python
>>> from pycldf.dataset import Wordlist
>>> dataset = Wordlist.from_metadata('mydataset/Wordlist-metadata.json')
>>> print(dataset)
<cldf:v1.0:Wordlist at mydataset>
>>> forms = list(dataset['FormTable'])
>>> forms[0]
OrderedDict([('ID', '1'), ('Language_ID', 'abcd1234'), ('Parameter_ID', '1277'), ('Form', 'word'), ('Segments', []), ('Comment', None), ('Source', ['Meier2005[3-7]'])])
>>> refs = list(dataset.sources.expand_refs(forms[0]['Source']))
>>> refs
[<Reference Meier2005[3-7]>]
>>> print(refs[0].source)
Meier, Hans. 2005. The Book.
```
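Each reference string in the Source column combines a BibTeX key with an optional description (here a page range) in square brackets, as in Meier2005[3-7]. The regex-based sketch below splits such a string into its two parts. `parse_reference` is a hypothetical helper that only approximates what `expand_refs` does, and it assumes the bracketed part never contains ']'.

```python
import re

# Key: everything before an optional "[...]" suffix; desc: the bracket contents.
REF_PATTERN = re.compile(r'^(?P<key>[^\[]+)(\[(?P<desc>[^\]]*)\])?$')


def parse_reference(ref):
    """Split 'Key[desc]' into (key, desc); desc is None when absent (hypothetical helper)."""
    match = REF_PATTERN.match(ref.strip())
    if not match:
        raise ValueError('not a valid reference: %r' % ref)
    return match.group('key'), match.group('desc')


print(parse_reference('Meier2005[3-7]'))  # ('Meier2005', '3-7')
print(parse_reference('Meier2005'))       # ('Meier2005', None)
```

The key part is what gets looked up in sources.bib; the description is carried along unchanged.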
Command line usage
Installing the pycldf package will also install a command line interface cldf, which provides some sub-commands to manage CLDF datasets.
Summary statistics
```
$ cldf stats mydataset/Wordlist-metadata.json
<cldf:v1.0:Wordlist at mydataset>
Path                   Type          Rows
---------------------  ----------  ------
forms.csv              Form Table       1
mydataset/sources.bib  Sources          1
```
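The Rows column presumably reports the number of data rows per table and the number of entries in the BibTeX file. As a rough, stdlib-only illustration of those two counts (not the `cldf stats` implementation), consider the sketch below; `count_csv_rows` and `count_bib_entries` are hypothetical helpers, and the BibTeX count is a heuristic that would also count `@comment` or `@string` blocks.

```python
import csv
import io
import re


def count_csv_rows(text):
    """Count data rows in CSV text, skipping the header and blank lines."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    return max(len(rows) - 1, 0)


def count_bib_entries(text):
    """Roughly count BibTeX entries by finding lines that start with '@type{'."""
    return len(re.findall(r'^\s*@\w+\s*\{', text, flags=re.MULTILINE))


forms = ('ID,Language_ID,Parameter_ID,Form,Segments,Comment,Source\n'
         '1,abcd1234,1277,word,,,Meier2005[3-7]\n')
bib = ('@book{Meier2005,\n'
       '    author = {Meier, Hans},\n'
       '    year = {2005},\n'
       '    title = {The Book}\n'
       '}\n')
print(count_csv_rows(forms), count_bib_entries(bib))  # 1 1
```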
Validation
By default, data files are read in strict mode, i.e. invalid rows result in an exception being raised. To validate a data file, it can be read in validating mode instead, in which problems are reported as warnings.
For example, the following output is generated

```
$ cldf validate mydataset/forms.csv
WARNING forms.csv: duplicate primary key: (u'1',)
WARNING forms.csv:4:Source missing source key: Mei2005
```

when reading the file

```
ID,Language_ID,Parameter_ID,Form,Segments,Comment,Source
1,abcd1234,1277,word,,,Meier2005[3-7]
1,stan1295,1277,hand,,,Meier2005[3-7]
2,stan1295,1277,hand,,,Mei2005[3-7]
```
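The two warnings above come from two separate checks: row IDs must be unique, and every key referenced in the Source column must exist in sources.bib. The simplified stdlib sketch below reproduces both checks on the example file. It is a hypothetical illustration rather than pycldf's validator: `validate_forms` is an invented name, `known_sources` stands in for the keys found in the BibTeX file, and ';' is assumed to separate multiple references.

```python
import csv
import io


def validate_forms(text, known_sources, filename='forms.csv'):
    """Hypothetical re-creation of two checks: duplicate IDs and unknown source keys.

    Assumes no quoted field spans several lines, so data row N sits on line N + 1.
    """
    warnings = []
    seen = set()
    reader = csv.DictReader(io.StringIO(text))
    # Line 1 holds the header, so the first data row is on line 2.
    for lineno, row in enumerate(reader, start=2):
        if row['ID'] in seen:
            warnings.append('%s: duplicate primary key: (%r,)' % (filename, row['ID']))
        seen.add(row['ID'])
        # Source cells hold ';'-separated references such as 'Meier2005[3-7]'.
        for ref in filter(None, (row.get('Source') or '').split(';')):
            key = ref.split('[', 1)[0].strip()
            if key not in known_sources:
                warnings.append('%s:%d:Source missing source key: %s' % (filename, lineno, key))
    return warnings


data = (
    'ID,Form,Source\n'
    '1,word,Meier2005[3-7]\n'
    '1,hand,Meier2005[3-7]\n'
    '2,hand,Mei2005[3-7]\n'
)
for warning in validate_forms(data, known_sources={'Meier2005'}):
    print('WARNING', warning)
```

On Python 3 the duplicate key is printed as `('1',)` rather than the Python 2 form `(u'1',)` seen in the output above.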