Skip to main content

A python library to read and write CLDF datasets

Project description

pycldf

A python package to read and write CLDF datasets.

Build Status codecov Requirements Status PyPI

Writing CLDF

from pycldf import Wordlist, Source

dataset = Wordlist.in_dir('mydataset')
dataset.add_sources(Source('book', 'Meier2005', author='Hans Meier', year='2005', title='The Book'))
dataset.write(FormTable=[
    {
        'ID': '1', 
        'Form': 'word', 
        'Language_ID': 'abcd1234', 
        'Parameter_ID': '1277', 
        'Source': ['Meier2005[3-7]'],
    }])

results in

$ ls -1 mydataset/
forms.csv
sources.bib
Wordlist-metadata.json
  • mydataset/forms.csv
ID,Language_ID,Parameter_ID,Value,Segments,Comment,Source
1,abcd1234,1277,word,,,Meier2005[3-7]
  • mydataset/sources.bib
@book{Meier2005,
    author = {Meier, Hans},
    year = {2005},
    title = {The Book}
}
  • mydataset/Wordlist-metadata.json

Advanced writing

To add predefined CLDF components to a dataset, use the add_component method:

from pycldf import StructureDataset, term_uri

dataset = StructureDataset.in_dir('mydataset')
dataset.add_component('ParameterTable')
dataset.write(
    ValueTable=[{'ID': '1', 'Language_ID': 'abc', 'Parameter_ID': '1', 'Value': 'x'}],
	ParameterTable=[{'ID': '1', 'Name': 'Grammatical Feature'}])

It is also possible to add generic tables:

dataset.add_table('contributors.csv', term_uri('id'), term_uri('name'))

which can also be linked to other tables:

dataset.add_columns('ParameterTable', 'Contributor_ID')
dataset.add_foreign_key('ParameterTable', 'Contributor_ID', 'contributors.csv', 'ID')

Addressing tables and columns

Tables in a dataset can be referenced using a Dataset's __getitem__ method, passing

  • a full CLDF Ontology URI for the corresponding component,
  • the local name of the component in the CLDF Ontology,
  • the url of the table.

Columns in a dataset can be referenced using a Dataset's __getitem__ method, passing a tuple (<TABLE>, <COLUMN>) where <TABLE> specifies a table as explained above and <COLUMN> is

  • a full CLD Ontolgy URI used as propertyUrl of the column,
  • the name property of the column.

Reading CLDF

>>> from pycldf.dataset import Wordlist
>>> dataset = Wordlist.from_metadata('mydataset/Wordlist-metadata.json')
>>> print(dataset)
<cldf:v1.0:Wordlist at mydataset>
>>> forms = list(dataset['FormTable'])
>>> forms[0]
OrderedDict([('ID', '1'), ('Language_ID', 'abcd1234'), ('Parameter_ID', '1277'), ('Value', 'word'), ('Segments', []), ('Comment', None), ('Source', ['Meier2005[3-7]'])])
>>> refs = list(dataset.sources.expand_refs(forms[0]['Source']))
>>> refs
[<Reference Meier2005[3-7]>]
>>> print(refs[0].source)
Meier, Hans. 2005. The Book.

Command line usage

Installing the pycldf package will also install a command line interface cldf, which provides some sub-commands to manage CLDF datasets.

Summary statistics

$ cldf stats mydataset/Wordlist-metadata.json 
<cldf:v1.0:Wordlist at mydataset>

Path                   Type          Rows
---------------------  ----------  ------
forms.csv              Form Table       1
mydataset/sources.bib  Sources          1

Validation

By default, data files are read in strict-mode, i.e. invalid rows will result in an exception being raised. To validate a data file, it can be read in validating-mode.

For example the following output is generated

$ cldf validate mydataset/forms.csv
WARNING forms.csv: duplicate primary key: (u'1',)
WARNING forms.csv:4:Source missing source key: Mei2005

when reading the file

ID,Language_ID,Parameter_ID,Value,Segments,Comment,Source
1,abcd1234,1277,word,,,Meier2005[3-7]
1,stan1295,1277,hand,,,Meier2005[3-7]
2,stan1295,1277,hand,,,Mei2005[3-7]

See also

Project details


Release history Release notifications

This version
History Node

1.4.1

History Node

1.4.0

History Node

1.3.0

History Node

1.2.0

History Node

1.1.1

History Node

1.1.0

History Node

1.0.10

History Node

1.0.9

History Node

1.0.8

History Node

1.0.7

History Node

1.0.6

History Node

1.0.5

History Node

1.0.4

History Node

1.0.3

History Node

1.0.2

History Node

1.0.1

History Node

1.0r2

History Node

1.0r1

History Node

1.0.0

History Node

1.0rc1

History Node

1.0b2

History Node

0.6.4

History Node

0.6.3

History Node

0.6.2

History Node

0.6.1

History Node

0.6.0

History Node

0.5.2

History Node

0.5.1

History Node

0.5.0

History Node

0.4.2

History Node

0.4.1

History Node

0.4.0

History Node

0.3.0

History Node

0.2.1

History Node

0.2.0

History Node

0.1.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
pycldf-1.4.1-py2.py3-none-any.whl (41.2 kB) Copy SHA256 hash SHA256 Wheel py2.py3 May 2, 2018
pycldf-1.4.1.tar.gz (31.2 kB) Copy SHA256 hash SHA256 Source None May 2, 2018

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging CloudAMQP CloudAMQP RabbitMQ AWS AWS Cloud computing Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page