
pycldf
======

A Python package to read and write [CLDF](http://cldf.clld.org) datasets

[![Build Status](https://travis-ci.org/glottobank/pycldf.svg?branch=master)](https://travis-ci.org/glottobank/pycldf)
[![codecov](https://codecov.io/gh/glottobank/pycldf/branch/master/graph/badge.svg)](https://codecov.io/gh/glottobank/pycldf)
[![Requirements Status](https://requires.io/github/glottobank/pycldf/requirements.svg?branch=master)](https://requires.io/github/glottobank/pycldf/requirements/?branch=master)
[![PyPI](https://img.shields.io/pypi/v/pycldf.svg)](https://pypi.python.org/pypi/pycldf)


Writing CLDF
------------

```python
from pycldf.dataset import Dataset
from pycldf.sources import Source
dataset = Dataset('mydb')
dataset.fields = ('ID', 'Language_ID', 'Parameter_ID', 'Value', 'Source', 'Comment')
dataset.sources.add(Source('book', 'Meier2005', author='Hans Meier', year='2005', title='The Book'))
dataset.add_row([
    '1',
    'http://glottolog.org/resource/languoid/id/stan1295',
    'http://concepticon.clld.org/parameters/1277',
    'hand',
    'Meier2005[3-7]',
    ''])
dataset.write('.')
```

results in

- `mydb.csv`
```
ID,Language_ID,Parameter_ID,Value,Source,Comment
1,http://glottolog.org/resource/languoid/id/stan1295,http://concepticon.clld.org/parameters/1277,hand,Meier2005[3-7],
```
- `mydb.bib`
```bibtex
@book{Meier2005,
    author = {Meier, Hans},
    title = {The Book},
    year = {2005}
}
```
- `mydb.csv-metadata.json`
```json
{
    "@context": [
        "http://www.w3.org/ns/csvw",
        {
            "@language": "en"
        }
    ],
    "dc:format": "cldf-1.0",
    "dialect": {
        "header": true,
        "delimiter": ",",
        "encoding": "utf-8"
    },
    "tables": [
        {
            "url": "",
            "dc:type": "cldf-values",
            "tableSchema": {
                "primaryKey": "ID",
                "columns": [
                    {
                        "datatype": "string",
                        "name": "ID"
                    },
                    {
                        "datatype": "string",
                        "name": "Language_ID"
                    },
                    {
                        "datatype": "string",
                        "name": "Parameter_ID"
                    },
                    {
                        "datatype": "string",
                        "name": "Value"
                    },
                    {
                        "datatype": "string",
                        "name": "Source"
                    },
                    {
                        "datatype": "string",
                        "name": "Comment"
                    }
                ]
            }
        }
    ]
}
```
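
The generated metadata file is plain [CSVW](https://www.w3.org/TR/tabular-metadata/) JSON, so it can also be inspected without `pycldf`; here is a minimal sketch using only the standard library to list the declared columns:

```python
import json

# Load the CSVW metadata written next to the data file.
with open('mydb.csv-metadata.json') as fp:
    metadata = json.load(fp)

# Print the name and datatype of each declared column.
for col in metadata['tables'][0]['tableSchema']['columns']:
    print('%s (%s)' % (col['name'], col['datatype']))
```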


Reading CLDF
------------

```python
>>> from pycldf.dataset import Dataset
>>> dataset = Dataset.from_file('mydb.csv')
>>> dataset
<Dataset mydb>
>>> len(dataset)
1
>>> row = dataset.rows[0]
>>> row
Row([('ID', u'1'),
     ('Language_ID', 'http://glottolog.org/resource/languoid/id/stan1295'),
     ('Parameter_ID', 'http://concepticon.clld.org/parameters/1277'),
     ('Value', 'hand'),
     ('Source', 'Meier2005[3-7]'),
     ('Comment', '')])
>>> row['Value']
'hand'
>>> row.refs
[<Reference Meier2005[3-7]>]
>>> row.refs[0].source
<Source Meier2005>
>>> print(row.refs[0].source)
Meier, Hans. 2005. The Book.
>>> print(row.refs[0].source.bibtex())
@book{Meier2005,
    year = {2005},
    author = {Meier, Hans},
    title = {The Book}
}
```
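
The same objects can be used to post-process a whole dataset. The following is a minimal sketch (assuming `dataset.rows` can be iterated like the list it is indexed as above) that prints each value together with the rendered citations of its sources:

```python
from pycldf.dataset import Dataset

dataset = Dataset.from_file('mydb.csv')
for row in dataset.rows:
    # Each reference carries the parsed Source object behind the citation.
    citations = '; '.join('%s' % ref.source for ref in row.refs)
    print('%s: %s (%s)' % (row['Language_ID'], row['Value'], citations))
```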


Validating a data file
----------------------

By default, data files are read in strict mode, i.e. invalid rows result in an exception
being raised. To validate a data file, it can be read in validating mode instead, in which
invalid rows are skipped and reported as warnings.

For example, the following output is generated

```python
>>> from pycldf.dataset import Dataset
>>> dataset = Dataset.from_file('mydb.csv', skip_on_error=True)
WARNING:pycldf.dataset:skipping row in line 3: wrong number of columns in row
WARNING:pycldf.dataset:skipping row in line 4: duplicate ID: 1
WARNING:pycldf.dataset:skipping row in line 5: missing citekey: Mei2005
```

when reading the file

```
ID,Language_ID,Parameter_ID,Value,Source,Comment
1,stan1295,1277,hand,Meier2005[3-7],
1,stan1295,1277,hand,Meier2005[3-7]
1,stan1295,1277,hand,Meier2005[3-7],
2,stan1295,1277,hand,Mei2005[3-7],
```
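
Note that these warnings are emitted via the standard `logging` module (through the `pycldf.dataset` logger, as visible in the output above), so they only show up if logging is configured accordingly; a minimal setup might look like this:

```python
import logging

# Make WARNING-level messages from pycldf visible on the console.
logging.basicConfig(level=logging.WARNING)

from pycldf.dataset import Dataset
dataset = Dataset.from_file('mydb.csv', skip_on_error=True)
```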


Support for augmented metadata
------------------------------

`pycldf` provides some support for metadata properties as described in
[W3C's Metadata Vocabulary for Tabular Data](https://www.w3.org/TR/tabular-metadata/), in particular:
- on [column description level](https://www.w3.org/TR/tabular-metadata/#dfn-column-description),
  - `datatype` is interpreted to use appropriate Python objects internally,
  - a URI template provided as `valueUrl` can be expanded by calling `Row.valueUrl(<colname>)`;
- on [schema description level](https://www.w3.org/TR/tabular-metadata/#dfn-schema-description),
  - a URI template provided as `aboutUrl` is used to compute the URL available as `Row.url`.

So the example above could be rewritten more succinctly:

```python
from pycldf.dataset import Dataset
from pycldf.sources import Source
dataset = Dataset('mydb')
dataset.fields = ('ID', 'Language_ID', 'Parameter_ID', 'Value', 'Source', 'Comment')
dataset.table.schema.columns['ID'].datatype = int
dataset.table.schema.columns['Language_ID'].valueUrl = 'http://glottolog.org/resource/languoid/id/{Language_ID}'
dataset.table.schema.columns['Parameter_ID'].valueUrl = 'http://concepticon.clld.org/parameters/{Parameter_ID}'
dataset.sources.add(Source('book', 'Meier2005', author='Hans Meier', year='2005', title='The Book'))
dataset.add_row(['1', 'stan1295', '1277', 'hand', 'Meier2005[3-7]', ''])
dataset.write('.')
```

And then accessed as follows:

```python
>>> from pycldf.dataset import Dataset
>>> dataset = Dataset.from_file('mydb.csv')
>>> row = dataset.rows[0]
>>> type(row['ID'])
<type 'int'>
>>> row.valueUrl('Language_ID')
'http://glottolog.org/resource/languoid/id/stan1295'
>>> row['Language_ID']
'stan1295'
```
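
An `aboutUrl` on schema description level is handled analogously, yielding `Row.url`. The attribute used to set it below is an assumption modeled on the column-level properties above and not verified against the 0.2.0 API, so treat this as a sketch only:

```python
from pycldf.dataset import Dataset

dataset = Dataset('mydb')
dataset.fields = ('ID', 'Language_ID', 'Parameter_ID', 'Value', 'Source', 'Comment')
# Assumption: aboutUrl can be set on the schema object, analogous to the
# column-level valueUrl shown above; the template itself is hypothetical.
dataset.table.schema.aboutUrl = 'http://example.org/values/{ID}'
dataset.add_row(['1', 'stan1295', '1277', 'hand', '', ''])
dataset.write('.')

# After reading the dataset back, each row exposes the expanded URL:
dataset = Dataset.from_file('mydb.csv')
print(dataset.rows[0].url)  # e.g. 'http://example.org/values/1'
```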
