pycldf
A Python package to read and write CLDF datasets.
Reading CLDF
>>> from pycldf.dataset import Dataset
>>> dataset = Dataset.from_metadata('mydataset/Wordlist-metadata.json')
>>> print(dataset)
<cldf:v1.0:Wordlist at mydataset>
# what is the type of dataset?
>>> print(dataset.module)
Wordlist
# iterate over forms:
>>> for form in dataset['FormTable']:
...     print(form)
...
OrderedDict([('ID', '1'), ('Language_ID', 'abcd1234'), ('Parameter_ID', '1277'), ('Form', 'word'), ('Segments', []), ('Comment', None), ('Source', ['Meier2005[3-7]'])])
# or get all of them
>>> forms = list(dataset['FormTable'])
>>> forms[0]
OrderedDict([('ID', '1'), ('Language_ID', 'abcd1234'), ('Parameter_ID', '1277'), ('Form', 'word'), ('Segments', []), ('Comment', None), ('Source', ['Meier2005[3-7]'])])
# references
>>> refs = list(dataset.sources.expand_refs(forms[0]['Source']))
>>> refs
[<Reference Meier2005[3-7]>]
>>> print(refs[0].source)
Meier, Hans. 2005. The Book.
Writing CLDF
Warning: Writing CLDF with pycldf does not automatically result in valid CLDF!
It does result in data that can be checked via cldf validate (see below), though,
so you should always validate after writing.
from pycldf import Wordlist, Source

dataset = Wordlist.in_dir('mydataset')
dataset.add_sources(Source('book', 'Meier2005', author='Hans Meier', year='2005', title='The Book'))
dataset.write(FormTable=[
    {
        'ID': '1',
        'Form': 'word',
        'Language_ID': 'abcd1234',
        'Parameter_ID': '1277',
        'Source': ['Meier2005[3-7]'],
    }])
results in
$ ls -1 mydataset/
forms.csv
sources.bib
Wordlist-metadata.json
mydataset/forms.csv
ID,Language_ID,Parameter_ID,Form,Segments,Comment,Source
1,abcd1234,1277,word,,,Meier2005[3-7]
mydataset/sources.bib
@book{Meier2005,
author = {Meier, Hans},
year = {2005},
title = {The Book}
}
mydataset/Wordlist-metadata.json
Advanced writing
To add predefined CLDF components to a dataset, use the add_component method:
from pycldf import StructureDataset, term_uri
dataset = StructureDataset.in_dir('mydataset')
dataset.add_component('ParameterTable')
dataset.write(
    ValueTable=[{'ID': '1', 'Language_ID': 'abc', 'Parameter_ID': '1', 'Value': 'x'}],
    ParameterTable=[{'ID': '1', 'Name': 'Grammatical Feature'}])
It is also possible to add generic tables:
dataset.add_table('contributors.csv', term_uri('id'), term_uri('name'))
which can also be linked to other tables:
dataset.add_columns('ParameterTable', 'Contributor_ID')
dataset.add_foreign_key('ParameterTable', 'Contributor_ID', 'contributors.csv', 'ID')
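Data for all of these tables can then be passed to a single write call. A minimal sketch, assuming the setup above (the contributor row is made up for illustration; the generic table is addressed by its url, hence the **-unpacking):

dataset.write(
    ValueTable=[{'ID': '1', 'Language_ID': 'abc', 'Parameter_ID': '1', 'Value': 'x'}],
    ParameterTable=[{'ID': '1', 'Name': 'Grammatical Feature', 'Contributor_ID': 'c1'}],
    **{'contributors.csv': [{'ID': 'c1', 'Name': 'Hans Meier'}]})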
Addressing tables and columns
Tables in a dataset can be referenced using a Dataset's __getitem__ method, passing
- a full CLDF Ontology URI for the corresponding component,
- the local name of the component in the CLDF Ontology,
- the url of the table.

Columns in a dataset can be referenced using a Dataset's __getitem__ method, passing
a tuple (<TABLE>, <COLUMN>), where <TABLE> specifies a table as explained above and
<COLUMN> is
- a full CLDF Ontology URI used as propertyUrl of the column,
- the name property of the column.
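For example, with the wordlist from above, each of the following lookups should address the same table or column (the ontology URIs shown are the CLDF v1.0 terms):

from pycldf import Dataset

dataset = Dataset.from_metadata('mydataset/Wordlist-metadata.json')

# by component name, ontology URI, or table url:
assert dataset['FormTable'] is dataset['http://cldf.clld.org/v1.0/terms.rdf#FormTable']
assert dataset['FormTable'] is dataset['forms.csv']

# columns take a (table, column) tuple; the column part may be a propertyUrl
# or the column's name:
column = dataset['FormTable', 'http://cldf.clld.org/v1.0/terms.rdf#languageReference']
column = dataset['FormTable', 'Language_ID']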
Object oriented access to CLDF data
The pycldf.orm module implements functionality to access CLDF data via an ORM.
Read its docstring for details.
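A minimal sketch, assuming the wordlist from above (Dataset.objects yields ORM objects instead of plain row dicts; consult the module docstring for the full API):

from pycldf import Dataset

dataset = Dataset.from_metadata('mydataset/Wordlist-metadata.json')
# ORM objects expose CLDF properties via the `cldf` namespace, so the
# #form property of a form is available as `form.cldf.form`.
for form in dataset.objects('FormTable'):
    print(form.id, form.cldf.form)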
Accessing CLDF data via SQL
The pycldf.db module implements functionality to load CLDF data into a SQLite
database. Read its docstring for details.
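A minimal sketch, assuming the wordlist from above and the schema conventions documented in src/pycldf/db.py (component tables named after the component, CLDF properties mapped to cldf_-prefixed columns):

from pycldf import Dataset
from pycldf.db import Database

dataset = Dataset.from_metadata('mydataset/Wordlist-metadata.json')
db = Database(dataset, fname='mydataset.sqlite')
db.write_from_tg()  # create the schema and load the data
for row in db.query("SELECT cldf_id, cldf_form FROM FormTable"):
    print(row)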
Command line usage
Installing the pycldf package will also install a command line interface cldf,
which provides some sub-commands to manage CLDF datasets.
Summary statistics
$ cldf stats mydataset/Wordlist-metadata.json
<cldf:v1.0:Wordlist at mydataset>
Path                   Type          Rows
---------------------  ----------  ------
forms.csv              Form Table       1
mydataset/sources.bib  Sources          1
Validation
By default, data files are read in strict-mode, i.e. invalid rows will result in an exception being raised. To validate a data file, it can be read in validating-mode.
For example, the following output is generated
$ cldf validate mydataset/forms.csv
WARNING forms.csv: duplicate primary key: (u'1',)
WARNING forms.csv:4:Source missing source key: Mei2005
when reading the file
ID,Language_ID,Parameter_ID,Form,Segments,Comment,Source
1,abcd1234,1277,word,,,Meier2005[3-7]
1,stan1295,1277,hand,,,Meier2005[3-7]
2,stan1295,1277,hand,,,Mei2005[3-7]
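Validation can also be triggered programmatically. A minimal sketch, assuming Dataset.validate's log keyword (problems, such as the duplicate key above, are reported to the log and validate returns False; without a log, the first problem raises an error):

import logging

from pycldf import Dataset

dataset = Dataset.from_metadata('mydataset/Wordlist-metadata.json')
if not dataset.validate(log=logging.getLogger(__name__)):
    print('dataset is invalid')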
Converting a CLDF dataset to an SQLite database
A very useful feature of CSVW in general and CLDF in particular is that it
provides enough metadata for a set of CSV files to load them into a relational
database - including relations between tables. This can be done by running the
cldf createdb command:
$ cldf createdb -h
usage: cldf createdb [-h] [--infer-primary-keys] DATASET SQLITE_DB_PATH

Load a CLDF dataset into a SQLite DB

positional arguments:
  DATASET         Dataset specification (i.e. path to a CLDF metadata file or
                  to the data file)
  SQLITE_DB_PATH  Path to the SQLite db file
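For example, to load the wordlist from above:

$ cldf createdb mydataset/Wordlist-metadata.json mydataset.sqlite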
For a specification of the resulting database schema refer to the documentation
in src/pycldf/db.py.