Skip to main content

A python library to read and write CLDF datasets

Project description

pycldf

A python package to read and write CLDF datasets.

Build Status codecov Requirements Status PyPI

Reading CLDF

>>> from pycldf.dataset import Dataset
>>> dataset = Dataset.from_metadata('mydataset/Wordlist-metadata.json')
>>> print(dataset)
<cldf:v1.0:Wordlist at mydataset>

# what is the type of dataset?
>>> print(dataset.module)
'Wordlist'

# iterate over forms:
>>> for form in dataset['FormTable']:
>>>    print(form)
>>> [('ID', '1'), ('Language_ID', 'abcd1234'), ('Parameter_ID', '1277'), ('Value', 'word'), ('Segments', []), ('Comment', None), ('Source', ['Meier2005[3-7]'])]
...

# or get all of them
>>> forms = list(dataset['FormTable'])
>>> forms[0]
OrderedDict([('ID', '1'), ('Language_ID', 'abcd1234'), ('Parameter_ID', '1277'), ('Value', 'word'), ('Segments', []), ('Comment', None), ('Source', ['Meier2005[3-7]'])])

# references
>>> refs = list(dataset.sources.expand_refs(forms[0]['Source']))
>>> refs
[<Reference Meier2005[3-7]>]
>>> print(refs[0].source)
Meier, Hans. 2005. The Book.

Writing CLDF

from pycldf import Wordlist, Source

dataset = Wordlist.in_dir('mydataset')
dataset.add_sources(Source('book', 'Meier2005', author='Hans Meier', year='2005', title='The Book'))
dataset.write(FormTable=[
    {
        'ID': '1', 
        'Form': 'word', 
        'Language_ID': 'abcd1234', 
        'Parameter_ID': '1277', 
        'Source': ['Meier2005[3-7]'],
    }])

results in

$ ls -1 mydataset/
forms.csv
sources.bib
Wordlist-metadata.json
  • mydataset/forms.csv
ID,Language_ID,Parameter_ID,Value,Segments,Comment,Source
1,abcd1234,1277,word,,,Meier2005[3-7]
  • mydataset/sources.bib
@book{Meier2005,
    author = {Meier, Hans},
    year = {2005},
    title = {The Book}
}
  • mydataset/Wordlist-metadata.json

Advanced writing

To add predefined CLDF components to a dataset, use the add_component method:

from pycldf import StructureDataset, term_uri

dataset = StructureDataset.in_dir('mydataset')
dataset.add_component('ParameterTable')
dataset.write(
    ValueTable=[{'ID': '1', 'Language_ID': 'abc', 'Parameter_ID': '1', 'Value': 'x'}],
	ParameterTable=[{'ID': '1', 'Name': 'Grammatical Feature'}])

It is also possible to add generic tables:

dataset.add_table('contributors.csv', term_uri('id'), term_uri('name'))

which can also be linked to other tables:

dataset.add_columns('ParameterTable', 'Contributor_ID')
dataset.add_foreign_key('ParameterTable', 'Contributor_ID', 'contributors.csv', 'ID')

Addressing tables and columns

Tables in a dataset can be referenced using a Dataset's __getitem__ method, passing

  • a full CLDF Ontology URI for the corresponding component,
  • the local name of the component in the CLDF Ontology,
  • the url of the table.

Columns in a dataset can be referenced using a Dataset's __getitem__ method, passing a tuple (<TABLE>, <COLUMN>) where <TABLE> specifies a table as explained above and <COLUMN> is

  • a full CLD Ontolgy URI used as propertyUrl of the column,
  • the name property of the column.

Object oriented access to CLDF data

The pycldf.orm module implements functionality to access CLDF data via an ORM. Read its docstring for details.

Accessing CLDF data via SQL

The pycldf.db module implements functionality to load CLDF data into a SQLite database. Read its docstring for details.

Command line usage

Installing the pycldf package will also install a command line interface cldf, which provides some sub-commands to manage CLDF datasets.

Summary statistics

$ cldf stats mydataset/Wordlist-metadata.json 
<cldf:v1.0:Wordlist at mydataset>

Path                   Type          Rows
---------------------  ----------  ------
forms.csv              Form Table       1
mydataset/sources.bib  Sources          1

Validation

By default, data files are read in strict-mode, i.e. invalid rows will result in an exception being raised. To validate a data file, it can be read in validating-mode.

For example the following output is generated

$ cldf validate mydataset/forms.csv
WARNING forms.csv: duplicate primary key: (u'1',)
WARNING forms.csv:4:Source missing source key: Mei2005

when reading the file

ID,Language_ID,Parameter_ID,Value,Segments,Comment,Source
1,abcd1234,1277,word,,,Meier2005[3-7]
1,stan1295,1277,hand,,,Meier2005[3-7]
2,stan1295,1277,hand,,,Mei2005[3-7]

Converting a CLDF dataset to an SQLite database

A very useful feature of CSVW in general and CLDF in particular is that it provides enough metadata for a set of CSV files to load them into a relational database - including relations between tables. This can be done running the cldf createdb command:

$ cldf createdb -h
usage: cldf createdb [-h] [--infer-primary-keys] DATASET SQLITE_DB_PATH

Load a CLDF dataset into a SQLite DB

positional arguments:
  DATASET               Dataset specification (i.e. path to a CLDF metadata
                        file or to the data file)
  SQLITE_DB_PATH        Path to the SQLite db file

For a specification of the resulting database schema refer to the documentation in src/pycldf/db.py.

See also

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pycldf-1.18.0.tar.gz (41.3 kB view details)

Uploaded Source

Built Distribution

pycldf-1.18.0-py2.py3-none-any.whl (50.4 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file pycldf-1.18.0.tar.gz.

File metadata

  • Download URL: pycldf-1.18.0.tar.gz
  • Upload date:
  • Size: 41.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.12

File hashes

Hashes for pycldf-1.18.0.tar.gz
Algorithm Hash digest
SHA256 256f8c0bbe6b2529a478d3531c3eb5cfa22d60803a7b5442df50bd61f20b2657
MD5 0656d7d214f7c8ac40affc282e484a9c
BLAKE2b-256 3c7df11159fe95ef82e877fdf710314f0e9a7d722d58947664aa6013f9f4d7fe

See more details on using hashes here.

File details

Details for the file pycldf-1.18.0-py2.py3-none-any.whl.

File metadata

  • Download URL: pycldf-1.18.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 50.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.12

File hashes

Hashes for pycldf-1.18.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 8842fe534c485f289bd722aac06b2f7692b729ad238e617588498c578cf168f3
MD5 5a37cb6cd3c62bb911e8f321e1661ba2
BLAKE2b-256 bd203626821bd194af1754fafb8f67b7e906a373295a3b9bcf1875c2dba1c4c4

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page