A cldfbench plugin to create visualizations of CLDF datasets

cldfviz

Python library providing tools to visualize CLDF datasets.

Install

Run

pip install cldfviz

If you want to create maps in image formats (PNG, JPG, PDF), the cartopy package is needed. It will be installed with

pip install cldfviz[cartopy]

Note: Since cartopy has quite a few system-level requirements, installation may be somewhat tricky. Should problems arise, https://scitools.org.uk/cartopy/docs/v0.15/installing.html may help.

CLI

cldfviz is implemented as a cldfbench plugin, i.e. it provides subcommands for the cldfbench command.

After installation you should see subcommands with a cldfviz. prefix listed when running

cldfbench -h

cldfviz.map

A common way to visualize data from a CLDF StructureDataset is as "dots on a map", i.e. as WALS-like geographic maps.

This can be done using the cldfviz.map command. If you need to look up geo-coordinates for languages in Glottolog (because the dataset you are interested in does not provide coordinates, but has Glottocodes), this command needs

We'll explain the usage of the command by running it on the WALS CLDF data. (Run cldfbench cldfviz.map -h to list all options of the command.) You can download the WALS data, for example, using another cldfbench plugin, cldfzenodo:

cldfbench zenodo.download 10.5281/zenodo.4683137 --directory wals-2020.1/

HTML maps

With the leaflet library, we can create interactive maps which can be explored in a browser.

Running

cldfbench cldfviz.map wals-2020.1/StructureDataset-metadata.json --base-layer Esri_WorldPhysical --pacific-centered

will create an HTML page map.html and open it in the browser, thus rendering an interactive map of the languages in the dataset.

WALS languages

For smaller language samples, it may be suitable to display the language names on the map, too. Here's WALS' feature 10B:

cldfbench cldfviz.map wals-2020.1/StructureDataset-metadata.json --parameters 10B --colormaps tol --markersize 20 --language-labels

WALS 10B

cldfviz.map can detect and display continuous variables, too. There are no continuous features in WALS, but since cldfviz.map also works with metadata-free CLDF datasets, let's quickly create one. Using the UNIX shell tools sed and awk and the tools of the csvkit toolbox, we can run

csvgrep -c Latitude,Glottocode -r".+" wals-2020.1/languages.csv | \
csvcut -c ID,Glottocode,Latitude | \
awk '{if(NR==1){print $0",Parameter_ID"}else{print $0",latitude"}}' | \
sed 's/ID,Glottocode,Latitude,Parameter_ID/ID,Language_ID,Value,Parameter_ID/g' > values.csv

Let's break this down: The first line selects all WALS languages for which both latitude and Glottocode are given. The next line narrows the resulting CSV to just three columns: the future ID, Language_ID and Value columns of our metadata-free StructureDataset. The awk command adds a constant column Parameter_ID, and the sed command renames the columns appropriately.
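The same selection and renaming can also be sketched in Python, using only the standard library csv module. This is an illustrative alternative to the shell pipeline, not part of cldfviz; the function names latitude_values and write_values are made up for this example, and the in-memory sample merely stands in for the WALS languages.csv.

```python
import csv
import io


def latitude_values(languages_csv):
    """Yield value rows for languages that have both a Latitude and a Glottocode.

    `languages_csv` is an open text file (or any iterable of lines) with at
    least the columns ID, Glottocode and Latitude.
    """
    for row in csv.DictReader(languages_csv):
        if row["Latitude"] and row["Glottocode"]:
            yield {
                "ID": row["ID"],
                "Language_ID": row["Glottocode"],
                "Value": row["Latitude"],
                "Parameter_ID": "latitude",  # constant column, as added by awk
            }


def write_values(rows, out):
    """Write the rows as a metadata-free values table to the text stream `out`."""
    writer = csv.DictWriter(
        out, fieldnames=["ID", "Language_ID", "Value", "Parameter_ID"])
    writer.writeheader()
    writer.writerows(rows)


# Small in-memory sample standing in for the WALS languages.csv
sample = io.StringIO(
    "ID,Name,Glottocode,Latitude\n"
    "aar,Aari,aari1239,6\n"
    "xxx,No Glottocode,,10\n"
    "aba,Abau,abau1245,-4\n"
)
out = io.StringIO()
write_values(latitude_values(sample), out)
print(out.getvalue())
```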

The resulting CSV looks as follows:

$ head -n 4 values.csv 
ID,Language_ID,Value,Parameter_ID
aar,aari1239,6,latitude
aba,abau1245,-4,latitude
abb,chad1249,13.8333333333,latitude

Now we can run

cldfbench cldfviz.map values.csv --parameters latitude --glottolog PATH/TO/GLOTTOLOG

WALS latitudes

Note that for metadata-free datasets, cldfviz.map needs to look up coordinates in Glottolog. Thus, languages may be displayed at slightly different locations than above (when the coordinates in WALS differ).

Now we could have done this in a simpler way, too, because cldfviz.map has a special option to display language properties encoded as columns in the LanguageTable as if they were parameters of the dataset. We can use this option to visualize a claim from WALS' chapter 129 that there is a

strong correlation between values [for feature 129] and latitudinal location

cldfbench cldfviz.map wals-2020.1/cldf/StructureDataset-metadata.json --parameters 129A --colormaps tol \
--markersize 20 --language-properties Latitude --pacific-centered

WALS 129A and latitude

As seen above, cldfviz.map can visualize multiple parameters at once. E.g. we can explore the related WALS features 129A, 130A and 130B, selecting suitable colormaps for the two boolean parameters:

cldfbench cldfviz.map wals-2020.1/cldf/StructureDataset-metadata.json --parameters 129A,130A,130B \
--colormaps base,base,tol --pacific-centered --markersize 30 

WALS 129A, 130A and 130B

Printable maps via cartopy

If cldfviz is installed with cartopy, maps similar to the ones shown above can also be created in various image formats:

cldfbench cldfviz.map wals-2020.1/StructureDataset-metadata.json --parameters 129A --colormaps tol \
--language-properties Latitude --pacific-centered \
--format jpg --width 20 --height 10 --dpi 300 --markersize 40

WALS 129A and latitude

While these maps lack the interactivity of the HTML maps, they may be better suited for inclusion in print formats than screenshots of maps in the browser. They also provide some additional options, such as a choice between various map projections.

Advanced dataset pre-processing

Going one step further, we might visualize data that has been synthesized on the fly. E.g. we can visualize the AES endangerment information given in the Glottolog CLDF data for the WALS languages.

Since we will alter the WALS CLDF data, we make a copy of it first:

cp -r wals-2020.1 wals-copy

Now we extract the AES data from Glottolog ...

csvgrep -c Parameter_ID -m"aes" glottolog-cldf-4.4/cldf/values.csv |\
csvgrep -c Value -m"NA" -i |\
csvcut -c Language_ID,Parameter_ID,Code_ID  > aes1.csv

... and massage it into a form that can be appended to the WALS ValueTable:

csvjoin -y 0 -c Glottocode,Language_ID wals-2020.1/cldf/languages.csv aes1.csv |\
csvcut -c Parameter_ID,Code_ID,ID |\
awk '{if(NR==1){print $0",ID"}else{print $0",aes-"NR}}' |\
sed 's/Parameter_ID,Code_ID,ID,ID/Parameter_ID,Value,Language_ID,ID/g' |\
csvcut -c ID,Language_ID,Parameter_ID,Value |\
awk '{if(NR==1){print $0",Code_ID,Comment,Source,Example_ID"}else{print $0",,,,"}}' > aes2.csv

Notes:

  • The first awk call adds a unique value ID. We cannot re-use the value ID from Glottolog, because the mapping between WALS and Glottolog languages is many-to-one.
  • Using awk to manipulate CSV data is somewhat fragile, since it will break if the data contains multi-line cell content. To guard against that, you may compare the row count reported by csvstat with the line count from wc -l before using awk.
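The check suggested in the second note can also be sketched in Python. The standard library csv reader exposes line_num, the number of physical lines read so far, so a record that advances it by more than one spans several lines. The function name multiline_records is made up for this example.

```python
import csv
import io


def multiline_records(lines):
    """Return the 1-based numbers of records (header included) spanning several lines.

    `lines` is an open text file or any iterable of lines. When reading from
    disk, open the file with newline="" as recommended for the csv module.
    """
    reader = csv.reader(lines)
    spanning = []
    previous_line_num = 0
    for record_number, _ in enumerate(reader, start=1):
        # line_num counts physical lines, which may exceed the record count.
        if reader.line_num - previous_line_num > 1:
            spanning.append(record_number)
        previous_line_num = reader.line_num
    return spanning


# The second record contains a line break inside a quoted cell.
sample = io.StringIO('ID,Comment\n1,"two\nlines"\n2,ok\n')
problems = multiline_records(sample)
if problems:
    print("line-based tools such as awk are unsafe for records:", problems)
```

An empty result means every record sits on exactly one line, which is the situation in which the awk calls above behave as intended.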

Now we append the values and a row for the ParameterTable ...

csvstack aes2.csv wals-copy/cldf/values.csv > values.csv
cp values.csv wals-copy/cldf
echo "ID,Name,Description,Chapter_ID" > aes_param.csv
echo "aes,AES,," >> aes_param.csv
csvstack aes_param.csv wals-copy/cldf/parameters.csv > parameters.csv
cp parameters.csv wals-copy/cldf
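The pipeline above shapes aes2.csv so that its header lines up with the table it is stacked with. As a hedged sketch, the same stacking step can be written in Python with two extra safety checks: identical headers and unique IDs. The function name stack_csv is made up for this example and is not part of csvkit or cldfviz.

```python
import csv
import io
import itertools


def stack_csv(new_rows, existing_rows, out):
    """Write the rows of `new_rows` followed by those of `existing_rows` to `out`.

    Both inputs are iterables of lines. Their headers must be identical, and
    the values of the ID column must be unique across both inputs.
    """
    new = csv.DictReader(new_rows)
    existing = csv.DictReader(existing_rows)
    if new.fieldnames != existing.fieldnames:
        raise ValueError(
            "headers differ: {} vs {}".format(new.fieldnames, existing.fieldnames))
    writer = csv.DictWriter(out, fieldnames=existing.fieldnames)
    writer.writeheader()
    seen = set()
    for row in itertools.chain(new, existing):
        if row["ID"] in seen:
            raise ValueError("duplicate ID: " + row["ID"])
        seen.add(row["ID"])
        writer.writerow(row)


# Tiny in-memory stand-ins for aes_param.csv and the WALS parameters.csv
aes_param = io.StringIO("ID,Name,Description,Chapter_ID\naes,AES,,\n")
parameters = io.StringIO("ID,Name,Description,Chapter_ID\n1A,Consonant Inventories,,1\n")
stacked = io.StringIO()
stack_csv(aes_param, parameters, stacked)
print(stacked.getvalue())
```

Failing early on a header mismatch or a duplicate ID surfaces problems before the validation step that follows.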

... and make sure the resulting dataset is valid:

cldf validate wals-copy/cldf/StructureDataset-metadata.json

Finally, we can plot the map:

cldfbench cldfviz.map wals-copy/cldf/StructureDataset-metadata.json --pacific-centered --colormaps seq --parameters aes

WALS AES

cldfviz.text

A rather traditional visualization of linguistic data is the practice of interspersing bits of data in descriptive texts, most obviously perhaps as examples formatted as Interlinear Glossed Text. Other examples of data in text include forms, either in running text or in a table.

To support this use case, the cldfviz.text command can fill data from a CLDF dataset into a markdown document, where references to CLDF data objects (rows of tables or complete tables) are marked using the markdown link format with a special URL syntax. To reference a single row:

[An arbitrary label](some/path/<component-name-or-csv-filename>#cldf:<object-id>)

To reference a whole table:

[An arbitrary label](some/path/<component-name-or-csv-filename>#cldf:__all__)

Note: Only the last component of the URL path is used to determine a CLDF component or table of the dataset, while the first part is ignored. This allows using URLs that are even somewhat functional in the unrendered document. E.g.

[Meier 2020](cldf/sources.bib#cldf:Meier2020)

will render as Meier 2020, linking to the BibTeX file when the document is simply rendered as markdown by a service like GitHub, while the enhanced document created from cldfviz.text will replace the link with the reference data expanded to a full citation according to the Unified Stylesheet for Linguistics.

Rendering of data objects is controlled with templates using the Jinja template language. Sometimes, templates can be parametrized, e.g. to choose only cognates belonging to the same cognate set from a CognateTable. These parameters can be specified as the query string of the reference URL, e.g.

[cognateset X](some/path/CognateTable?cognatesetReference=X#cldf:__all__)
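To make the URL syntax concrete, here is a minimal sketch of how such a link could be taken apart with the Python standard library. It only illustrates the conventions described above (last path component, cldf: fragment, query string). It is not the parser used by cldfviz.text, and the names LINK and parse_cldf_link are hypothetical.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlsplit

# Markdown inline link: [label](url); an optional leading "!" marks an image.
LINK = re.compile(r"!?\[(?P<label>[^\]]*)\]\((?P<url>[^)\s]+)\)")


def parse_cldf_link(markdown_link):
    """Split a CLDF markdown link into label, table, object ID and parameters.

    Returns None if the text is not a link whose fragment starts with "cldf:".
    """
    match = LINK.fullmatch(markdown_link.strip())
    if match is None:
        return None
    url = urlsplit(match.group("url"))
    if not url.fragment.startswith("cldf:"):
        return None
    return {
        "label": match.group("label"),
        # Only the last path component names the component or CSV file.
        "table": PurePosixPath(url.path).name,
        # "__all__" refers to the whole table rather than a single row.
        "object_id": url.fragment[len("cldf:"):],
        "parameters": {k: v[0] for k, v in parse_qs(url.query).items()},
    }


print(parse_cldf_link("[Meier 2020](cldf/sources.bib#cldf:Meier2020)"))
print(parse_cldf_link(
    "[cognateset X](some/path/CognateTable?cognatesetReference=X#cldf:__all__)"))
```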

In addition to data objects you can also specify maps to be created with cldfviz.map and included in the resulting markdown document; e.g.:

![](map.jpg?parameters=1A#cldfviz.map)

An example of a document rendered with cldfviz.text is docs/text_example/README.md: several paragraphs of WALS' chapter 21, rewritten in "CLDF markdown" and rendered by "filling in" data from the WALS CLDF dataset.
