A cldfbench plugin to create visualizations of CLDF datasets
cldfviz
Python library providing tools to visualize CLDF datasets.
Install
Run
pip install cldfviz
If you want to create maps in image formats (PNG, JPG, PDF), you need the cartopy package,
which can be installed with
pip install cldfviz[cartopy]
Note: Since cartopy has quite a few system-level requirements, installation may be somewhat tricky. Should
problems arise, https://scitools.org.uk/cartopy/docs/v0.15/installing.html may help.
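Before requesting image output, you may want to check whether cartopy is importable at all. The following minimal sketch uses only the standard library; it only checks that the package can be found, not that cartopy's system-level libraries work.

```python
import importlib.util


def has_cartopy() -> bool:
    """Return True if the cartopy package can be found (without importing it)."""
    return importlib.util.find_spec("cartopy") is not None


if __name__ == "__main__":
    if has_cartopy():
        print("cartopy found: image formats (PNG, JPG, PDF) should be available")
    else:
        print("cartopy not found: try pip install cldfviz[cartopy]")
```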
CLI
cldfviz is implemented as a cldfbench
plugin, i.e. it provides subcommands for the cldfbench command.
After installation you should see subcommands with a cldfviz. prefix
listed when running
cldfbench -h
cldfviz.map
A common way to visualize data from a CLDF StructureDataset is as "dots on a map", i.e. as WALS-like geographic maps.
This can be done using the cldfviz.map command. If you need to look up geo-coordinates
for languages in Glottolog (because the dataset you are interested in does not provide coordinates,
but has Glottocodes), this command needs
- access to a local clone or export of the glottolog/glottolog repository,
- Glottocodes for all languages in the set, given either as
languageReference in the ValueTable or as glottocode in the LanguageTable.
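To find languages that would lack a Glottocode before running the command, a quick check over the LanguageTable can help. This is a sketch assuming the LanguageTable is a CSV file with ID and Glottocode columns (as in the WALS languages.csv); it does not inspect a languageReference column in the ValueTable.

```python
import csv
import io


def languages_without_glottocode(languages_csv: str, column: str = "Glottocode") -> list:
    """Return the IDs of LanguageTable rows with an empty Glottocode cell."""
    missing = []
    for row in csv.DictReader(io.StringIO(languages_csv)):
        # A missing column is treated like an empty cell.
        if not (row.get(column) or "").strip():
            missing.append(row["ID"])
    return missing
```

To run it on real data, read the file with open(path, newline="", encoding="utf-8") and pass its contents to the function.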
We'll explain the usage of the command by using it with the WALS CLDF data.
(Run cldfbench cldfviz.map -h to list all options of the command.)
You can download the WALS data, for example, with another cldfbench plugin, cldfzenodo:
cldfbench zenodo.download 10.5281/zenodo.4683137 --directory wals-2020.1/
HTML maps
With the leaflet library, we can create interactive maps which can be explored in a browser.
Running
cldfbench cldfviz.map wals-2020.1/StructureDataset-metadata.json --base-layer Esri_WorldPhysical --pacific-centered
will create an HTML page map.html and open it in the browser, thus rendering an interactive
map of the languages in the dataset.
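The --pacific-centered option moves the Pacific to the middle of the map. One common way to achieve this on a web map is to shift longitudes west of some meridian by 360 degrees. The sketch below shows that technique; the cutoff meridian is a hypothetical parameter, and this is not necessarily how cldfviz implements the option.

```python
def pacific_centered(lon: float, cutoff: float = 0.0) -> float:
    """Shift longitudes west of `cutoff` by 360 degrees.

    For inputs in [-180, 180] the result lies in [cutoff, cutoff + 360),
    so the Pacific (around 180 degrees) ends up in the middle of the map.
    """
    return lon + 360.0 if lon < cutoff else lon
```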
For smaller language samples, it may be suitable to display the language names on the map, too. Here's WALS' feature 10B:
cldfbench cldfviz.map wals-2020.1/StructureDataset-metadata.json --parameters 10B --colormaps tol --markersize 20 --language-labels
cldfviz.map can detect and display continuous variables, too. There are no continuous features in WALS, but since
cldfviz.map also works with
metadata-free CLDF datasets, let's
quickly create one. Using the UNIX shell tools sed and awk and the
tools of the csvkit toolbox, we
can run
csvgrep -c Latitude,Glottocode -r".+" wals-2020.1/languages.csv | \
csvcut -c ID,Glottocode,Latitude | \
awk '{if(NR==1){print $0",Parameter_ID"}else{print $0",latitude"}}' | \
sed 's/ID,Glottocode,Latitude,Parameter_ID/ID,Language_ID,Value,Parameter_ID/g' > values.csv
Let's break this down: the first line selects all WALS languages for which both latitude and Glottocode are given.
The next line narrows the resulting CSV to just three columns - the future ID, Language_ID and Value
columns of our metadata-free StructureDataset. The awk command adds a constant column Parameter_ID,
and the sed command renames the columns appropriately.
The resulting CSV looks as follows:
$ head -n 4 values.csv
ID,Language_ID,Value,Parameter_ID
aar,aari1239,6,latitude
aba,abau1245,-4,latitude
abb,chad1249,13.8333333333,latitude
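The same values.csv can be produced without awk and sed. This stdlib-only sketch mirrors the pipeline above (keep rows where both Latitude and Glottocode are non-empty, reuse the language ID as value ID, add the constant Parameter_ID); because the csv module parses quoted cells, it also copes with multi-line cell content.

```python
import csv
import io


def latitude_values(languages_csv: str) -> str:
    """Build a metadata-free ValueTable (as CSV text) with one 'latitude'
    value per language that has both a Latitude and a Glottocode."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "Language_ID", "Value", "Parameter_ID"])
    for row in csv.DictReader(io.StringIO(languages_csv)):
        # Same filter as csvgrep -r".+": both cells must be non-empty.
        if row.get("Latitude") and row.get("Glottocode"):
            # The language ID doubles as value ID: one value per language.
            writer.writerow([row["ID"], row["Glottocode"], row["Latitude"], "latitude"])
    return out.getvalue()
```

To use it, read wals-2020.1/languages.csv (adjust the path to your download) and write the returned text to values.csv.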
Now we can run
cldfbench cldfviz.map values.csv --parameters latitude --glottolog PATH/TO/GLOTTOLOG
Note that for metadata-free datasets, cldfviz.map needs to look up coordinates in Glottolog. Thus, languages
may be displayed at slightly different locations than above (where the WALS coordinates differ from Glottolog's).
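Conceptually, the lookup attaches coordinates to each value via its Glottocode and drops values for which no coordinates are known. In this sketch, the `coordinates` mapping from Glottocode to (latitude, longitude) is a hypothetical stand-in for data read from Glottolog; it does not show how cldfviz actually reads a Glottolog clone.

```python
def locate_values(values, coordinates):
    """Attach (latitude, longitude) to each value via its Language_ID (a Glottocode).

    Returns (located, missing): located holds (ID, latitude, longitude, Value)
    tuples, missing holds the IDs of values whose Glottocode has no coordinates.
    """
    located, missing = [], []
    for value in values:
        coords = coordinates.get(value["Language_ID"])
        if coords is None:
            missing.append(value["ID"])
        else:
            located.append((value["ID"], coords[0], coords[1], value["Value"]))
    return located, missing
```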
We could have done this more simply, though, because cldfviz.map has a special option to display language
properties encoded as columns in the LanguageTable as if they were parameters of the dataset. We can use this
option to visualize the claim from WALS' chapter 129 that there is a
"strong correlation between values [for feature 129] and latitudinal location":
cldfbench cldfviz.map wals-2020.1/cldf/StructureDataset-metadata.json --parameters 129A --colormaps tol \
--markersize 20 --language-properties Latitude --pacific-centered
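Latitude is a continuous variable, so it is shown with a color gradient rather than discrete colors. A common way to build such a gradient is linear interpolation between two colors. This sketch illustrates the idea; the blue and red endpoints are arbitrary assumptions, and cldfviz's own colormaps may work differently.

```python
def interpolate_hex(value, vmin, vmax, start="#0000ff", end="#ff0000"):
    """Map a number in [vmin, vmax] to a hex color by linear interpolation
    between two RGB colors; out-of-range values are clamped."""
    if vmax == vmin:
        t = 0.0
    else:
        t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    a = [int(start[i:i + 2], 16) for i in (1, 3, 5)]
    b = [int(end[i:i + 2], 16) for i in (1, 3, 5)]
    rgb = [round(x + (y - x) * t) for x, y in zip(a, b)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

For latitudes, vmin=-90 and vmax=90 would put the equator at the midpoint of the gradient.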
As seen above, cldfviz.map can visualize multiple parameters at once. E.g. we can explore the related WALS
features 129A, 130A and 130B, selecting suitable colormaps for the two boolean parameters:
cldfbench cldfviz.map wals-2020.1/cldf/StructureDataset-metadata.json --parameters 129A,130A,130B \
--colormaps base,base,tol --pacific-centered --markersize 30
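Our reading of this example is that the comma-separated lists given to --parameters and --colormaps are matched by position (129A and 130A get base, 130B gets tol). The sketch below makes that pairing explicit; it is an assumption about how the options are interpreted, and cldfviz's handling of lists of different lengths may differ.

```python
def pair_colormaps(parameters: str, colormaps: str) -> dict:
    """Pair comma-separated parameter IDs with comma-separated colormap
    names by position; raise ValueError if the counts differ."""
    params = [p.strip() for p in parameters.split(",") if p.strip()]
    cmaps = [c.strip() for c in colormaps.split(",") if c.strip()]
    if len(params) != len(cmaps):
        raise ValueError(f"{len(params)} parameters but {len(cmaps)} colormaps")
    return dict(zip(params, cmaps))
```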
Printable maps via cartopy
If cldfviz is installed with cartopy, maps similar to the ones shown above can also be created
in various image formats:
cldfbench cldfviz.map wals-2020.1/StructureDataset-metadata.json --parameters 129A --colormaps tol \
--language-properties Latitude --pacific-centered \
--format jpg --width 20 --height 10 --dpi 300 --markersize 40
While these maps lack the interactivity of the HTML maps, they may be better suited for inclusion in print formats than screenshots of maps in the browser. They also provide some additional options, such as a choice of map projections.
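To estimate the size of a raster image such as the JPG above, multiply each dimension by the resolution. This assumes that --width and --height are given in inches, as matplotlib figure sizes are; that unit is an assumption, not something stated here.

```python
def pixel_size(width_in: float, height_in: float, dpi: int) -> tuple:
    """Raster size in pixels of a width_in x height_in inch figure at dpi."""
    return round(width_in * dpi), round(height_in * dpi)


print(pixel_size(20, 10, 300))  # (6000, 3000)
```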
Advanced dataset pre-processing
Going one step further, we might visualize data that has been synthesized on the fly. E.g. we can visualize the AES (Agglomerated Endangerment Status) information given in the Glottolog CLDF data for the WALS languages:
Since we will alter the WALS CLDF data, we make a copy of it first:
cp -r wals-2020.1 wals-copy
Now we extract the AES data from Glottolog ...
csvgrep -c Parameter_ID -m"aes" glottolog-cldf-4.4/cldf/values.csv |\
csvgrep -c Value -m"NA" -i |\
csvcut -c Language_ID,Parameter_ID,Code_ID > aes1.csv
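The two csvgrep calls amount to a simple row filter. This stdlib sketch mirrors them, keeping csvgrep's substring semantics for -m (and its inversion for -i); an exact comparison such as row["Parameter_ID"] == "aes" would be stricter.

```python
import csv
import io


def extract_aes(values_csv: str) -> list:
    """Return (Language_ID, Parameter_ID, Code_ID) for AES values not marked 'NA'."""
    rows = []
    for row in csv.DictReader(io.StringIO(values_csv)):
        # Substring matches, like csvgrep -m "aes" and csvgrep -i -m "NA".
        if "aes" in row["Parameter_ID"] and "NA" not in row["Value"]:
            rows.append((row["Language_ID"], row["Parameter_ID"], row["Code_ID"]))
    return rows
```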
... and massage it into a form that can be appended to the WALS ValueTable:
csvjoin -y 0 -c Glottocode,Language_ID wals-2020.1/cldf/languages.csv aes1.csv |\
csvcut -c Parameter_ID,Code_ID,ID |\
awk '{if(NR==1){print $0",ID"}else{print $0",aes-"NR}}' |\
sed 's/Parameter_ID,Code_ID,ID,ID/Parameter_ID,Value,Language_ID,ID/g' |\
csvcut -c ID,Language_ID,Parameter_ID,Value |\
awk '{if(NR==1){print $0",Code_ID,Comment,Source,Example_ID"}else{print $0",,,,"}}' > aes2.csv
Notes:
- The first awk call adds a unique value ID. We cannot re-use the value ID from Glottolog, because the mapping between WALS and Glottolog languages is many-to-one.
- Using awk to manipulate CSV data is somewhat fragile, since it will break if the data contains multi-line cell content. To guard against that, you may compare the row count reported by csvstat with the line count from wc -l before using awk.
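The join and reshaping step can also be written with the csv module, which parses quoted multi-line cells and so avoids the awk fragility noted above. In this sketch, the input triples are assumed to come from the previous filtering step; generated IDs start at aes-1 (the shell version, using awk's NR, starts at aes-2), which is fine as long as they are unique.

```python
import csv
import io


def aes_value_rows(wals_languages_csv: str, aes_rows: list) -> str:
    """Build WALS ValueTable rows (as CSV text) from (Glottocode, Parameter_ID,
    Code_ID) triples. Several WALS languages may share one Glottocode, so
    value IDs are generated rather than reused from Glottolog."""
    by_glottocode = {}
    for glottocode, pid, code in aes_rows:
        by_glottocode.setdefault(glottocode, []).append((pid, code))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "Language_ID", "Parameter_ID", "Value",
                     "Code_ID", "Comment", "Source", "Example_ID"])
    n = 0
    for lang in csv.DictReader(io.StringIO(wals_languages_csv)):
        for pid, code in by_glottocode.get(lang["Glottocode"], []):
            n += 1
            # As in the shell version, Glottolog's Code_ID becomes the Value,
            # and the remaining WALS columns are left empty.
            writer.writerow([f"aes-{n}", lang["ID"], pid, code, "", "", "", ""])
    return out.getvalue()
```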
Now we append the values and a row for the ParameterTable ...
csvstack aes2.csv wals-copy/cldf/values.csv > values.csv
cp values.csv wals-copy/cldf
echo "ID,Name,Description,Chapter_ID" > aes_param.csv
echo "aes,AES,," >> aes_param.csv
csvstack aes_param.csv wals-copy/cldf/parameters.csv > parameters.csv
cp parameters.csv wals-copy/cldf
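Appending rows to a table works most safely when columns are aligned by header name rather than by position. This stdlib sketch stacks CSV tables that way, so the column order of the new rows does not need to match the existing file; cells missing from a table are left empty.

```python
import csv
import io


def stack_csv(*tables: str) -> str:
    """Concatenate CSV tables (given as text), aligning columns by header name.
    Columns are ordered by first appearance; missing cells are left empty."""
    readers = [csv.DictReader(io.StringIO(t)) for t in tables]
    fieldnames = []
    for reader in readers:
        # Accessing .fieldnames reads the header row of each table.
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    for reader in readers:
        writer.writerows(reader)
    return out.getvalue()
```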
... and make sure the resulting dataset is valid:
cldf validate wals-copy/cldf/StructureDataset-metadata.json
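cldf validate checks the whole dataset against its metadata. While iterating on the shell steps, a quicker check of two things that manual editing tends to break (duplicate value IDs and Parameter_IDs missing from the ParameterTable) can be done with the standard library. This is a sketch, not a replacement for cldf validate.

```python
import csv
import io


def check_values(values_csv: str, parameters_csv: str) -> list:
    """Return problems found: duplicate value IDs and Parameter_IDs
    that do not occur in the ParameterTable."""
    param_ids = {row["ID"] for row in csv.DictReader(io.StringIO(parameters_csv))}
    seen, problems = set(), []
    for row in csv.DictReader(io.StringIO(values_csv)):
        if row["ID"] in seen:
            problems.append(f"duplicate value ID {row['ID']}")
        seen.add(row["ID"])
        if row["Parameter_ID"] not in param_ids:
            problems.append(f"unknown Parameter_ID {row['Parameter_ID']} in value {row['ID']}")
    return problems
```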
Finally, we can plot the map:
cldfbench cldfviz.map wals-copy/cldf/StructureDataset-metadata.json --pacific-centered --colormaps seq --parameters aes
Download files
File details
Details for the file cldfviz-0.4.0.tar.gz.
File metadata
- Download URL: cldfviz-0.4.0.tar.gz
- Upload date:
- Size: 25.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.26.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a3052bf4c4d62507c55e580e234aed675a371e2ef2b0186c812b25e64d257442 |
| MD5 | 4c66a119fd8652140aa29ba6a82021b3 |
| BLAKE2b-256 | 15408689846b09a3d4a443b0357d95b60edfe6434cfd73717da2f12a916daea7 |
File details
Details for the file cldfviz-0.4.0-py2.py3-none-any.whl.
File metadata
- Download URL: cldfviz-0.4.0-py2.py3-none-any.whl
- Upload date:
- Size: 27.1 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.26.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 03ad059bacc12dc2bc99e5c089a01e4167a52d18b1a9434ce28cc52457bbd6cf |
| MD5 | fd8785e969f7563c90405812f09653ed |
| BLAKE2b-256 | 5a4120f2b214bb9031c338e521a97244865875d61ee0382a5b42d0ce390b9fd2 |