Python LMF library
Latest version: 1.0
Date: October 21, 2015
Author: Céline Buret
Maintainer: Séverine Guillaume
Home page: https://github.com/buret/pylmflib
Platform: Unix, Linux, Windows, Mac
Package index owner: Céline Buret
The Python LMF library is a suite of open-source Python modules for dictionary format conversion. It performs automatic tasks on multilingual dictionaries, such as conversion between the different formats used for dictionaries.
The main goal of pylmflib is to provide a software package that integrates conversion functions from the MDF format to several output formats: LaTeX (PDF), docx, HTML, etc.
pylmflib implements the LMF standard. For more details, please see http://www.lexicalmarkupframework.org.
pylmflib is a library written in the Python programming language. It can be used directly in the Python interpreter or imported into Python scripts. For more information about Python, see http://www.python.org.
If you are using pylmflib for non-commercial, scientific projects, please cite the library in its current state along with the version that you used:
Buret, Céline (2015): pylmflib. Python Library for Automatic Tasks in Multi-Languages Dictionaries. Version 1.0 (Uploaded on 2015-10-21). URL: http://www.pylmflib.org.
Use pip to install pylmflib package from PyPI:
$ pip install pylmflib
To use the library, start Python 2 in your terminal and import pylmflib as follows:
>>> from pylmflib import *
Here is the list of the libraries without which pylmflib won’t work.
If you want to regularly work on pylmflib, open a (git) terminal and type in the following:
$ git clone https://github.com/buret/pylmflib
Before being able to run pylmflib, you will need to follow these steps:
$ sudo apt-get install git
$ git clone https://github.com/buret/pylmflib pylmflib
$ wget https://bootstrap.pypa.io/ez_setup.py -O - | sudo python
Download python-docx-0.8.5.tar.gz: https://pypi.python.org/pypi/python-docx
$ tar xvzf python-docx-0.8.5.tar.gz
$ cd python-docx-0.8.5/
$ sudo python setup.py install
$ sudo apt-get install xsltproc
$ sudo apt-get install texlive
$ sudo apt-get install texlive-xetex
We recommend using the stable version of pylmflib (1.0). Make sure that regex is installed on your system before installing pylmflib. To install this version, download it from https://github.com/buret/pylmflib or https://pypi.python.org/pypi/pylmflib/1.0, unpack the archive, cd into the resulting directory, and type:
$ python setup.py install
You may need sudo rights to run these commands.
At this stage, you can run the unit tests from the main directory:
$ test/test_all.py
You can also run all the provided examples:
$ examples/Bambara/bambara.py
$ examples/japhug/dict_japhug.py
$ examples/khaling/dict_khaling.py
$ examples/na/dict_na.py
$ examples/test/scenario.py
$ examples/yuanga/dict_yuanga.py
Before being able to install pylmflib-1.0, you will need to install:
In some cases, you may need to install:
The current version of pylmflib for Python 2 should in principle also run on Windows. To install pylmflib on a Windows machine, I recommend using the Cygwin terminal and installing pylmflib the same way as on Linux or Mac machines.
To use the library without installing it, i.e. without running the setup command, you can simply add it to your sys.path before importing it:
>>> import sys
>>> sys.path.append("path_to_pylmflib")
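In a standalone script, the same idea can be wrapped in a small helper. This is only a sketch: add_library_path is a hypothetical name, and "~/pylmflib" is an assumed location for the directory that contains the pylmflib package folder (for example, a cloned repository). Using insert(0, ...) rather than append() makes the local copy take precedence over any installed version.

```python
import os
import sys


def add_library_path(path):
    """Prepend a local pylmflib checkout to sys.path (hypothetical helper).

    The path is normalised first, and it is only added once, so calling
    the helper repeatedly does not grow sys.path.
    """
    absolute = os.path.abspath(os.path.expanduser(path))
    if absolute not in sys.path:
        # Position 0: the local copy shadows an installed pylmflib.
        sys.path.insert(0, absolute)
    return absolute


# Assumed location of the cloned repository; adjust to your system.
library_dir = add_library_path("~/pylmflib")
# from pylmflib import *   # import only after the path has been added
```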
Source code is available at: https://github.com/buret/pylmflib
pylmflib has been developed in Python 2.7.5.
It is under GPL licence.
In the following, we list some of the formats that pylmflib frequently uses, either as input formats or as output produced by the classes and methods it provides:
Here is a list of formats that can be used but still need further development, i.e. they have been integrated but their implementation is incomplete:
Formats that have to be added to the library in the future:
Please respect the coding rules used in the library.
For tests, I use the unittest Python library. To run the tests, go to the main directory and call test/test_all.py on the command line. Please do not commit any changes unless all tests pass without failures or errors.
All tests are in a directory test/ within the main directory. For each Python source file in the source directory, there is a test file with a prefix test_. For example, the tests of the core module, which has its source in pylmflib/core/, are located in test/test_core_xxx.py. Within the test files, there is a class defined for each class in the original source files, with a prefix Test. For example, there is a class TestLexicalEntry defined in test_core_lexical_entry.py as there is a class LexicalEntry in lexical_entry.py. For each method of a class, the test class has a method with the prefix test_. For example, the method create_related_form() of the LexicalEntry class is tested with the method test_create_related_form() of the test class.
If you contribute to pylmflib, you should document your code. The first step for documentation is the documentation within the code.
Currently, documentation is created using the following steps:
This is an example workflow that illustrates some of the functionalities of pylmflib. We start with a small dataset from the Bambara language.
First, make sure to have the Python LMF library downloaded, extracted and installed properly. The dataset that will be used is located under examples/Bambara.
This folder includes a Python script that runs the whole conversion from beginning to end. To start the conversion, go to the main directory and run this script:
$ python examples/Bambara/bambara.py
As a result, the following files will appear in the result directory:
You can also directly run the conversion and XeLaTeX command by running bambara.sh or bambara.bat depending on your operating system.
This is the main script, which calls the pylmflib functions:
So the basic steps are:
In this script, the user also has access to the methods of all pylmflib objects, which are fully documented at: http://himalco.huma-num.fr/documentation/index.htm
To customise some Python variables, you can write a setting.py file in which you can:
- define the items to sort: in this case, we choose to sort the contents of the lx MDF marker, but it could be any other field;
- customise the input MDF markers by modifying the mdf_lmf Python variable;
- customise the output MDF markers by modifying the lmf_mdf Python variable.
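The exact structure of mdf_lmf and lmf_mdf is defined by the library and shown in the example setting.py files. Purely to illustrate the customisation pattern (copy the default table, then override or add markers), the sketch below uses a hypothetical stand-in: a plain dict mapping MDF markers to handler functions, an invented marker nt_x, and a toy reader that is not part of pylmflib.

```python
# Hypothetical stand-in for a marker table such as mdf_lmf: each MDF marker
# maps to a function that stores the marker's value in an entry (a dict here).
def set_lexeme(value, entry):
    entry["lexeme"] = value


def set_part_of_speech(value, entry):
    entry["partOfSpeech"] = value


def add_note(value, entry):
    entry.setdefault("notes", []).append(value)


default_mdf_lmf = {"lx": set_lexeme, "ps": set_part_of_speech}

# Customisation: copy the defaults, then map an extra, project-specific
# marker ("nt_x" is an invented name) to its own handler.
mdf_lmf = dict(default_mdf_lmf)
mdf_lmf["nt_x"] = add_note


def read_record(lines, table):
    """Toy reader: apply the table to MDF lines such as '\\lx abc'."""
    entry = {}
    for line in lines:
        marker, _, value = line.lstrip("\\").partition(" ")
        if marker in table:  # markers absent from the table are skipped
            table[marker](value, entry)
    return entry


print(read_record(["\\lx abc", "\\ps n", "\\nt_x example note"], mdf_lmf))
```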
It is also possible to customise Python functions. See the other examples below for more advanced use.
This file is needed to define the working path and the path to the library. Normally, you should not have to modify it.
A simple example is presented under examples/test. All available output formats are generated:
- XML LMF
- XML TEI
Note that the conversion scripts from XML LMF to HTML, ODT and XML TEI are provided only as examples of what is possible. They need to be reworked to generate user-friendly outputs.
It is possible to fully customise the desired output. There are three examples to generate customised PDF printable dictionaries, located under examples/japhug, examples/khaling and examples/na.
In all cases, the setting.py file has been extensively modified. The most important function is lmf2tex(), whose role is to organise the data in the LaTeX output file. If the user does not provide this Python function, a default function produces a basic presentation. Again, coding details about this function are available at: http://himalco.huma-num.fr/documentation/index.htm
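To give an idea of the kind of work lmf2tex() performs, here is a hypothetical formatter that turns one entry into a LaTeX paragraph. Its name, parameters and layout are assumptions and do not reproduce the library's function signature. It also escapes LaTeX special characters, which any custom formatter has to handle.

```python
# Characters with a special meaning in LaTeX, mapped to their escaped form.
LATEX_SPECIAL = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_tex(text):
    """Escape character by character, so nothing is escaped twice."""
    return "".join(LATEX_SPECIAL.get(char, char) for char in text)


def entry_to_tex(lexeme, part_of_speech, glosses):
    """Hypothetical formatter: bold headword, italic part of speech, glosses."""
    gloss_text = "; ".join(escape_tex(gloss) for gloss in glosses)
    return "\\textbf{%s} \\textit{%s} %s\n\n" % (
        escape_tex(lexeme), escape_tex(part_of_speech), gloss_text)
```

For instance, entry_to_tex("abc", "n", ["word", "speech"]) yields the line \textbf{abc} \textit{n} word; speech followed by a blank line, which LaTeX treats as a paragraph break.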
It is also possible to customise a document output. There is an example to generate a customised docx editable dictionary, located under examples/yuanga.
Moreover, in this case, entries are not sorted alphabetically but by semantic domain.
The chapter titles of the output docx document are defined in setting.py, through the order and sd_order variables.
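The grouping by semantic domain can be pictured as follows. This is a sketch with hypothetical data: the real order and sd_order variables are defined in the setting.py file of the yuanga example and may have a different structure, group_by_domain is an illustrative name, and sorting headwords alphabetically inside each chapter is a choice of this sketch.

```python
from collections import OrderedDict

# Hypothetical chapter order and (headword, semantic domain) pairs.
sd_order = ["body", "animals", "plants"]
entries = [("leaf", "plants"), ("arm", "body"), ("dog", "animals"), ("hand", "body")]


def group_by_domain(entries, domain_order):
    """Group (headword, domain) pairs into chapters following domain_order.

    Domains missing from domain_order are appended at the end, in order of
    first appearance, so that no entry is lost.
    """
    chapters = OrderedDict((domain, []) for domain in domain_order)
    for headword, domain in entries:
        chapters.setdefault(domain, []).append(headword)
    # In this sketch, headwords are sorted alphabetically within a chapter.
    return OrderedDict((domain, sorted(words)) for domain, words in chapters.items())


for domain, words in group_by_domain(entries, sd_order).items():
    print(domain, words)
```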
In addition, the authorised part-of-speech values have been considerably extended by modifying the ps_partOfSpeech Python variable.
This part is an overview of the configuration files you may have to customise.
The root element is named Config. It contains the following elements, which the user has to set.
Language: define the vernacular, national, regional and other languages used in your multilingual dictionary by setting their ISO 639-3 codes (usually three letters).
Font: define the fonts to use for the LaTeX output format if needed; for each defined language, a font has to be defined using LaTeX commands.
LMF: define the GlobalInformation and Lexicon attributes of LexicalResource (author, version, dictionary description and title, identifier, etc.); two of these settings are especially important: entrySource must point to the MDF input file of the dictionary, and localPath must point to the folder containing your audio files, if you have any.
MDF: here you can define your own part of speech values if you do not use standard ones defined in MDF.
LaTeX: not implemented.
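A configuration file can be checked for its top-level sections before running a conversion. The sketch below assumes that the element names are exactly those listed above; the inner content of each section is not modelled, and missing_sections is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Top-level elements listed above for the Config root (assumed exact names).
EXPECTED_SECTIONS = ["Language", "Font", "LMF", "MDF", "LaTeX"]


def missing_sections(xml_text):
    """Return the expected top-level sections absent from a Config document."""
    root = ET.fromstring(xml_text)
    if root.tag != "Config":
        raise ValueError("root element is %r, expected 'Config'" % root.tag)
    present = {child.tag for child in root}
    return [name for name in EXPECTED_SECTIONS if name not in present]


# Skeleton with empty sections (real files contain the actual settings).
sample = "<Config><Language/><Font/><LMF/><MDF/></Config>"
print(missing_sections(sample))  # -> ['LaTeX']
```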
If the user wants to add an introduction to the dictionary, this is the file in which to write it, using LaTeX commands.
This file is used to define all LaTeX packages that will be needed to compile your LaTeX output file. You have to update it if you customise the lmf2tex() function by using non-basic LaTeX commands.
If you want your dictionary sorted in a specific alphabetical order, or if you use IPA or other special characters, you have to write your own sort_order.xml file. The format is simple: for each character, you have to define a rank value.
For all the settings defined above, please refer to the examples for the exact syntax.
The library provides several options. They are all described in the help menu, which you can display by running, for instance:
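Conceptually, a rank table turns each word into a sequence of numbers that can be compared. The sketch below (Python 3) uses a hypothetical rank dictionary rather than parsing sort_order.xml, whose exact syntax should be taken from the examples; sort_key is an illustrative name, and characters without a rank are sent to the end.

```python
# Hypothetical rank table, as it could be derived from a sort_order.xml
# file. A lower rank sorts first; here "ɓ" follows "b" and "ɛ" follows "e".
ranks = {"a": 1, "b": 2, "ɓ": 3, "c": 4, "e": 5, "ɛ": 6, "k": 7}


def sort_key(word, ranks):
    """Map a word to a tuple of ranks; unknown characters sort last."""
    unknown = max(ranks.values()) + 1
    return tuple(ranks.get(char, unknown) for char in word)


words = ["ɛbe", "ebe", "ɓa", "ba", "caka"]
ordered = sorted(words, key=lambda word: sort_key(word, ranks))
print(ordered)  # -> ['ba', 'ɓa', 'caka', 'ebe', 'ɛbe']
```

Comparing tuples element by element gives the usual dictionary behaviour: the first differing character decides, and a shorter word that is a prefix of a longer one sorts first.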
$ python examples/Bambara/bambara.py -h
While running your Python script, you may notice that the library generates many warning messages. Every value that is not defined in your configuration files or allowed by the MDF or LMF standards, such as part-of-speech and paradigm label values, is reported. These warnings do not stop the script. The library also reports unresolved cross-references and sound files that cannot be found.
Any error will raise a Python exception, giving some details about the cause.