Extensible web application for exploring natural languages
Extensible web application for exploring natural languages
Natural languages are as sublime as exquisite flowers in a garden - and from such a naturalistic simile stems the name of this web application: Jardinero, meaning gardener.
I definitely needed a tool to perform morphological analysis over the Spanish language - that is, I wanted to find an answer to questions like:
Why some Spanish words end with -tad, whereas others end with -dad? What are the differences between them, in terms of both morphology and cardinality?
To solve this mystery - and several more - I decided to create Jardinero, a web application extracting my compact SQLite Spanish dictionary from Wikcionario, ready for custom SQL queries.
While developing the project, I felt it would be nice to extend the approach to any language, thus creating the whole open source architecture consisting of:
Eos-core - type-checked, dependency-free utility library for modern Python
WikiPrism - library for parsing wiki pages and creating dictionaries
Cervantes - WikiPrism-based library extracting a compact Spanish dictionary from Wikcionario
Jardinero: hybrid Python/TypeScript web application, with a Flask backend and a React frontend communicating via websockets
As a core aspect, the architecture can be easily extended by creating Python modules and packages named linguistic modules.
Jardinero's user interface enables users to:
create a SQLite dictionary from a wiki file - whose URL depends on the current linguistic module
perform queries - in SQL or even in a custom DSL - upon the internal dictionary
re-create the dictionary, especially when the data source gets frequent updates
Presentation on SpeakerDeck
To explore in detail how the overall architecture works, as well as the purpose and the creation process of its components, please consult my presentation on SpeakerDeck: The making of Jardinero.
Jardinero requires at least Python 3.10 - available at Python's official website or via your operating system's package manager.
You can install Jardinero just like any other PyPI package for your Python distribution:
pip install info.gianlucacosta.jardinero
Jardinero requires a linguistic module - for example, Cervantes, dedicated to the Spanish language:
pip install info.gianlucacosta.cervantes
Jardinero should preferably be run with Python's -OO and -m command-line arguments:
python -OO -m info.gianlucacosta.jardinero <linguistic module>
which, in the case of Cervantes, becomes:
python -OO -m info.gianlucacosta.jardinero info.gianlucacosta.cervantes
Then, you can just point any browser to http://localhost:7000/
Running in developer mode
By omitting the -OO (and even the -O) flag, Jardinero will start in developer mode - which enables additional aspects:
Flask running with file watching enabled
More fine-grained logging
HTTP redirection to the frontend development server
Python's __debug__ global variable set to true - for example, in this case, Cervantes downloads from localhost and not from Wikcionario's official website
For simplicity, Jardinero's TOML project includes auxiliary scripts:
Install the frontend as an NPM package:
poetry run poe install-frontend
After that, to start the frontend server during development, you can run:
poetry run poe start-frontend
Alternatively, for better debugging introspection, you can always run yarn start on the related project - to start Webpack's dev server
Python's static HTTP server, serving files from your $HOME/Downloads directory:
poetry run poe start-static
The above command lines can be further simplified if you add the following alias to your shell configuration - especially .profile for Bash:
alias poe='poetry run poe'
Once the above commands have been issued, you can just start Jardinero in development mode:
python -m info.gianlucacosta.jardinero <linguistic module>
and finally open your browser to the usual address - http://localhost:7000/
Jardinero is designed to be extensible! I created it to explore the nuances of the Spanish language, but it can support arbitrary combinations of parameters:
source wiki URL - provided it points to a BZ2-compressed file
term-extraction algorithm from each wiki page
SQL schema in the SQLite db
It is definitely up to your needs and creativity! 😊
Your linguistic module can be just a Python module (or a package) - within the current Python module search path - containing these functions:
get_wiki_url: a () -> str function returning the URL of a BZ2-compressed XML wiki file, which in turn should have the format described in WikiPrism documentation
extract_terms: a (Page) -> list[TTerm] function, extracting a list of terms from a given wiki page
create_sqlite_dictionary: a (Connection) => SqliteDictionary[TTerm] function creating an instance of a WikiPrism SqliteDictionary from the given SQLite connection. In particular, it is the Dictionary that actually responds to queries, so you might want to design your own DSL via a custom subclass.
The exact meaning of TTerm depends on your linguistic model: to explore a real-world example, please refer to Cervantes - my library dedicated to the analysis of the Spanish language.
Jardinero's core point is the web UI for creating and querying custom dictionaries, as well as its extensible engine.
Of course, there are limitations: if you need advanced features like pagination, charts, and even more analysis tools, you can still run Jardinero to create your custom SQL db, that will be stored at:
Then, you can also use your favorite database explorer - such as the excellent, open source DB Browser for SQLite.
The making of Jardinero - Story of a software engineer who wanted to learn Spanish
Cervantes - Extract a compact Spanish dictionary from Wikcionario, with elegance
WikiPrism - Parse wiki pages and create dictionaries, fast, with Python
Eos-core - Type-checked, dependency-free utility library for modern Python
Release history Release notifications | RSS feed
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Hashes for info.gianlucacosta.jardinero-1.1.3.tar.gz
Hashes for info.gianlucacosta.jardinero-1.1.3-py3-none-any.whl