Jardinero

Extensible web application for exploring natural languages

[Figure: Main page]

Introduction

Natural languages are as sublime as exquisite flowers in a garden - and from such a naturalistic simile stems the name of this web application: Jardinero, meaning gardener.

I needed a tool to perform morphological analysis of the Spanish language - that is, I wanted to answer questions like:

Why do some Spanish words end with -tad, whereas others end with -dad? What are the differences between them, in terms of both morphology and cardinality?

To solve this mystery - and several more - I decided to create Jardinero, a web application that extracts a compact SQLite Spanish dictionary from Wikcionario, ready for custom SQL queries.

While developing the project, I felt it would be nice to extend the approach to any language, thus creating the whole open source architecture consisting of:

  • Eos-core - type-checked, dependency-free utility library for modern Python

  • WikiPrism - library for parsing wiki pages and creating dictionaries

  • Cervantes - WikiPrism-based library extracting a compact Spanish dictionary from Wikcionario

  • Jardinero - hybrid Python/TypeScript web application, with a Flask backend and a React frontend communicating via websockets

A core aspect of the architecture is that it can be easily extended by creating Python modules or packages, called linguistic modules.

Main features

Jardinero's user interface enables users to:

  • create a SQLite dictionary from a wiki file - whose URL depends on the current linguistic module

  • perform queries - in SQL or even in a custom DSL - against the internal dictionary

  • re-create the dictionary, especially when the data source gets frequent updates
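As an illustration of the kind of SQL query mentioned above, here is a minimal sketch that counts words by suffix, in the spirit of the -tad versus -dad question. The table name words and its entry column are assumptions made up for this example: the actual schema depends on the linguistic module, so please check it before adapting the query.

```python
import sqlite3


def count_by_suffix(connection, suffixes):
    """Count the rows whose entry ends with each of the given suffixes."""
    counts = {}
    for suffix in suffixes:
        row = connection.execute(
            "SELECT COUNT(*) FROM words WHERE entry LIKE ?",
            (f"%{suffix}",),
        ).fetchone()
        counts[suffix] = row[0]
    return counts


# Hypothetical schema and sample data, only for this sketch:
# a real dictionary is created by Jardinero via a linguistic module.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE words (entry TEXT PRIMARY KEY)")
connection.executemany(
    "INSERT INTO words VALUES (?)",
    [("libertad",), ("amistad",), ("ciudad",), ("verdad",), ("bondad",), ("casa",)],
)

suffix_counts = count_by_suffix(connection, ["tad", "dad"])
print(suffix_counts)
```

The suffix is passed as a bound parameter rather than concatenated into the SQL string, which keeps the query safe even when the suffix comes from user input.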

[Figure: Pipeline]

Installation

You can install Jardinero just like any other PyPI package for your Python distribution:

pip install info.gianlucacosta.jardinero

Running Jardinero

  1. Jardinero requires a linguistic module - for example, Cervantes, dedicated to the Spanish language:

    pip install info.gianlucacosta.cervantes
    
  2. Jardinero should preferably be run with Python's -OO and -m command-line arguments:

    python -OO -m info.gianlucacosta.jardinero <linguistic module>
    

    which, in the case of Cervantes, becomes:

    python -OO -m info.gianlucacosta.jardinero info.gianlucacosta.cervantes
    
  3. Then, you can just point any browser to http://localhost:7000/

Running in developer mode

If you omit the -OO flag (and the -O flag as well), Jardinero starts in developer mode, which enables additional behaviors:

  • Flask running with file watching enabled

  • More fine-grained logging

  • HTTP redirection to the Webpack development server

  • Python's __debug__ built-in constant set to True - for example, in this case Cervantes downloads from localhost instead of Wikcionario's official website

For simplicity, Jardinero's TOML project file includes auxiliary scripts:
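The __debug__ switch in the last item can be sketched as follows. This is only an illustration of the general technique, not Cervantes's actual code: the function name and both URLs are hypothetical. Python sets __debug__ to True by default and to False when started with -O or -OO.

```python
def select_wiki_url(
    production_url: str,
    local_url: str,
    debug: bool = __debug__,
) -> str:
    """Return the local URL in developer mode, the production URL otherwise."""
    return local_url if debug else production_url


# Hypothetical URLs, only for this sketch.
chosen_url = select_wiki_url(
    production_url="https://example.org/wiki-dump.xml.bz2",
    local_url="http://localhost:8000/wiki-dump.xml.bz2",
)
print(chosen_url)
```

Exposing debug as a parameter, with __debug__ as its default, makes both branches easy to exercise in tests without restarting the interpreter with different flags.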

  • Webpack's frontend development server, in watch mode:

    poetry run poe setup-frontend
    
    poetry run poe start-frontend
    
  • Python's static HTTP server, serving files from your $HOME/Downloads directory:

    poetry run poe start-static
    

The above command lines can be further simplified if you add the following alias to your shell configuration - for example, .profile for Bash:

alias poe='poetry run poe'

Once the above commands have been issued, you can just start Jardinero in developer mode:

python -m info.gianlucacosta.jardinero <linguistic module>

and finally open your browser to the usual address - http://localhost:7000/

Extending Jardinero

Jardinero is designed to be extensible! I created it to explore the nuances of the Spanish language, but it can support arbitrary combinations of parameters:

  • source wiki URL - provided it points to a BZ2-compressed file

  • term-extraction algorithm from each wiki page

  • SQL schema in the SQLite db

It is definitely up to your needs and creativity! 😊
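To give an idea of what a BZ2-compressed XML source looks like from Python, here is a minimal sketch that streams page titles using only the standard library. WikiPrism takes care of the actual parsing in Jardinero; the page and title element names below are assumptions made for this example, so please refer to the WikiPrism documentation for the expected format.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def local_name(tag: str) -> str:
    """Strip the optional '{namespace}' prefix from an XML tag."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(compressed_stream: BinaryIO) -> Iterator[str]:
    """Yield the title of each <page> element, decompressing incrementally."""
    with bz2.open(compressed_stream, "rb") as xml_stream:
        for _event, element in ET.iterparse(xml_stream, events=("end",)):
            if local_name(element.tag) == "page":
                for child in element:
                    if local_name(child.tag) == "title":
                        yield child.text or ""
                # Drop the page's children once it has been processed.
                element.clear()


# Tiny in-memory sample, assumed for illustration only.
sample_xml = (
    b"<mediawiki>"
    b"<page><title>libertad</title></page>"
    b"<page><title>ciudad</title></page>"
    b"</mediawiki>"
)
titles = list(iter_page_titles(io.BytesIO(bz2.compress(sample_xml))))
print(titles)
```

Both the decompression and the XML parsing are incremental, so the whole file never needs to fit in memory at once.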

Your linguistic module can be just a Python module (or a package) - within the current Python module search path - containing these functions:

  • get_wiki_url: a () -> str function returning the URL of a BZ2-compressed XML wiki file, which in turn should have the format described in the WikiPrism documentation

  • extract_terms: a (Page) -> list[TTerm] function, extracting a list of terms from a given wiki page

  • create_sqlite_dictionary: a (Connection) -> SqliteDictionary[TTerm] function creating an instance of a WikiPrism SqliteDictionary from the given SQLite connection. In particular, it is the Dictionary that actually responds to queries, so you might want to design your own DSL via a custom subclass.

The exact meaning of TTerm depends on your linguistic module: to explore a real-world example, please refer to Cervantes - my library dedicated to the analysis of the Spanish language.
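Before passing a new linguistic module to Jardinero, it may help to verify that it exposes the three functions listed above. The following is a minimal sketch of such a check: find_missing_functions is a hypothetical helper written for this example, not part of Jardinero or WikiPrism, and it only tests for callables with the right names, not their signatures.

```python
import types

REQUIRED_FUNCTIONS = (
    "get_wiki_url",
    "extract_terms",
    "create_sqlite_dictionary",
)


def find_missing_functions(module: types.ModuleType) -> list[str]:
    """Return the names of the required functions that the module lacks."""
    return [
        name
        for name in REQUIRED_FUNCTIONS
        if not callable(getattr(module, name, None))
    ]


# Incomplete stand-in module, built in memory only for this sketch.
demo_module = types.ModuleType("demo_linguistic_module")
demo_module.get_wiki_url = lambda: "https://example.org/wiki-dump.xml.bz2"
demo_module.extract_terms = lambda page: []

missing = find_missing_functions(demo_module)
print(missing)
```

For an installed module, the same check could be applied to the result of importlib.import_module, passing the module's dotted name.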

Final thoughts

Jardinero's core strengths are its web UI for creating and querying custom dictionaries and its extensible engine.

Of course, there are limitations: if you need advanced features like pagination, charts, and even more analysis tools, you can still run Jardinero to create your custom SQLite db, which will be stored at:

$HOME/.jardinero/<module name>/dictionary.db

Then, you can also use your favorite database explorer - such as the excellent, open source DB Browser for SQLite.
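The generated db can also be opened from your own Python scripts. Below is a minimal sketch, assuming that <module name> in the path above is the name passed on the command line; the helper names are hypothetical. The connection is opened read-only, so the script cannot alter the dictionary by accident.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def get_dictionary_path(module_name: str, home: Optional[Path] = None) -> Path:
    """Build the path of the dictionary db for the given linguistic module."""
    base_dir = home if home is not None else Path.home()
    return base_dir / ".jardinero" / module_name / "dictionary.db"


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite db in read-only mode via a file: URI."""
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


def list_table_names(connection: sqlite3.Connection) -> list[str]:
    """Return the names of the tables defined in the db, sorted by name."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

For example, list_table_names(open_read_only(get_dictionary_path("info.gianlucacosta.cervantes"))) would list the tables created by Cervantes, provided the dictionary has already been generated.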

Further references

Cervantes - Extract a compact Spanish dictionary from Wikcionario, with elegance

WikiPrism - Parse wiki pages and create dictionaries, fast, with Python

Eos-core - Type-checked, dependency-free utility library for modern Python
