LangString Python Library

LangString is a Python library designed to handle multilingual text data with precision and flexibility. Although the need for robust management of multilingual content is critical, existing solutions often lack the necessary features to manage language-tagged strings, sets of strings, and collections of multilingual strings effectively. LangString addresses this gap by providing classes and utilities that enable the creation, manipulation, and validation of multilingual text data consistently and accurately. Inspired by RDFS's langstrings, LangString integrates seamlessly into Python applications, offering familiar methods that mimic those of regular Python types, making it intuitive for developers to adopt and use.

📦 PyPI Package: The library is conveniently available as a PyPI package, allowing users to easily import it into other Python projects.

📚 Documentation: For detailed documentation and code examples, please refer to the library's docstring-generated documentation.

Installation and Use

Basic Installation

Install with:

pip install langstring

Dependencies

The LangString Python Library is designed with simplicity and ease of use in mind. To achieve this, we have minimized external dependencies.

The LangString Library depends only on the langcodes package, particularly for validating language tags when the ENSURE_VALID_LANG flag is enabled. This dependency is crucial for ensuring that language tags used in LangString and MultiLangString instances are valid and conform to international standards, thereby maintaining the integrity and reliability of multilingual text processing.

This project can be set up using either Poetry or requirements.txt. Both are kept in sync to ensure consistency in dependencies.

Installation of Extra Dependencies

Installation of Dev Dependencies

Using Poetry

Poetry is used for easy management of dependencies and packaging. To install the dependencies with Poetry, first install Poetry if you haven't already, and then run:

poetry install

This will install all the dependencies as specified in pyproject.toml.

Using requirements.txt

If you prefer not to use Poetry, a requirements.txt file is also provided. You can install the dependencies using pip:

pip install -r requirements.txt

This is a straightforward way to set up the project if you are accustomed to using pip and traditional requirements files.

Usage

After installation, you can use the LangString and MultiLangString classes in your project. Simply import the classes and start encapsulating strings with their language tags.

from langstring import LangString, MultiLangString, Controller, LangStringFlag, MultiLangStringFlag

Main Elements

LangStrings

The LangString class is a fundamental component of the LangString Library, designed to encapsulate a single string along with its associated language information. It is primarily used in scenarios where the language context of a text string is crucial, such as in multilingual applications, content management systems, or any software that deals with language-specific data. The class provides a structured way to manage text strings, ensuring that each piece of text is correctly associated with its respective language.

In the LangString class, the string representation depends on whether a language tag is present. When a language tag is provided, the string is formatted as "text"@lang, where lang is the language code. Without a language tag, it is formatted as the plain text.
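The idea of a language-tagged string can be illustrated with a minimal, self-contained sketch. The TaggedText class below is a hypothetical stand-in written only for this example; it is not the library's LangString class. It assumes the RDF-style convention in which a tagged string prints as "text"@lang and an untagged string prints as plain text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical stand-in for a language-tagged string (not the library's class)."""

    text: str
    lang: Optional[str] = None

    def __str__(self) -> str:
        # With a language tag: "text"@lang; without one: the bare text.
        if self.lang:
            return f'"{self.text}"@{self.lang}'
        return self.text


greeting = TaggedText("Hello", "en")
untagged = TaggedText("Hello")
print(greeting)  # "Hello"@en
print(untagged)  # Hello
```

Keeping the text and its tag in one immutable value means the pair always travels together, which is the property the LangString class is described as providing.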

SetLangStrings

TODO

MultiLangStrings

The MultiLangString class is a key component of the LangString Library, designed to manage and manipulate text strings across multiple languages. This class is particularly useful in applications that require handling of text in a multilingual context, such as websites, applications with internationalization support, and data processing tools that deal with multilingual data. The primary purpose of MultiLangString is to store, retrieve, and manipulate text entries in various languages, offering a flexible and efficient way to handle multilingual content.
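One plausible way to picture such a container is a mapping from language tag to a set of texts. The sketch below is an assumption made for illustration: the MultiText class and its add, texts, remove, and langs methods are hypothetical names, not the MultiLangString API.

```python
from collections import defaultdict
from typing import Dict, Set


class MultiText:
    """Hypothetical stand-in: maps a language tag to a set of texts."""

    def __init__(self) -> None:
        self._entries: Dict[str, Set[str]] = defaultdict(set)

    def add(self, text: str, lang: str) -> None:
        # Sets ignore duplicates, so adding the same text twice is harmless.
        self._entries[lang].add(text)

    def texts(self, lang: str) -> Set[str]:
        # Return a copy so callers cannot mutate internal state.
        return set(self._entries.get(lang, set()))

    def remove(self, text: str, lang: str) -> None:
        self._entries[lang].discard(text)
        if not self._entries[lang]:
            del self._entries[lang]  # drop languages with no texts left

    def langs(self) -> Set[str]:
        return set(self._entries)


labels = MultiText()
labels.add("Hello", "en")
labels.add("Hi", "en")
labels.add("Olá", "pt")
print(sorted(labels.texts("en")))  # ['Hello', 'Hi']
labels.remove("Olá", "pt")
print(sorted(labels.langs()))  # ['en']
```

Using a set per language keeps entries unique within a language while still allowing several alternative texts for the same language.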

Controller and Flags

The Controller and Flags system in the LangString Library manages and configures the behavior of LangString and MultiLangString instances.

This system operates at a global, class-level context, meaning that the flags and controls applied have a uniform effect across all instances of these classes. In other words, when a flag is set or reset using the control classes, it impacts every instance of LangString and MultiLangString throughout the application. This ensures consistent behavior and validation rules across all instances, as individual instances cannot have differing flag values.

In the following subsections, we will delve into the specifics of the available flags and the control methods. The flags define key aspects of how LangString and MultiLangString instances handle multilingual text, including validation rules and representation formats. Understanding these flags is crucial for effectively utilizing the library in various scenarios, especially those involving multilingual content.

The control methods, shared between Controller and MultiLangStringControl, provide the mechanisms to set, retrieve, and reset these flags. These methods ensure that you can dynamically configure the behavior of the library to suit your application's needs. We will explore each method in detail, providing insights into their usage and impact on the library's functionality.

The LangString and MultiLangString classes use a set of flags to control various aspects of their behavior. These flags are managed by Controller and MultiLangStringControl respectively. The flags provide a flexible way to customize the behavior of LangString and MultiLangString classes according to the specific needs of your application. By adjusting these flags, you can enforce different levels of validation and control over the language data being processed. The available flags and their effects are as follows.

The Control classes, namely Controller and MultiLangStringControl, act as static managers for the flags. They provide methods to set, retrieve, and reset the states of these flags, ensuring consistent behavior across all instances of LangString and MultiLangString.
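The class-level behavior described in this section can be sketched as follows. Everything in this example is hypothetical and self-contained: DemoFlag, DemoController, DemoText, and the method names set_flag, get_flag, and reset_flags are assumptions chosen to mirror the "set, retrieve, and reset" operations mentioned above, not the library's actual names or signatures.

```python
from enum import Enum, auto
from typing import Dict


class DemoFlag(Enum):
    """Hypothetical flags for this sketch (names are not the library's)."""

    REQUIRE_TEXT = auto()
    REQUIRE_LANG = auto()


class DemoController:
    """Hypothetical static manager: flag state lives on the class, not on instances."""

    _DEFAULTS: Dict[DemoFlag, bool] = {flag: False for flag in DemoFlag}
    _flags: Dict[DemoFlag, bool] = dict(_DEFAULTS)

    @classmethod
    def set_flag(cls, flag: DemoFlag, state: bool) -> None:
        cls._flags[flag] = state

    @classmethod
    def get_flag(cls, flag: DemoFlag) -> bool:
        return cls._flags[flag]

    @classmethod
    def reset_flags(cls) -> None:
        # Restore every flag to its default state.
        cls._flags = dict(cls._DEFAULTS)


class DemoText:
    """Hypothetical text type that consults the shared flags on creation."""

    def __init__(self, text: str, lang: str = "") -> None:
        if DemoController.get_flag(DemoFlag.REQUIRE_TEXT) and not text:
            raise ValueError("text is required while REQUIRE_TEXT is set")
        if DemoController.get_flag(DemoFlag.REQUIRE_LANG) and not lang:
            raise ValueError("a language tag is required while REQUIRE_LANG is set")
        self.text = text
        self.lang = lang


DemoText("Hello")  # accepted: REQUIRE_LANG is off by default
DemoController.set_flag(DemoFlag.REQUIRE_LANG, True)
try:
    DemoText("Hello")  # now rejected for every new instance in the program
    rejected = False
except ValueError:
    rejected = True
DemoController.reset_flags()
print(rejected)  # True
```

Because the flag dictionary is stored on the controller class, there is a single source of truth: no instance can carry its own flag values, which matches the uniform, application-wide effect described above.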

Converter

Code Testing

The library is covered by a test suite that checks its reliability and correctness. The tests can be found in the tests directory of the project. To run them, navigate to the project root directory and execute the following command:

pytest tests

How to Contribute

Reporting Issues

Code Contributions

  1. Fork the project repository and create a new feature branch for your work: git checkout -b feature/YourFeatureName.
  2. Make and commit your changes with descriptive commit messages.
  3. Push your work back up to your fork: git push origin feature/YourFeatureName.
  4. Submit a pull request to propose merging your feature branch into the main project repository.

Test Contributions

  • Enhance the project's reliability by adding new tests or improving existing ones.

General Guidelines

  • Ensure your code follows our coding standards.
  • Update the documentation as necessary.
  • Make sure your contributions do not introduce new issues.

We appreciate your time and expertise in contributing to this project!

Related Work and Differences

The LangString Library offers unique functionalities for handling multilingual text in Python applications. While there are several libraries and tools available for internationalization, localization, and language processing, they differ from the LangString Library in scope and functionality. Below is an overview of related work and how they compare to the LangString Library:

  • Babel

    • https://pypi.org/project/Babel/
    • Babel is a Python library for internationalization and localization. It primarily focuses on formatting dates, numbers, and currency values for different locales.
    • Difference: Unlike Babel, the LangString Library specifically manages multilingual text strings, providing a more direct approach to handling language-specific text data.
  • gettext

    • https://pypi.org/project/python-gettext/
    • gettext is a GNU system used for internationalizing applications. It allows for translating fixed strings in different languages using message catalogs.
    • Difference: The LangString Library, in contrast, is designed for dynamic management of multilingual content, not just for translation of static strings.
  • langcodes

    • https://pypi.org/project/langcodes/
    • langcodes provides tools for parsing and understanding language tags.
    • Difference: While langcodes is useful for handling language codes, the LangString Library extends beyond this by managing actual multilingual text strings associated with these codes.
  • Polyglot

    • https://pypi.org/project/polyglot/
    • Polyglot is a natural language pipeline that supports multiple languages for various NLP tasks.
    • Difference: Polyglot focuses on language processing rather than the structured management of multilingual text, which is the core functionality of the LangString Library.
  • CLD3

    • https://pypi.org/project/gcld3/
    • Google's CLD3 is a model for language identification.
    • Difference: CLD3 is specialized in detecting the language of a text, whereas the LangString Library is about storing and manipulating text in multiple languages.
  • spaCy

    • https://pypi.org/project/spacy/
    • spaCy is a comprehensive NLP library that supports multiple languages.
    • Difference: spaCy is geared towards analyzing text, not managing it. The LangString Library, on the other hand, is designed for the structured handling and storage of multilingual text.

In summary, while these related tools and libraries offer valuable functionalities for internationalization, localization, and language processing, the LangString Library stands out for its specific focus on managing and manipulating multilingual text strings in a structured and efficient manner.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Author

This project is an initiative of the Semantics, Cybersecurity & Services (SCS) Group at the University of Twente, The Netherlands. The main developer is:

Feel free to reach out using the provided links. For inquiries, contributions, or to report any issues, you can open a new issue on this repository.

