A wrapper Python library for working with the DODa dataset

Project description

PyDODa: A Wrapper Python Library for The Darija Open Dataset

"This software includes data sourced from Darija Open Dataset."

What is DODa?

From DODa's official GitHub repository:

Darija Open Dataset (DODa) is an open-source project for the Moroccan dialect. With more than 21,000 entries DODa is arguably the largest open-source collaborative project for Darija <=> English translation built for Natural Language Processing purposes.

What is PyDODa?

Pydoda is a Python library that wraps the DODa dataset and makes it easy to access and analyze. The DODa dataset is a linguistic resource that contains various categories of words, phrases, and sentences in Darija (Moroccan Arabic).

Pydoda is aimed at researchers, developers, and language enthusiasts who want to explore the dataset. The library provides an interface to access the categories within the dataset, retrieve spellings and translations, and perform analyses.

With Pydoda in your Python workflow, you can analyze specific semantic or syntactic categories, retrieve translations, explore variations in spelling, and investigate linguistic patterns in the DODa dataset.

Installation

Pydoda can be easily installed using pip, the Python package manager:

$ pip3 install pydoda
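To confirm that the install succeeded, one option is to ask the standard library which version of the distribution is present. This is a minimal sketch using `importlib.metadata` (Python 3.8+); the helper name `installed_version` is made up for illustration and is not part of Pydoda.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if pydoda is installed, otherwise None.
print(installed_version("pydoda"))
```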

How It Works

Pydoda provides a simple and intuitive way to access the linguistic content of the DODa dataset. You can use the Pydoda class to retrieve various categories and information from the dataset.

Here's an example of how to use Pydoda:

from pydoda import Pydoda

# Create an instance of Pydoda
pydoda = Pydoda()

# Retrieve all available categories
categories = pydoda.all()

# Print the categories in a user-friendly format
for category_type, category_list in categories.items():
    print(f"{category_type.capitalize()} Categories:")
    for category in category_list:
        print(f"- {category}")
    print()

Output:

Semantic Categories:
- malenames
- emotions
- clothes
- environment
- education
- food
- numbers
- family
- sport
- economy
- time
- femalenames
- animals
- art
- colors
- health
- humanbody
- professions
- places
- plants
- religion

Syntactic Categories:
- verbs
- adverbs
- verb-to-noun
- pronouns
- prepositions
- conjug_past
- masculine_feminine_plural
- adjectives
- conjug_present
- nouns
- imperatives
- (in)definite

X-tra Categories:
- idioms
- weird
- proverbs
- utils

Sentences Categories:
- sentences
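Judging by the loop and output above, `all()` appears to return a mapping from category type to a list of category names. Under that assumption, the sketch below finds the type a given category belongs to, which is useful because both values are needed to address a category. The helper `find_category_type` is hypothetical, and the sample dictionary is a hand-copied subset of the listing (the exact key spelling is an assumption) so that the example runs without the library installed.

```python
from typing import Dict, List, Optional

# A subset of the listing above, shaped like the assumed return value of
# Pydoda().all(). In real use, pass the result of Pydoda().all() instead.
SAMPLE_CATEGORIES: Dict[str, List[str]] = {
    "semantic": ["animals", "colors", "food"],
    "syntactic": ["verbs", "nouns", "adjectives"],
    "x-tra": ["idioms", "proverbs"],
    "sentences": ["sentences"],
}


def find_category_type(categories: Dict[str, List[str]], name: str) -> Optional[str]:
    """Return the category type that lists `name`, or None if no type does."""
    for category_type, names in categories.items():
        if name in names:
            return category_type
    return None


print(find_category_type(SAMPLE_CATEGORIES, "animals"))  # semantic
print(find_category_type(SAMPLE_CATEGORIES, "weather"))  # None
```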

You can use the Category class to retrieve specific linguistic information from a chosen category.

Here's an example of how to use Category:

from pydoda import Category

# Create an instance of Category
my_category = Category('semantic', 'animals')

# Get the Darija translation of a word
darija_translation = my_category.get_darija_translation('dog')
print(darija_translation)
# Output: klb

# Get the English translation of a word
english_translation = my_category.get_english_translation('mch')
print(english_translation)
# Output: cat
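When several words need translating, the single-word calls above can be wrapped in a small helper. The sketch below relies only on the `get_darija_translation` method shown in the example. `StubCategory` is a hypothetical stand-in for `pydoda.Category` so the code runs without the dataset. How the real methods behave for a word that is not in the category is not stated here, so the stand-in simply returns `None` in that case.

```python
from typing import Dict, Iterable, Optional


class StubCategory:
    """Hypothetical stand-in for pydoda.Category, holding two known entries."""

    def __init__(self) -> None:
        self._to_darija = {"dog": "klb"}
        self._to_english = {"mch": "cat"}

    def get_darija_translation(self, word: str) -> Optional[str]:
        # Assumption: unknown words yield None (the real behavior may differ).
        return self._to_darija.get(word)

    def get_english_translation(self, word: str) -> Optional[str]:
        return self._to_english.get(word)


def translate_to_darija(category, words: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each English word to whatever the category returns for it."""
    return {word: category.get_darija_translation(word) for word in words}


result = translate_to_darija(StubCategory(), ["dog", "horse"])
print(result)  # {'dog': 'klb', 'horse': None}
```

To use it with the library, pass a real instance such as `Category('semantic', 'animals')` in place of `StubCategory()`.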

Documentation

For detailed documentation on the Pydoda library, please refer to the official Pydoda documentation.

Clone Repository

To clone the Pydoda repository, use the following command:

git clone https://github.com/saad-out/pydoda.git --recurse-submodules

The --recurse-submodules flag tells git to also clone the submodules associated with the repository. The Pydoda repository has a submodule named dataset (the Darija Open Dataset), which contains the linguistic data used by Pydoda. Cloning with this flag gives you both the library code and the dataset it needs.
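If the repository was cloned without the submodule flag, git leaves the `dataset` directory empty. The sketch below checks for that case; the helper `submodule_is_populated` and the `pydoda` checkout path are illustrative assumptions, not part of the library. When the check fails, running `git submodule update --init --recursive` inside the clone is the usual way to fetch the missing submodule.

```python
from pathlib import Path


def submodule_is_populated(repo_root: str, name: str = "dataset") -> bool:
    """Return True if the submodule directory exists and contains any entries."""
    path = Path(repo_root) / name
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    # "pydoda" is the directory created by the clone command above.
    if not submodule_is_populated("pydoda"):
        print("dataset submodule is missing or empty")
```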

License

The Pydoda library is released under the MIT License. This license allows you to use, modify, and distribute the code for both commercial and non-commercial purposes, provided that the copyright and license notice is included in copies of the software.

For more details, please refer to the LICENSE file in the repository.

Download files

Source Distribution

pydoda-1.2.0.tar.gz (1.4 MB)

Uploaded Source

Built Distribution

pydoda-1.2.0-py3-none-any.whl (1.4 MB)

Uploaded Python 3

File details

Details for the file pydoda-1.2.0.tar.gz.

File metadata

  • Download URL: pydoda-1.2.0.tar.gz
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for pydoda-1.2.0.tar.gz
Algorithm Hash digest
SHA256 c51a046bc86c9d2c16d3000bcdcc7f266c06a2c20006887a0a4be8fdeb00b7fd
MD5 3608adc887f9c631641156e7ceb8af30
BLAKE2b-256 f18bf66ae9e14830ca5bcbb1f9f73250699008ff2381e15e5eb8d3208d4e1bec

File details

Details for the file pydoda-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: pydoda-1.2.0-py3-none-any.whl
  • Size: 1.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for pydoda-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 abc5411690328a8d59e7893cec122cc1d167c31f7bfc46f7b1902fc22f39e287
MD5 e93edcef3893b195c7da97e803e414b0
BLAKE2b-256 795e777eda36d934b5e6d5934f2f9b126ce4f723e8432cd36832e93597219eac
