A wrapper Python library for working with the DODa dataset

PyDODa: A Wrapper Python Library for The Darija Open Dataset

"This software includes data sourced from Darija Open Dataset."

What is DODa?

From DODa's official GitHub repository:

Darija Open Dataset (DODa) is an open-source project for the Moroccan dialect. With more than 21,000 entries DODa is arguably the largest open-source collaborative project for Darija <=> English translation built for Natural Language Processing purposes.

What is PyDODa?

Pydoda is a Python library that wraps the DODa dataset and provides programmatic access to its contents. DODa is a linguistic resource containing words, phrases, and sentences in Darija (Moroccan Arabic), organized into categories.

Pydoda lets researchers, developers, and language enthusiasts access the categories in the dataset, retrieve spellings and translations, and run analyses on the entries.

With Pydoda in your Python workflow, you can work with specific semantic or syntactic categories, retrieve translations, explore spelling variations, and investigate linguistic patterns in the dataset.

Installation

Pydoda can be easily installed using pip, the Python package manager:

$ pip3 install pydoda
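
To confirm that the install succeeded and see which version pip picked, the standard library's importlib.metadata can be queried. The following is a minimal sketch; the helper name installed_version is illustrative and not part of Pydoda:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report on the pydoda distribution in the current environment.
found = installed_version("pydoda")
print(f"pydoda version: {found}" if found else "pydoda is not installed")
```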

How It Works

Pydoda provides a simple and intuitive way to access the linguistic content of the DODa dataset. You can use the Pydoda class to retrieve various categories and information from the dataset.

Here's an example of how to use Pydoda:

from pydoda import Pydoda

# Create an instance of Pydoda
pydoda = Pydoda()

# Retrieve all available categories
categories = pydoda.all()

# Print the categories in a user-friendly format
for category_type, category_list in categories.items():
    print(f"{category_type.capitalize()} Categories:")
    for category in category_list:
        print(f"- {category}")
    print()

Output:

Semantic Categories:
- malenames
- emotions
- clothes
- environment
- education
- food
- numbers
- family
- sport
- economy
- time
- femalenames
- animals
- art
- colors
- health
- humanbody
- professions
- places
- plants
- religion

Syntactic Categories:
- verbs
- adverbs
- verb-to-noun
- pronouns
- prepositions
- conjug_past
- masculine_feminine_plural
- adjectives
- conjug_present
- nouns
- imperatives
- (in)definite

X-tra Categories:
- idioms
- weird
- proverbs
- utils
- shorts

Sentences Categories:
- sentences

# etc...
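
Since all() returns a mapping from category type to a list of category names, ordinary dictionary operations are enough to summarize it or to find the type a given category belongs to. The sketch below uses a hand-written, abbreviated stand-in for that mapping so it runs without the dataset; the lowercase keys and the helper find_category_type are assumptions made for illustration:

```python
# Abbreviated stand-in for the dictionary returned by Pydoda().all(),
# copied from the output above. Lowercase keys are an assumption.
categories = {
    "semantic": ["animals", "colors", "food"],
    "syntactic": ["verbs", "nouns", "adjectives"],
    "x-tra": ["idioms", "proverbs"],
    "sentences": ["sentences"],
}


def find_category_type(categories, name):
    """Return the category type that lists `name`, or None if no type does."""
    for category_type, names in categories.items():
        if name in names:
            return category_type
    return None


# Count the categories under each type and locate one category by name.
counts = {category_type: len(names) for category_type, names in categories.items()}
print(counts)
print(find_category_type(categories, "proverbs"))
```

The type returned by such a lookup could then serve as the first argument when constructing a Category.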

You can use the Category class to retrieve specific linguistic information from a chosen category.

Here's an example of how to use Category:

from pydoda import Category

# Create an instance of Category
my_category = Category('semantic', 'animals')

# Get the Darija translation of a word
darija_translation = my_category.get_darija_translation('dog')
print(darija_translation)
# Output: klb

# Get the English translation of a word
english_translation = my_category.get_english_translation('mch')
print(english_translation)
# Output: cat
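
To make the two lookup directions concrete, here is a small self-contained sketch of how a bidirectional translation table could be built from CSV rows. It is not Pydoda's implementation: the class TinyCategory, the column names darija and eng, and the sample rows are all hypothetical, and the real DODa files may be laid out differently. How Pydoda itself handles a word that is missing from a category is not covered here; this sketch simply returns None.

```python
import csv
import io

# Hypothetical two-column sample; the real DODa files may use different
# column names and several spelling variants per entry.
SAMPLE_CSV = """darija,eng
klb,dog
mch,cat
"""


class TinyCategory:
    """Illustrative stand-in for a translation category (not Pydoda's Category)."""

    def __init__(self, csv_text):
        rows = list(csv.DictReader(io.StringIO(csv_text)))
        # One dictionary per direction, so each lookup is a single dict access.
        self._to_darija = {row["eng"]: row["darija"] for row in rows}
        self._to_english = {row["darija"]: row["eng"] for row in rows}

    def get_darija_translation(self, word):
        """Return the Darija entry for an English word, or None if unknown."""
        return self._to_darija.get(word)

    def get_english_translation(self, word):
        """Return the English entry for a Darija word, or None if unknown."""
        return self._to_english.get(word)


animals = TinyCategory(SAMPLE_CSV)
print(animals.get_darija_translation("dog"))
print(animals.get_english_translation("mch"))
print(animals.get_darija_translation("unicorn"))
```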

Documentation

For detailed documentation on the Pydoda library, please refer to the official Pydoda documentation.

Clone Repository

To clone the Pydoda repository, use the following command:

git clone https://github.com/saad-out/pydoda.git --recurse-submodules

The --recurse-submodules flag makes git clone the repository's submodules as well. Pydoda has a submodule named dataset (the Darija Open Dataset), which contains the linguistic data the library reads. Cloning with this flag gives you both the Pydoda library code and the dataset it needs.
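
When a repository is cloned without its submodules, git leaves each submodule directory present but empty, so a quick check for content can catch a missing dataset early. The sketch below is one way to do that; the helper submodule_is_populated is hypothetical, and the location of the dataset directory inside the repository is an assumption you would need to adjust. The demonstration uses a temporary directory rather than a real clone.

```python
import tempfile
from pathlib import Path


def submodule_is_populated(repo_root, submodule_path):
    """Return True if the submodule directory exists and has at least one entry."""
    directory = Path(repo_root) / submodule_path
    return directory.is_dir() and any(directory.iterdir())


# Demonstration on a throwaway directory that mimics a clone.
with tempfile.TemporaryDirectory() as root:
    dataset_dir = Path(root) / "dataset"  # assumed submodule location
    dataset_dir.mkdir()
    empty_result = submodule_is_populated(root, "dataset")

    # Simulate a fetched submodule by adding a file.
    (dataset_dir / "example.csv").write_text("placeholder\n")
    filled_result = submodule_is_populated(root, "dataset")

print(empty_result, filled_result)
```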

License

The Pydoda library is released under the MIT License. This license allows you to use, modify, and distribute the code for both commercial and non-commercial purposes, provided that the copyright notice and the license text are kept with copies of the software.

For more details, please refer to the LICENSE file in the repository.
