Wiktionary Data Preparer

Introduction

wdp (Wiktionary Data Preparer) is a small Python library that can help you get your language data onto Wiktionary. Formatting Wiktionary entries perfectly can be hard, and it’s wdp’s goal to take care of the tricky stuff for you.

Example

Using the Word API, enter your data:

from wdp import Word

# use the Word class to represent our words
apple = Word("apple")
apple.add_pronunciation("/ˈæp.əl/", notation="IPA")
apple.add_definition("A common, round fruit", "Noun")
apple.add_definition("A tree of the genus Malus", "Noun")
apple.set_etymology("Old English æppel < Proto-Germanic *ap(a)laz < PIE *ab(e)l-")

pear = Word("pear")
# ...

# put all our words in a list
wdp_words = [apple, pear]  # ...plus any other Word objects

Use the format_entries function with your list of Word objects to produce Wiktionary markup:

from wdp import format_entries

# Generate Wiktionary markup from our entries
formatted_entries = format_entries(wdp_words, "en", "English")
# Produces an entry like the following:
"""
==English==

===Etymology===
Old English æppel < Proto-Germanic *ap(a)laz < PIE *ab(e)l-

===Noun===
{{head|en|noun}}

# A common, round fruit
# A tree of the genus Malus
"""

Perform the upload:

from wdp.upload import upload_formatted_entries
upload_formatted_entries(formatted_entries, "English")

Installation

(Note: wdp requires Python 3.6 or higher. If you do not have a Python installation, we recommend that you use Anaconda.)

pip install wdp

Usage

Prerequisites

To use wdp, you will need to have your data available in a machine-readable format. The format does not matter, but you will need to be able to read it and turn it into a list of Word objects.
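
As a rough illustration of this step, the sketch below reads rows from CSV text and builds one Word per row. The column names (form, pos, definition, ipa, etymology) and the helper names read_rows and build_words are assumptions made for this example and are not part of wdp; only the Word methods shown in the example above are called. The Word class is passed in as an argument so that the helpers do not depend on where your data comes from.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a list of dicts, one per row (hypothetical layout)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def build_words(rows, word_cls):
    """Build one Word-like object per row.

    Assumed columns: form, pos, definition, and optionally ipa and etymology.
    word_cls is expected to behave like wdp's Word class.
    """
    words = []
    for row in rows:
        word = word_cls(row["form"])
        word.add_definition(row["definition"], row["pos"])
        # Optional columns are skipped when missing or empty
        if row.get("ipa"):
            word.add_pronunciation(row["ipa"], notation="IPA")
        if row.get("etymology"):
            word.set_etymology(row["etymology"])
        words.append(word)
    return words
```

With wdp installed, you would import Word from wdp and call build_words(read_rows(text), Word), where text is the contents of your own CSV file.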

Step 1: Build Word Objects

As in the example above, you will need to build a list of Word objects. A single Word object is defined by its canonical form. It is OK for two or more words to have the same form; this can happen when two words are homonyms or when they have separate etymologies. In that case, create a separate Word object for each:

from wdp import Word
bank_1 = Word("bank")
bank_1.add_definition("A place where people keep their money", "Noun")

bank_2 = Word("bank")
bank_2.add_definition("The edges of a river", "Noun")

Methods of the Word class which begin with add_ can be invoked multiple times (because e.g. a word can have many definitions), but methods which begin with set_ should only be called once (because e.g. you should only have one etymological note).
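
One way to respect this distinction when your source data has one row per sense is to group the rows first, then call the set_ method once per Word and the add_ method once per sense. The following is a minimal sketch under assumptions: the row layout (form, etymology, definition, part of speech), the idea that homonyms are told apart by their etymology, and the helper names group_senses and build_grouped_words are all hypothetical and not part of wdp.

```python
def group_senses(rows):
    """Group (form, etymology, definition, pos) tuples by (form, etymology)."""
    groups = {}
    for form, etymology, definition, pos in rows:
        groups.setdefault((form, etymology), []).append((definition, pos))
    return groups


def build_grouped_words(rows, word_cls):
    """Build one Word-like object per (form, etymology) group.

    word_cls is expected to behave like wdp's Word class.
    """
    words = []
    for (form, etymology), senses in group_senses(rows).items():
        word = word_cls(form)
        if etymology:
            # set_ method: called at most once per Word
            word.set_etymology(etymology)
        for definition, pos in senses:
            # add_ method: called once for every sense in the group
            word.add_definition(definition, pos)
        words.append(word)
    return words
```

Rows that share both form and etymology end up as several definitions on one Word, while rows with the same form but different etymologies become separate Word objects.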

Consult the Word class’s documentation for a complete description of its methods. Currently, the following methods are available:

  • add_definition

  • add_alternative_form

  • add_pronunciation

  • set_etymology

  • set_description

  • set_references

  • set_usage_notes

  • set_conjugation

  • set_declension

  • set_inflection

For more information on how to use these methods, see Wiktionary’s entry layout guidelines.

Step 2, option 2 (Advanced): Format and Upload Word Objects

Section under construction

First, you will need to create an account on Wiktionary.

Next, in your working directory, create a user-config.py file with the following contents:

family = "wiktionary"
mylang = "en"

usernames["wiktionary"]["en"] = u"Ldgessler"  # change to your username

console_encoding = "utf-8"

minthrottle = 0
maxthrottle = 1

In your main Python file, you can now use wdp.upload.upload_formatted_entries to perform your upload:

from wdp.upload import upload_formatted_entries

# load your list of Words, either by building it yourself
my_english_words = [...]
# or by importing it from a file
from wdp import import_words
my_english_words = import_words('my_english_words.zip')

# format the list of Words into entries
# you will need a language code from here:
# https://en.wiktionary.org/wiki/Wiktionary:List_of_languages
from wdp import format_entries
lang_code = "en"
lang_name = "English"
formatted_entries = format_entries(my_english_words, lang_code, lang_name)

# use the page_prefix argument to upload the data to your personal pages
# first for debugging, e.g. User:Ldgessler/chafe
upload_formatted_entries(formatted_entries, lang_name, page_prefix="User:Ldgessler/")

# Once you are CERTAIN your data is correct, you may remove the page_prefix
# argument to perform the upload for real:
upload_formatted_entries(formatted_entries, lang_name)

FAQ

I don’t know Python. Can I still use wdp?

Not on your own, but please open an issue on our GitHub page explaining what your data looks like, and someone may be available to help you.

I have data in X format. Will wdp work with it?

Yes, wdp is agnostic as to the source format of your data, as long as you can read it in Python and turn it into Word objects.

In the future, we may add support for popular formats (like FLEx dictionary XML) to allow you to upload from them without writing any code. If there is a format you’d like us to support, please open an issue.

What should I do if my language doesn’t have a code?

A new one can easily be created, but you will need to consult with an expert. Contact Aryaman Arora (aa2190@georgetown.edu) or a Wiktionary admin.

Can I update my entries once they’re uploaded?

Not currently, but this is a feature we’d like to support if there’s demand for it. Please open an issue if you would like this functionality.
