Simple Wiktionary scraper

Project description

pyktionary

Simple Wiktionary scraper. Get information about words from Wiktionary.

The module is at an early stage. Be advised that:

  • Only the French Wiktionary is supported.
  • The following sections are not scraped:
    • Prononciation
    • Anagrammes
    • Voir aussi
    • Références
    • Forme de verbe
  • Any section not matching Étymologie is scraped as Définition.
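
The rules in the list above can be sketched as a small classifier. This is only an illustration of the stated behaviour, not pyktionary's actual code: the names `SKIPPED_SECTIONS` and `classify_section` are hypothetical, and matching headings by prefix is an assumption.

```python
from typing import Optional

# Hypothetical constant: the headings listed above as "not scraped".
SKIPPED_SECTIONS = (
    "Prononciation",
    "Anagrammes",
    "Voir aussi",
    "Références",
    "Forme de verbe",
)


def classify_section(heading: str) -> Optional[str]:
    """Return the dict key a section would be stored under, or None if skipped."""
    title = heading.strip()
    if title.startswith(SKIPPED_SECTIONS):
        return None  # section is ignored
    if title.startswith("Étymologie"):
        return "Étymologie"
    # Everything else falls back to "Définition".
    return "Définition"


print(classify_section("Étymologie"))    # Étymologie
print(classify_section("Anagrammes"))    # None
print(classify_section("Interjection"))  # Définition
```

One consequence of the fallback rule is that distinct sections (for example an interjection and a noun) would share the single `Définition` key.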

What pyktionary is

A scraper that gets data on words from Wiktionary. The sections of a word are scraped as raw HTML into a dict; see Example.

What pyktionary is not

An interface to make changes to Wiktionary. You can NOT send data to Wiktionary with this module.

What's next?

This module is at a very early stage. It only covers my specific use case, which is scraping a word's etymology and definitions from the French Wiktionary.

The module will improve over time. Priorities are for the following features and fixes:

  • Scrape all sections of a word.
  • Support Wiktionaries in other languages.

See the TODO for more planned work.

Usage

from pyktionary import Wiktionary

# ...

wik = Wiktionary()
word = wik.word("oui")

# ...

Example

With the word oui:

The following code:

    import pprint  # needed for pprint.pprint below

    from pyktionary import Wiktionary

    wik = Wiktionary()
    word = wik.word("oui")
    pprint.pprint(word, compact=True)

output:

{   
    'Définition': '<ol><li>Réponsede<i><ahref="https://fr.wiktionary.org/wiki/oui#fr-interj"title="oui">oui</a></i>.Votepour.<strong>Noted’usage:</strong>L’<ahref="https://fr.wiktionary.org/wiki/article"title="article">article</a>définines’<ahref="https://fr.wiktionary.org/wiki/%C3%A9lider"title="élider">élide</a>pasdevantcemot.<ul><li><i>Uneballade,uneballade!s’écrial’ermite,celavautmieuxquetouslesocetles<b>oui</b>deFrance.</i><spanclass="sources"><spanclass="tiret">—</span>(<aclass="extiw"href="https://fr.wikipedia.org/wiki/Walter_Scott"title="w:WalterScott">Walter<spanclass="petites_capitales"style="font-variant:small-caps">Scott</span></a>,<i><aclass="extiw"href="https://fr.wikipedia.org/wiki/Ivanho%C3%A9"title="w:Ivanhoé">Ivanhoé</a></i>,traduitdel’anglaispar<aclass="extiw"href="https://fr.wikipedia.org/wiki/Alexandre_Dumas"title="w:AlexandreDumas">Alexandre<spanclass="petites_capitales"style="font-variant:small-caps">Dumas</span></a>,<aclass="extiw"href="https://fr.wikisource.org/wiki/Ivanho%C3%A9_(Scott_-_Dumas)"title="s:Ivanhoé(Scott-Dumas)">1820</a>)</span></li><li><i>Le<b>oui</b>etlenon.</i></li><li><i>Iladitce<b>oui</b>-làdeboncœur.</i></li><li><i>Ilnefautpastantdediscours,onnevousdemandequ’un<b>oui</b>ouunnon.Ditesunbon<b>oui</b>.</i></li></ul></li></ol>',
    'Étymologie': '<dl><dd><spanclass="date"><i>(<spanclass="texte">1380</span>)</i></span>Del’ancienfrançais<i><spanclass="lang-fro"lang="fro"><ahref="https://fr.wiktionary.org/wiki/o%C3%AFl#fro"title="oïl">oïl</a></span></i><spanclass="date"><i>(<spanclass="texte">1080</span>)</i></span>,formecomposéede<i>o</i>«cela»<spanclass="date"><i>(<spanclass="texte">842</span>)</i></span>,ausensde«oui»(àcomparerde<i><ahref="https://fr.wiktionary.org/wiki/%C3%B2c"title="òc">òc</a></i>«oui»en<ahref="https://fr.wiktionary.org/wiki/occitan"title="occitan">occitan</a>),renforcéparlepronompersonnel<i><ahref="https://fr.wiktionary.org/wiki/il"title="il">il</a></i>(ontrouveaussi<i>o-je</i>,<i>o-tu</i>,<i>onos</i>,<i>ovos</i>).<spanid="ref-1"><small></small><sup><ahref="#reference-1">[1]</a></sup></span><spanid="ref-2"><small></small><sup><ahref="#reference-2">[2]</a></sup></span>Lesmots«oui»et«òc»sontdescalquesceltiques<supclass="reference"id="cite_ref-1"><ahref="#cite_note-1">[1]</a></sup></dd></dl>'
}
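
Because each value is a raw HTML string, a caller may want plain text instead. Here is a minimal sketch using only the standard library's `html.parser`. `TextExtractor` and `html_to_text` are hypothetical helpers, not part of pyktionary, and `sample` is a shortened stand-in for a real result dict.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes and drop all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.parts.append(data)


def html_to_text(fragment):
    """Return the text content of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


# Stand-in for the dict returned by wik.word(...); not real scraper output.
sample = {"Définition": "<ol><li>Le <b>oui</b> et le non.</li></ol>"}
text = {section: html_to_text(raw) for section, raw in sample.items()}
print(text["Définition"])  # Le oui et le non.
```

The same dict comprehension should work on a real result, assuming every value is an HTML string as in the output above.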

Licence

This module is licensed under the GNU GPL v3.

Download files

  • Source distribution: pyktionary-0.5a0.tar.gz (4.9 kB)
  • Built distribution: pyktionary-0.5a0-py3-none-any.whl (5.6 kB, Python 3)
