Simple Wiktionary scraper

pyktionary

Simple Wiktionary scraper. Get information about words from Wiktionary.

The module is at an early stage. Be advised that:

  • Only the French Wiktionary is supported.
  • The following sections are not scraped:
    • Prononciation
    • Anagrammes
    • Voir aussi
    • Références
    • Forme de verbe
  • Any remaining section that does not match Étymologie is scraped as Définition.
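
The rules in the list above can be restated as a small sketch. The helper name classify_section and the exact string matching are assumptions made for illustration; this is not pyktionary's actual implementation, only a restatement of the documented behaviour.

```python
# Hypothetical helper, not part of pyktionary: it only restates the rules above.
SKIPPED_SECTIONS = {
    "Prononciation",
    "Anagrammes",
    "Voir aussi",
    "Références",
    "Forme de verbe",
}


def classify_section(title):
    """Return the dict key a section would be stored under, or None if skipped."""
    title = title.strip()
    if title in SKIPPED_SECTIONS:
        return None
    if title == "Étymologie":
        return "Étymologie"
    # Every other section falls back to "Définition".
    return "Définition"


print(classify_section("Étymologie"))  # Étymologie
print(classify_section("Nom commun"))  # Définition
print(classify_section("Anagrammes"))  # None
```

Under this reading, a section such as "Nom commun" ends up under the Définition key, which is why the result dict in the Example has only two keys.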

What pyktionary is

A scraper that gets data on words from Wiktionary. The sections of a word are scraped as raw HTML into a dict (see Example).

What pyktionary is not

An interface to make changes to Wiktionary. You can NOT send data to Wiktionary with this module.

What's next?

This module is at a very early stage. It only covers my specific use case, which is scraping a word's etymology and definitions from the French Wiktionary.

The module will improve over time. Priorities are for the following features and fixes:

  • Scrape all sections of a word.
  • Support Wiktionaries in other languages.

See the TODO for more planned work.

Usage

    from pyktionary import Wiktionary

    # ...

    wik = Wiktionary()
    word = wik.word("oui")

    # ...

Example

For the word oui, the following code:

    import pprint

    from pyktionary import Wiktionary

    # Scrape the French Wiktionary entry for "oui" and pretty-print the dict.
    wik = Wiktionary()
    word = wik.word("oui")
    pprint.pprint(word, compact=True)

produces this output:

{   
    'Définition': '<ol><li>Réponsede<i><ahref="https://fr.wiktionary.org/wiki/oui#fr-interj"title="oui">oui</a></i>.Votepour.<strong>Noted’usage:</strong>L’<ahref="https://fr.wiktionary.org/wiki/article"title="article">article</a>définines’<ahref="https://fr.wiktionary.org/wiki/%C3%A9lider"title="élider">élide</a>pasdevantcemot.<ul><li><i>Uneballade,uneballade!s’écrial’ermite,celavautmieuxquetouslesocetles<b>oui</b>deFrance.</i><spanclass="sources"><spanclass="tiret">—</span>(<aclass="extiw"href="https://fr.wikipedia.org/wiki/Walter_Scott"title="w:WalterScott">Walter<spanclass="petites_capitales"style="font-variant:small-caps">Scott</span></a>,<i><aclass="extiw"href="https://fr.wikipedia.org/wiki/Ivanho%C3%A9"title="w:Ivanhoé">Ivanhoé</a></i>,traduitdel’anglaispar<aclass="extiw"href="https://fr.wikipedia.org/wiki/Alexandre_Dumas"title="w:AlexandreDumas">Alexandre<spanclass="petites_capitales"style="font-variant:small-caps">Dumas</span></a>,<aclass="extiw"href="https://fr.wikisource.org/wiki/Ivanho%C3%A9_(Scott_-_Dumas)"title="s:Ivanhoé(Scott-Dumas)">1820</a>)</span></li><li><i>Le<b>oui</b>etlenon.</i></li><li><i>Iladitce<b>oui</b>-làdeboncœur.</i></li><li><i>Ilnefautpastantdediscours,onnevousdemandequ’un<b>oui</b>ouunnon.Ditesunbon<b>oui</b>.</i></li></ul></li></ol>',
    'Étymologie': '<dl><dd><spanclass="date"><i>(<spanclass="texte">1380</span>)</i></span>Del’ancienfrançais<i><spanclass="lang-fro"lang="fro"><ahref="https://fr.wiktionary.org/wiki/o%C3%AFl#fro"title="oïl">oïl</a></span></i><spanclass="date"><i>(<spanclass="texte">1080</span>)</i></span>,formecomposéede<i>o</i>«cela»<spanclass="date"><i>(<spanclass="texte">842</span>)</i></span>,ausensde«oui»(àcomparerde<i><ahref="https://fr.wiktionary.org/wiki/%C3%B2c"title="òc">òc</a></i>«oui»en<ahref="https://fr.wiktionary.org/wiki/occitan"title="occitan">occitan</a>),renforcéparlepronompersonnel<i><ahref="https://fr.wiktionary.org/wiki/il"title="il">il</a></i>(ontrouveaussi<i>o-je</i>,<i>o-tu</i>,<i>onos</i>,<i>ovos</i>).<spanid="ref-1"><small></small><sup><ahref="#reference-1">[1]</a></sup></span><spanid="ref-2"><small></small><sup><ahref="#reference-2">[2]</a></sup></span>Lesmots«oui»et«òc»sontdescalquesceltiques<supclass="reference"id="cite_ref-1"><ahref="#cite_note-1">[1]</a></sup></dd></dl>'
}
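
Because each value is raw HTML, a caller will often want plain text. The following is a minimal sketch using only the standard library's html.parser module. The names html_to_text and _TextExtractor and the sample dict are hypothetical and not part of pyktionary; the sample is a short, well-formed fragment rather than the real output above.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Return the text content of an HTML fragment with all tags removed."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


# Hypothetical stand-in for the dict returned by wik.word(...).
sample = {"Définition": "<ol><li>Le <b>oui</b> et le non.</li></ol>"}
plain = {section: html_to_text(html) for section, html in sample.items()}
print(plain)  # {'Définition': 'Le oui et le non.'}
```

A dedicated HTML library would handle malformed markup more gracefully; the standard-library parser is used here only to keep the sketch free of extra dependencies.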

Licence

This module is licensed under the GNU GPL v3.

Download files

Source Distribution

pyktionary-0.5a0.tar.gz (4.9 kB view details)

Uploaded Source

Built Distribution

pyktionary-0.5a0-py3-none-any.whl (5.6 kB view details)

Uploaded Python 3

File details

Details for the file pyktionary-0.5a0.tar.gz.

File metadata

  • Download URL: pyktionary-0.5a0.tar.gz
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.0

File hashes

Hashes for pyktionary-0.5a0.tar.gz
Algorithm Hash digest
SHA256 eae4841ed224fce9418d724e946bf23d43b8dd1b6dad0b104fa4c495c61cebce
MD5 a505ef3e72dc403a3cf69c6bb4529904
BLAKE2b-256 e010f1356acdc81afba70f276681dfca639cbf8b114fa1d588ef7e3b95732dab

File details

Details for the file pyktionary-0.5a0-py3-none-any.whl.

File metadata

  • Download URL: pyktionary-0.5a0-py3-none-any.whl
  • Size: 5.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.0

File hashes

Hashes for pyktionary-0.5a0-py3-none-any.whl
Algorithm Hash digest
SHA256 0bc3def2c33254f3b6408cf27e125a284ef48b2a1525cc7c6e33af7b8f35e135
MD5 0b173a488c4847ab3df0d1f59427d18f
BLAKE2b-256 69d265c8896a83a85528b8717907da6215c3c7558b0fb408eac6c97e55551671
