Simple Wiktionary scraper
pyktionary
Simple Wiktionary scraper. Get information about words from Wiktionary.
The module is at an early stage. Be advised that:
- Only the French Wiktionary is supported.
- The following sections are not scraped:
  - Prononciation
  - Anagrammes
  - Voir aussi
  - Références
  - Forme de verbe
- Any section not matching Étymologie is scraped as Définition.
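The rules in the list above can be pictured with a small sketch. This is not pyktionary's code: `SKIPPED_SECTIONS` and `classify_section` are hypothetical names, and exact title matching is an assumption, since the README does not say how "matching" is implemented.

```python
# Illustrative sketch only: these names are hypothetical, not pyktionary's API.
SKIPPED_SECTIONS = {
    "Prononciation",
    "Anagrammes",
    "Voir aussi",
    "Références",
    "Forme de verbe",
}


def classify_section(title):
    """Return the dict key a section would be stored under, or None if skipped."""
    title = title.strip()
    if title in SKIPPED_SECTIONS:
        return None
    if title == "Étymologie":  # assumption: exact match on the section title
        return "Étymologie"
    # Every other section title falls back to Définition.
    return "Définition"
```

Under these assumptions, a section titled "Nom commun" would end up under the Définition key, while "Anagrammes" would be dropped.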
What pyktionary is
A scraper that gets data on words from Wiktionary. Each section of a word is scraped as raw HTML and stored in a dict; see Example.
What pyktionary is not
An interface to make changes to Wiktionary. You cannot send data to Wiktionary with this module.
What's next?
This module is at a very early stage. It only covers my specific use case, which is scraping a word's etymology and definitions from the French Wiktionary.
The module will improve over time. Priorities are for the following features and fixes:
- Scrape all sections from a word.
- Support wiktionaries from other languages.
You can read the TODO for more stuff to do.
Usage
from pyktionary import Wiktionary
# ...
wik = Wiktionary()
word = wik.word("oui")
# ...
Example
With the word oui, the following code:
import pprint

from pyktionary import Wiktionary

wik = Wiktionary()
word = wik.word("oui")
pprint.pprint(word, compact=True)
output:
{
'Définition': '<ol><li>Réponsede<i><ahref="https://fr.wiktionary.org/wiki/oui#fr-interj"title="oui">oui</a></i>.Votepour.<strong>Noted’usage:</strong>L’<ahref="https://fr.wiktionary.org/wiki/article"title="article">article</a>définines’<ahref="https://fr.wiktionary.org/wiki/%C3%A9lider"title="élider">élide</a>pasdevantcemot.<ul><li><i>Uneballade,uneballade!s’écrial’ermite,celavautmieuxquetouslesocetles<b>oui</b>deFrance.</i><spanclass="sources"><spanclass="tiret">—</span>(<aclass="extiw"href="https://fr.wikipedia.org/wiki/Walter_Scott"title="w:WalterScott">Walter<spanclass="petites_capitales"style="font-variant:small-caps">Scott</span></a>,<i><aclass="extiw"href="https://fr.wikipedia.org/wiki/Ivanho%C3%A9"title="w:Ivanhoé">Ivanhoé</a></i>,traduitdel’anglaispar<aclass="extiw"href="https://fr.wikipedia.org/wiki/Alexandre_Dumas"title="w:AlexandreDumas">Alexandre<spanclass="petites_capitales"style="font-variant:small-caps">Dumas</span></a>,<aclass="extiw"href="https://fr.wikisource.org/wiki/Ivanho%C3%A9_(Scott_-_Dumas)"title="s:Ivanhoé(Scott-Dumas)">1820</a>)</span></li><li><i>Le<b>oui</b>etlenon.</i></li><li><i>Iladitce<b>oui</b>-làdeboncœur.</i></li><li><i>Ilnefautpastantdediscours,onnevousdemandequ’un<b>oui</b>ouunnon.Ditesunbon<b>oui</b>.</i></li></ul></li></ol>',
'Étymologie': '<dl><dd><spanclass="date"><i>(<spanclass="texte">1380</span>)</i></span>Del’ancienfrançais<i><spanclass="lang-fro"lang="fro"><ahref="https://fr.wiktionary.org/wiki/o%C3%AFl#fro"title="oïl">oïl</a></span></i><spanclass="date"><i>(<spanclass="texte">1080</span>)</i></span>,formecomposéede<i>o</i>«cela»<spanclass="date"><i>(<spanclass="texte">842</span>)</i></span>,ausensde«oui»(àcomparerde<i><ahref="https://fr.wiktionary.org/wiki/%C3%B2c"title="òc">òc</a></i>«oui»en<ahref="https://fr.wiktionary.org/wiki/occitan"title="occitan">occitan</a>),renforcéparlepronompersonnel<i><ahref="https://fr.wiktionary.org/wiki/il"title="il">il</a></i>(ontrouveaussi<i>o-je</i>,<i>o-tu</i>,<i>onos</i>,<i>ovos</i>).<spanid="ref-1"><small></small><sup><ahref="#reference-1">[1]</a></sup></span><spanid="ref-2"><small></small><sup><ahref="#reference-2">[2]</a></sup></span>Lesmots«oui»et«òc»sontdescalquesceltiques<supclass="reference"id="cite_ref-1"><ahref="#cite_note-1">[1]</a></sup></dd></dl>'
}
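Because the values are raw HTML, you may want plain text instead. The following is a minimal sketch using only the standard library's `html.parser`. The helper names `_TextExtractor` and `html_to_text` are hypothetical and not part of pyktionary, and the `word` dict is a made-up stand-in for what `wik.word(...)` returns.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Return the text content of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


# Stand-in for the dict returned by wik.word(...); the values are made up.
word = {
    "Définition": "<ol><li>Réponse de <i>oui</i>.</li></ol>",
    "Étymologie": "<dl><dd>De l’ancien français <i>oïl</i>.</dd></dl>",
}
plain = {section: html_to_text(html) for section, html in word.items()}
```

This only strips tags; it does not add separators between list items, so a dedicated HTML library may be a better fit for longer entries.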
Licence
This module is licensed under the GNU GPL v3.
Download files
File details
Details for the file pyktionary-0.5a0.tar.gz.
File metadata
- Download URL: pyktionary-0.5a0.tar.gz
- Upload date:
- Size: 4.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eae4841ed224fce9418d724e946bf23d43b8dd1b6dad0b104fa4c495c61cebce |
| MD5 | a505ef3e72dc403a3cf69c6bb4529904 |
| BLAKE2b-256 | e010f1356acdc81afba70f276681dfca639cbf8b114fa1d588ef7e3b95732dab |
File details
Details for the file pyktionary-0.5a0-py3-none-any.whl.
File metadata
- Download URL: pyktionary-0.5a0-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0bc3def2c33254f3b6408cf27e125a284ef48b2a1525cc7c6e33af7b8f35e135 |
| MD5 | 0b173a488c4847ab3df0d1f59427d18f |
| BLAKE2b-256 | 69d265c8896a83a85528b8717907da6215c3c7558b0fb408eac6c97e55551671 |