
Typed Python client for Infopédia — the Portuguese language dictionary (www.infopedia.pt/dicionarios/lingua-portuguesa)


pyinfopedia

Typed Python client for Infopédia — the European-Portuguese dictionary by Porto Editora.

Each word page is parsed into a typed Entry: headword, IPA pronunciation(s), syllabification, etymology, grammatical categories with numbered senses, set phrases, inflected forms, and the sidebar related-word lists (synonyms, rhymes, neighbours…).

Built for Portuguese NLP and lexicon work: it keeps heterophonic homographs (same spelling, different pronunciation per reading) separate, which makes it useful for grapheme-to-phoneme and disambiguation tasks.

Install

pip install pyinfopedia            # or: uv pip install pyinfopedia
pip install "pyinfopedia[stealth]" # + curl_cffi for Cloudflare bypass (quoted so shells like zsh do not expand the brackets)

Depends on unblock_requests for transport.

Quick start

import pyinfopedia

entry = pyinfopedia.get_word("casa")
print(entry.pronunciation)              # ˈkazɐ
print(entry.categories[0].pos)          # nome feminino
print(entry.categories[0].senses[0].definition)

for r in pyinfopedia.search("cas"):     # prefix autocomplete
    print(r.word, r.url)

Heterophonic homographs

Infopédia lists one entry block per pronunciation; pyinfopedia keeps them separate, tying each grammatical category (and its senses) to the reading it belongs to:

entry = pyinfopedia.get_word("sede")
for cat in entry.categories:
    print(cat.pronunciation, cat.pos, "->", cat.senses[0].definition)
# ˈsɛdɨ nome feminino -> lugar onde alguém se pode sentar ou fixar   (seat / HQ)
# ˈsedɨ nome feminino -> sensação causada pela necessidade de beber  (thirst)

The two readings carry disjoint senses. Other pairs behave the same way: corte (cut ˈkɔɾtɨ / court ˈkoɾtɨ), molho (sauce ˈmoʎu / bundle ˈmɔʎu), forma (mould ˈfoɾmɐ / shape ˈfɔɾmɐ). See examples/heterophones.py.
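
For grapheme-to-phoneme work it can help to group an entry's categories by reading. The sketch below is a rough illustration, not the library's own code: `Sense` and `Category` are stand-in dataclasses that mirror only the attributes used in the examples above (`pronunciation`, `pos`, `senses[...].definition`), and `readings` and `is_heterophonic` are hypothetical helper names.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Stand-in models so the sketch runs offline; the real pyinfopedia classes
# may differ. Only the attributes used in the README examples are mirrored.
@dataclass
class Sense:
    definition: str


@dataclass
class Category:
    pronunciation: str
    pos: str
    senses: list[Sense] = field(default_factory=list)


def readings(categories):
    """Map each pronunciation to the (pos, first definition) pairs filed under it."""
    grouped = defaultdict(list)
    for cat in categories:
        # Guard against a category without senses instead of indexing blindly.
        first = cat.senses[0].definition if cat.senses else ""
        grouped[cat.pronunciation].append((cat.pos, first))
    return dict(grouped)


def is_heterophonic(categories):
    """True when one spelling carries more than one distinct pronunciation."""
    return len({cat.pronunciation for cat in categories}) > 1


# Hand-built data copied from the "sede" example above (no network access).
sede = [
    Category("ˈsɛdɨ", "nome feminino", [Sense("lugar onde alguém se pode sentar ou fixar")]),
    Category("ˈsedɨ", "nome feminino", [Sense("sensação causada pela necessidade de beber")]),
]

print(is_heterophonic(sede))
for ipa, items in readings(sede).items():
    print(ipa, items)
```

If the real objects expose the same attributes, passing `entry.categories` from `pyinfopedia.get_word(...)` to these helpers should work unchanged.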

Transport / Cloudflare

All HTTP goes through Transport, a wrapper over unblock_requests.CloudflareSession. Pick a mode when the default is blocked:

from pyinfopedia import Infopedia
client = Infopedia(mode="curl_cffi")                                   # impersonate a browser
client = Infopedia(mode="flaresolverr",
                   flaresolverr_url="http://192.168.1.116:8191")        # FlareSolverr

Modes: requests · curl_cffi · flaresolverr · wayback.
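
When it is not known in advance which mode gets through, one option is to try them in order. This is a minimal sketch under stated assumptions: `fetch_with_fallback` is a hypothetical helper, not part of pyinfopedia, and it takes the client factory and the fetch call as parameters because the README does not show the client's method names or exception types. The demonstration uses a fake client so it runs offline.

```python
def fetch_with_fallback(word, modes, make_client, fetch):
    """Try each transport mode in order; return (mode, result) for the first that works."""
    errors = {}
    for mode in modes:
        try:
            client = make_client(mode)
            return mode, fetch(client, word)
        except Exception as exc:
            # Broad on purpose: the library's exception types are not documented here.
            errors[mode] = exc
    raise RuntimeError(f"all modes failed for {word!r}: {errors}")


# Fake client factory, for illustration only: "requests" is treated as blocked.
def fake_client(mode):
    if mode == "requests":
        raise ConnectionError("blocked")
    return {"mode": mode}


mode, result = fetch_with_fallback(
    "casa",
    ["requests", "curl_cffi", "flaresolverr", "wayback"],
    make_client=fake_client,
    fetch=lambda client, word: f"{word} via {client['mode']}",
)
print(mode, result)
```

With the real library, `make_client` could be `lambda m: Infopedia(mode=m)`. The fetch callable would then call whatever lookup method the client exposes; a `client.get_word(word)` method is an assumption here.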

Verbs

from pyinfopedia import get_verb
conj = get_verb("jogar")
print(conj.first_person_singular())     # jogo
print(conj.present_indicative())

Datasets

pyinfopedia.dataset exports JSONL/CSV for a word list — see examples/build_dataset.py.
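
The API of pyinfopedia.dataset is not shown here, so the following is only a stdlib sketch of what a JSONL/CSV export of entries might look like. The row layout (one row per reading, part of speech and sense), the column names, and the helpers `to_rows`, `write_jsonl` and `write_csv` are all assumptions made for illustration; the input is plain dictionaries rather than real Entry objects.

```python
import csv
import io
import json

FIELDS = ["word", "pronunciation", "pos", "sense", "definition"]


def to_rows(word, categories):
    """Flatten one entry into one row per (reading, part of speech, sense)."""
    for cat in categories:
        for number, definition in enumerate(cat["senses"], start=1):
            yield {
                "word": word,
                "pronunciation": cat["pronunciation"],
                "pos": cat["pos"],
                "sense": number,
                "definition": definition,
            }


def write_jsonl(rows, fh):
    """Write one JSON object per line, keeping IPA and accents readable."""
    for row in rows:
        fh.write(json.dumps(row, ensure_ascii=False) + "\n")


def write_csv(rows, fh):
    """Write the same rows as CSV with a header line."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Hand-built data copied from the "sede" example (no network access).
data = [
    {"pronunciation": "ˈsɛdɨ", "pos": "nome feminino",
     "senses": ["lugar onde alguém se pode sentar ou fixar"]},
    {"pronunciation": "ˈsedɨ", "pos": "nome feminino",
     "senses": ["sensação causada pela necessidade de beber"]},
]
rows = list(to_rows("sede", data))

jsonl_buf, csv_buf = io.StringIO(), io.StringIO()
write_jsonl(rows, jsonl_buf)
write_csv(rows, csv_buf)
print(jsonl_buf.getvalue(), end="")
```

Keeping the pronunciation on every row means the two readings of a heterophonic homograph stay distinguishable after flattening.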

Development

pytest -m "not live"            # offline parser/model tests (HTML fixtures)
PYINFOPEDIA_FLARESOLVERR=http://host:8191 pytest -m live    # hit the live site

Apache-2.0 · JarbasAi <jarbasai@mailfence.com>. Data belongs to Porto Editora / Infopédia; this is an unofficial client — respect their terms and rate limits.

