Package with functions for processing LDA models

GhuLDA

Simple functions for preprocessing texts and training LDA (Latent Dirichlet Allocation) topic models, built on spaCy and gensim.

Installation

pip install GhuLDA

Also download spaCy's Portuguese model:

python -m spacy download pt_core_news_lg
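
After running the download command, it can help to confirm that the model is visible to the current Python environment. The snippet below is a minimal sketch, assuming the model is installed as an importable package named pt_core_news_lg (which is how spaCy distributes its models); the helper name spacy_model_installed is hypothetical and not part of GhuLDA.

```python
import importlib.util


def spacy_model_installed(name: str = "pt_core_news_lg") -> bool:
    """Return True if a package called `name` can be found in this environment.

    Hypothetical helper for illustration only; it does not load the model,
    it only checks that the package is discoverable.
    """
    return importlib.util.find_spec(name) is not None


if spacy_model_installed():
    print("pt_core_news_lg is available")
else:
    print("pt_core_news_lg not found; run: python -m spacy download pt_core_news_lg")
```

Checking with importlib avoids loading the large model into memory just to verify the installation.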

What each part does

Function / Class     Purpose
Tokenizer            Tokenizes, lemmatizes, and filters tokens by part of speech (noun, verb, adjective, proper noun).
add_bigram           Joins pairs of words that frequently appear together (e.g., aprendizado_maquina).
create_dictionary    Builds the term dictionary, with optional filtering of very rare or very frequent words.
create_corpus        Converts the documents to bag-of-words.
ModelLDA             Trains the LDA model (alpha/eta set automatically by default).
calc_coherence       Computes topic coherence (c_v, u_mass, etc.).
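
To make the middle steps concrete, here is a plain-Python illustration of what bigram joining, dictionary building, and bag-of-words conversion compute conceptually. It is a sketch under stated assumptions and not GhuLDA's API: the function names, the parameters min_count, no_below, and no_above, and the sample documents are all hypothetical. The bigram rule is also simplified to a raw pair count, whereas gensim's phrase detection uses a scoring function.

```python
from collections import Counter


def join_frequent_bigrams(docs, min_count=2):
    """Merge adjacent token pairs seen at least `min_count` times across all docs.

    Simplified illustration: a raw pair count stands in for a real
    phrase-scoring function.
    """
    pair_counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    merged_docs = []
    for doc in docs:
        merged, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and pair_counts[(doc[i], doc[i + 1])] >= min_count:
                merged.append(f"{doc[i]}_{doc[i + 1]}")
                i += 2  # skip the second half of the merged pair
            else:
                merged.append(doc[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs


def build_dictionary(docs, no_below=1, no_above=1.0):
    """Map each kept term to an integer id, filtering by document frequency.

    A term is kept if it appears in at least `no_below` documents and in at
    most a fraction `no_above` of all documents.
    """
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    kept = sorted(
        term
        for term, df in doc_freq.items()
        if df >= no_below and df / n_docs <= no_above
    )
    return {term: idx for idx, term in enumerate(kept)}


def to_bag_of_words(doc, dictionary):
    """Convert one tokenized document into sorted (term_id, count) pairs."""
    counts = Counter(dictionary[t] for t in doc if t in dictionary)
    return sorted(counts.items())


# Hypothetical, already-tokenized sample documents.
docs = [
    ["aprendizado", "maquina", "modelo"],
    ["aprendizado", "maquina", "dado"],
    ["modelo", "dado", "dado"],
]
docs = join_frequent_bigrams(docs, min_count=2)
dictionary = build_dictionary(docs)
corpus = [to_bag_of_words(doc, dictionary) for doc in docs]

print(docs[0])     # ['aprendizado_maquina', 'modelo']
print(dictionary)  # {'aprendizado_maquina': 0, 'dado': 1, 'modelo': 2}
print(corpus)      # [[(0, 1), (2, 1)], [(0, 1), (1, 1)], [(1, 2), (2, 1)]]
```

The resulting list of (term id, count) pairs per document is the kind of input an LDA trainer consumes; the actual package delegates these steps to gensim.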

Requirements

  • Python 3.10+
  • gensim, spacy, tqdm (installed automatically)

License

MIT — Erick Ghuron
