Package with functions for processing LDA models
GhuLDA
Simple functions for preprocessing text and training LDA (Latent Dirichlet Allocation) topic models, built on spaCy and gensim.
Installation
pip install GhuLDA
Also download the spaCy Portuguese model:
python -m spacy download pt_core_news_lg
What each part does
| Function / Class | What it does |
|---|---|
| Tokenizer | Tokenizes, lemmatizes, and filters tokens by part of speech (noun, verb, adjective, proper noun). |
| add_bigram | Joins pairs of words that frequently occur together (e.g., aprendizado_maquina). |
| create_dictionary | Builds the term dictionary, with an optional filter for very rare or very frequent words. |
| create_corpus | Converts the documents to bag-of-words. |
| ModelLDA | Trains the LDA model (alpha/eta are set automatically by default). |
| calc_coherence | Computes topic coherence (c_v, u_mass, etc.). |
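
To make the dictionary and bag-of-words steps concrete, here is a minimal pure-Python sketch of the idea behind them. It does not use GhuLDA or gensim and does not reflect GhuLDA's actual signatures: the helper names `build_dictionary` and `to_bow` and the parameters `no_below` and `no_above` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def build_dictionary(docs, no_below=1, no_above=1.0):
    """Map each kept token to an integer id (hypothetical helper).

    A token is kept when it appears in at least `no_below` documents
    and in at most a fraction `no_above` of all documents.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each token once per document
    n_docs = len(docs)
    kept = sorted(
        tok for tok, df in doc_freq.items()
        if df >= no_below and df / n_docs <= no_above
    )
    return {tok: i for i, tok in enumerate(kept)}


def to_bow(doc, token2id):
    """Convert one tokenized document into sorted (token_id, count) pairs."""
    counts = Counter(tok for tok in doc if tok in token2id)
    return sorted((token2id[tok], n) for tok, n in counts.items())


# Toy, already-tokenized documents (illustrative data only).
docs = [
    ["model", "topic", "text"],
    ["model", "topic", "topic", "corpus"],
    ["model", "word", "text"],
]

token2id = build_dictionary(docs, no_below=2, no_above=0.9)
corpus = [to_bow(doc, token2id) for doc in docs]

print(token2id)  # {'text': 0, 'topic': 1}
print(corpus)    # [[(0, 1), (1, 1)], [(1, 2)], [(0, 1)]]
```

In this toy run, "corpus" and "word" are dropped as too rare (one document each) and "model" as too frequent (every document), which is the kind of filtering the dictionary step makes optional.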
Requirements
- Python 3.10+
- gensim, spaCy, tqdm (installed automatically)
License
MIT, Erick Ghuron