A library for topic modeling and visualization.
DARIAH Topics is an easy-to-use Python library for topic modeling and visualization. Getting started is easy: import the library and you can train a model straight away from raw text files.
It supports two implementations of latent Dirichlet allocation:
Installation
$ pip install dariah
Example
>>> import dariah
>>> model, vis = dariah.topics(directory="british-fiction-corpus",
...                            stopwords=100,
...                            num_topics=10,
...                            num_iterations=1000)
>>> model.topics.iloc[:5, :5]
          word0   word1    word2    word3     word4
topic0  phineas    lord    laura   course     house
topic1    don't  mother     came       go    looked
topic2    jones   adams       am   indeed  answered
topic3      tom    adam   maggie     it's  tulliver
topic4  crawley  george  osborne  rebecca    amelia
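The num_topics and num_iterations arguments correspond to the two main settings of an LDA training loop. As an illustration only (this is not dariah's code, and the LDA backends dariah supports may use a different inference method), here is a minimal collapsed Gibbs sampler in pure Python. The function name lda_gibbs, the alpha/beta/top_n defaults and the toy corpus are assumptions. In this sketch, one iteration is one sweep over every token, and the returned top words per topic resemble the rows of model.topics.

```python
import random


def lda_gibbs(docs, num_topics, num_iterations, alpha=0.1, beta=0.01,
              top_n=5, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (topics, theta): topics[k] lists
    the top_n words of topic k, theta[d] is document d's topic mixture.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    topic_ids = range(K)
    for _ in range(num_iterations):  # one iteration = one sweep over all tokens
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts ...
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # ... and resample its topic from the full conditional, which is
                # proportional to (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta)
                           / (nk[t] + V * beta) for t in topic_ids]
                k = rng.choices(topic_ids, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Top words per topic, ranked by topic-word counts.
    topics = []
    for k in topic_ids:
        ranked = sorted(range(V), key=lambda v: nkw[k][v], reverse=True)
        topics.append([vocab[v] for v in ranked[:top_n]])
    # Smoothed document-topic proportions; each row sums to 1.
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in topic_ids]
             for d, doc in enumerate(docs)]
    return topics, theta


if __name__ == "__main__":
    # Hypothetical toy corpus built from words in the table above.
    corpus = [
        "phineas lord laura house lord laura".split(),
        "crawley george osborne rebecca amelia george".split(),
        "lord phineas house laura".split(),
        "rebecca amelia osborne crawley".split(),
    ]
    topics, theta = lda_gibbs(corpus, num_topics=2, num_iterations=200)
    for k, words in enumerate(topics):
        print(f"topic{k}", " ".join(words))
```

Raising num_iterations gives the sampler more sweeps to settle; raising num_topics adds rows to the topic table.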
With the vis object, you can visualize the model’s probability distributions, e.g. with vis.topic_document().
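A common way to show how topics are distributed over documents is a heatmap with topics as rows and documents as columns. The sketch below draws one with matplotlib; it is not dariah's implementation, and vis.topic_document() may look different. The function names, the input layout (a topics x documents matrix of raw weights) and the example labels are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def normalize_columns(weights):
    """Scale a topics x documents weight matrix so that each column
    (one document's topic mixture) sums to 1."""
    weights = np.asarray(weights, dtype=float)
    return weights / weights.sum(axis=0, keepdims=True)


def plot_topic_document(weights, topics, documents):
    """Draw a topics x documents heatmap; return the figure and the matrix."""
    dist = normalize_columns(weights)
    fig, ax = plt.subplots(figsize=(2 + 0.8 * len(documents), 1 + 0.5 * len(topics)))
    image = ax.imshow(dist, cmap="Blues", aspect="auto", vmin=0.0, vmax=1.0)
    ax.set_xticks(range(len(documents)))
    ax.set_xticklabels(documents, rotation=45, ha="right")
    ax.set_yticks(range(len(topics)))
    ax.set_yticklabels(topics)
    ax.set_xlabel("Document")
    ax.set_ylabel("Topic")
    fig.colorbar(image, ax=ax, label="Topic proportion")
    fig.tight_layout()
    return fig, dist
```

For example, plot_topic_document([[8, 1], [1, 6], [1, 3]], ["topic0", "topic1", "topic2"], ["doc_a", "doc_b"]) returns a figure and a matrix whose first column is 0.8, 0.1, 0.1; save the figure with fig.savefig("topic_document.png").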
Developing
Poetry automatically creates a virtual environment, builds the project, and publishes it to PyPI. Install the dependencies with:
$ poetry install
run tests:
$ poetry run pytest
format code:
$ poetry run black dariah
build the project:
$ poetry build
and publish it on PyPI:
$ poetry publish
About DARIAH-DE
DARIAH-DE supports research in the humanities and cultural sciences with digital methods and procedures. Its research infrastructure rests on four pillars: teaching, research, research data, and technical components. As a partner in DARIAH-EU, DARIAH-DE helps to bundle and network state-of-the-art activities in the digital humanities. Scientists use DARIAH, for example, to make research data available across Europe. This promotes the exchange of knowledge and expertise across disciplines and encourages the discovery of new scientific discourses.
This software library has been developed with support from the DARIAH-DE initiative, the German branch of DARIAH-EU, the European Digital Research Infrastructure for the Arts and Humanities consortium. Funding has been provided by the German Federal Ministry of Education and Research (BMBF) under the identifier 01UG1610J.
Download files
Source distribution: dariah-2.0.2.tar.gz
Built distribution: dariah-2.0.2-py3-none-any.whl
File details
Details for the file dariah-2.0.2.tar.gz.
File metadata
- Download URL: dariah-2.0.2.tar.gz
- Upload date:
- Size: 13.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.9 CPython/3.8.3 Darwin/19.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 372aefcb28fa32ba6d86db047750e4c76714dbfdf070aa331f392842f138ffaf
MD5 | 985ac658cef9a014f2d936e68bd25a51
BLAKE2b-256 | dac8a02aeec538bb99a4c1ca637bd3570968dd43f963c11d8026b026302f556e
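The digests listed here let you check a downloaded archive before installing it. A minimal sketch using Python's standard hashlib module; the function names and the local file name are assumptions about how and where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for dariah-2.0.2.tar.gz.
EXPECTED_SHA256 = "372aefcb28fa32ba6d86db047750e4c76714dbfdf070aa331f392842f138ffaf"


def sha256_of(path, chunk_size=1 << 16):
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.strip().lower()
```

For example, verify("dariah-2.0.2.tar.gz") should return True for an intact copy of the source distribution in the current directory. pip can enforce the same check at install time through its hash-checking mode (--require-hashes).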
File details
Details for the file dariah-2.0.2-py3-none-any.whl.
File metadata
- Download URL: dariah-2.0.2-py3-none-any.whl
- Upload date:
- Size: 13.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.9 CPython/3.8.3 Darwin/19.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | e3ee36fed3850446f03d6191d82f7707c6f7f915b03f86590e565c0caf394670
MD5 | 749d66d189a12621ebf78da945da52d9
BLAKE2b-256 | 1862783be6d1be6ac4b56166cafd42f1987c4bb3bfbea3afcfb9200db517fe96