
A library for topic modeling and visualization.


DARIAH Topics is an easy-to-use Python library for topic modeling and visualization. To get started, import the library: you can train a model directly from raw text files.

It supports two implementations of latent Dirichlet allocation:

  • The lightweight, Cython-based package lda

  • The more robust, Java-based package MALLET
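To make the term concrete, here is a minimal pure-Python sketch of latent Dirichlet allocation trained with collapsed Gibbs sampling. It is an illustration only, not the code of lda, MALLET, or dariah. The function name `lda_gibbs`, the hyperparameter names `alpha` and `eta`, and their default values are assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, num_iterations, alpha=0.1, eta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (hypothetical helper, not a library API).

    docs is a list of token lists. Returns the vocabulary, the document-topic
    distributions and the topic-word distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({token for doc in docs for token in doc})
    word_id = {token: i for i, token in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for token in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word_id[token]] += 1
            topic_totals[k] += 1
        assignments.append(doc_assignments)

    for _ in range(num_iterations):
        for d, doc in enumerate(docs):
            for i, token in enumerate(doc):
                v = word_id[token]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][v] -= 1
                topic_totals[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][v] + eta)
                    / (topic_totals[t] + vocab_size * eta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][v] += 1
                topic_totals[k] += 1

    # Turn smoothed counts into probability distributions.
    doc_topic = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in row]
        for row, doc in zip(doc_topic_counts, docs)
    ]
    topic_word = [
        [(count + eta) / (topic_totals[k] + vocab_size * eta)
         for count in topic_word_counts[k]]
        for k in range(num_topics)
    ]
    return vocab, doc_topic, topic_word


docs = [
    "tom maggie tulliver mill tom".split(),
    "crawley george osborne amelia rebecca".split(),
    "maggie tom mill tulliver".split(),
]
vocab, doc_topic, topic_word = lda_gibbs(docs, num_topics=2, num_iterations=50)
```

Each row of `doc_topic` is one document's distribution over topics, and each row of `topic_word` is one topic's distribution over the vocabulary. Production implementations add optimizations that this sketch leaves out.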

Installation

$ pip install dariah

Example

>>> import dariah
>>> model, vis = dariah.topics(directory="british-fiction-corpus",
...                            stopwords=100,
...                            num_topics=10,
...                            num_iterations=1000)
>>> model.topics.iloc[:5, :5]
          word0   word1    word2    word3     word4
topic0  phineas    lord    laura   course     house
topic1    don't  mother     came       go    looked
topic2    jones   adams       am   indeed  answered
topic3      tom    adam   maggie     it's  tulliver
topic4  crawley  george  osborne  rebecca    amelia
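
The `stopwords=100` argument above is read here as a frequency threshold, meaning the 100 most frequent words in the corpus are treated as stopwords and removed before training. That reading is an assumption, and the sketch below illustrates the idea only. The helper `drop_most_frequent` is hypothetical and not part of dariah.

```python
from collections import Counter


def drop_most_frequent(documents, threshold):
    """Remove the `threshold` most frequent tokens from tokenized documents.

    Hypothetical helper for illustration; dariah's own preprocessing may differ.
    """
    counts = Counter(token for doc in documents for token in doc)
    stopwords = {token for token, _ in counts.most_common(threshold)}
    cleaned = [[token for token in doc if token not in stopwords]
               for doc in documents]
    return cleaned, stopwords


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
cleaned, stopwords = drop_most_frequent(docs, threshold=1)
print(stopwords)  # {'the'}
print(cleaned[0])  # ['cat', 'sat', 'on', 'mat']
```

A frequency threshold needs no language-specific stopword list, which helps with historical or multilingual corpora, but it can also remove frequent words that carry meaning in a given corpus.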

With the vis object, you can visualize the model’s probability distributions, e.g. with vis.topic_document():

https://raw.githubusercontent.com/DARIAH-DE/Topics/testing/docs/images/topic-document.png

Developing

This project uses Poetry, which automatically creates a virtual environment and handles building the project and publishing it to PyPI. Install dependencies with:

$ poetry install

run tests:

$ poetry run pytest

format code:

$ poetry run black dariah

build the project:

$ poetry build

and publish it on PyPI:

$ poetry publish

About DARIAH-DE

DARIAH-DE supports research in the humanities and cultural sciences with digital methods and procedures. The research infrastructure of DARIAH-DE consists of four pillars: teaching, research, research data and technical components. As a partner in DARIAH-EU, DARIAH-DE helps to bundle and network state-of-the-art activities of the digital humanities. Scientists use DARIAH, for example, to make research data available across Europe. The exchange of knowledge and expertise is thus promoted across disciplines and the possibility of discovering new scientific discourses is encouraged.

This software library has been developed with support from the DARIAH-DE initiative, the German branch of DARIAH-EU, the European Digital Research Infrastructure for the Arts and Humanities consortium. Funding has been provided by the German Federal Ministry of Education and Research (BMBF) under the identifier 01UG1610J.


