short_text_tagger
short_text_tagger generates topic distributions for all texts in a corpus.
Free software: MIT license
Installation
pip install short_text_tagger
Usage
If graph-tool is installed and you want to use its community detection to generate topics, import generate_topic_distributions_from_corpus from short_text_tagger. This function expects a pandas DataFrame with the columns id and text.
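A minimal sketch of preparing the input follows. It assumes the DataFrame is passed as the only positional argument; the return value and any further parameters are not documented here, so check them against the package source. The helper names check_corpus and tag_corpus and the sample texts are hypothetical, for illustration only.

```python
import pandas as pd

# The two columns the README says the tagger expects.
REQUIRED_COLUMNS = {"id", "text"}


def check_corpus(df):
    """Hypothetical helper: fail early if a required column is missing."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"corpus is missing columns: {sorted(missing)}")
    return df


def tag_corpus(df):
    """Hypothetical wrapper around the package entry point."""
    # Imported lazily so this module still loads when graph-tool is absent.
    from short_text_tagger import generate_topic_distributions_from_corpus

    # Assumption: the DataFrame is the sole positional argument.
    return generate_topic_distributions_from_corpus(check_corpus(df))


# One row per short text (made-up sample data).
corpus = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "cheap flights to paris",
            "best pasta recipe",
            "flight deals and hotel offers",
        ],
    }
)
check_corpus(corpus)
```

With graph-tool available, calling tag_corpus(corpus) would hand the validated DataFrame to the package; a realistic corpus is likely to need far more than three texts for community detection to find meaningful topics.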
If you don’t have graph-tool installed, or want to substitute another community detection algorithm, you can import cleaned_texts_df_from_data from short_text_tagger, which preprocesses the text and adds the required words column to the DataFrame described above. You can then import assign_text_probabilities, which takes that DataFrame (with the added words column) and a list of dictionaries (word-to-topic mappings) and returns the same DataFrame with topic probability columns appended. The extension point is the creation of the list of word-to-topic mappings. In this package, that functionality is provided by word_to_block_dict.
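To make the data shapes concrete, here is a toy stand-in for the last step. It is not the package's assign_text_probabilities: the scoring rule (share of a text's mapped words that fall in each topic), the column naming, and the assumption that each dictionary in the list is one independent word-to-topic mapping are all illustrative guesses. Any community detection algorithm that produces such dictionaries could feed it.

```python
import pandas as pd


def topic_share(words, mapping, topic):
    """Fraction of the mapped words in `words` that belong to `topic`."""
    known = [mapping[w] for w in words if w in mapping]
    if not known:
        return 0.0
    return known.count(topic) / len(known)


def toy_assign_text_probabilities(df, word_to_topic_dicts):
    """Illustrative stand-in, NOT the package's assign_text_probabilities."""
    out = df.copy()
    # One group of columns per mapping in the list (assumed structure).
    for level, mapping in enumerate(word_to_topic_dicts):
        for topic in sorted(set(mapping.values())):
            col = f"level_{level}_topic_{topic}"  # hypothetical naming
            out[col] = [topic_share(ws, mapping, topic) for ws in out["words"]]
    return out


# DataFrame with the added `words` column (made-up sample data).
texts = pd.DataFrame(
    {
        "id": [1, 2],
        "text": ["flight to paris", "pasta recipe on a flight"],
        "words": [["flight", "paris"], ["pasta", "recipe", "flight"]],
    }
)

# Word-to-topic mappings, e.g. from any community detection algorithm.
mappings = [{"flight": 0, "paris": 0, "pasta": 1, "recipe": 1}]

tagged = toy_assign_text_probabilities(texts, mappings)
```

Here the first text falls entirely in topic 0, while the second splits one third to two thirds between topics 0 and 1.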
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.1.0 (2020-10-09)
First release on PyPI.