Skip to main content

Short Text Mining

Project description

Short Text Mining in Python

CircleCI GitHub release Documentation Status Updates Python 3 pypi download stars

Introduction

This package shorttext is a Python package that facilitates supervised and unsupervised learning for short text categorization. Due to the sparseness of words and the lack of information carried in the short texts themselves, an intermediate representation of the texts and documents are needed before they are put into any classification algorithm. In this package, it facilitates various types of these representations, including topic modeling and word-embedding algorithms.

The package shorttext runs on Python 3.11, 3.12, and 3.13. Characteristics:

  • example data provided (including subject keywords and NIH RePORT);
  • text preprocessing;
  • pre-trained word-embedding support;
  • gensim topic models (LDA, LSI, Random Projections) and autoencoder;
  • topic model representation supported for supervised learning using scikit-learn;
  • cosine distance classification;
  • neural network classification (including ConvNet, and C-LSTM);
  • maximum entropy classification;
  • metrics of phrases differences, including soft Jaccard score (using Damerau-Levenshtein distance), and Word Mover's distance (WMD);
  • character-level sequence-to-sequence (seq2seq) learning; and
  • spell correction.

Documentation

Documentation and tutorials for shorttext can be found here: http://shorttext.rtfd.io/.

See tutorial for how to use the package, and FAQ.

Installation

To install it, in a console, use pip.

>>> pip install shorttext

or, if you want the most recent development version on Github, type

>>> pip install git+https://github.com/stephenhky/PyShortTextCategorization@master

See installation guide for more details.

Issues

To report any issues, go to the Issues tab of the Github page and start a thread. It is welcome for developers to submit pull requests on their own to fix any errors.

Contributors

If you would like to contribute, feel free to submit the pull requests to the develop branch. You can talk to me in advance through e-mails or the Issues page.

Useful Links

News

  • 05/14/2026: shorttext 4.0.1 released.
  • 04/19/2026: shorttext 4.0.0 released.
  • 03/22/2026: shorttext 3.1.1 released.
  • 03/02/2026: shorttext 3.1.0 reelased.
  • 10/27/2025: shorttext 3.0.1 released.
  • 08/10/2025: shorttext 3.0.0 released.
  • 06/02/2025: shorttext 2.2.1 released. (Acknowledgement: Minseo Kim)
  • 05/29/2025: shorttext 2.2.0 released. (Acknowledgement: Minseo Kim)
  • 05/08/2025: shorttext 2.1.1 released.
  • 12/14/2024: shorttext 2.1.0 released.
  • 07/12/2024: shorttext 2.0.0 released.
  • 12/21/2023: shorttext 1.6.1 released.
  • 08/26/2023: shorttext 1.6.0 released.
  • 06/19/2023: shorttext 1.5.9 released.
  • 09/23/2022: shorttext 1.5.8 released.
  • 09/22/2022: shorttext 1.5.7 released.
  • 08/29/2022: shorttext 1.5.6 released.
  • 05/28/2022: shorttext 1.5.5 released.
  • 12/15/2021: shorttext 1.5.4 released.
  • 07/11/2021: shorttext 1.5.3 released.
  • 07/06/2021: shorttext 1.5.2 released.
  • 04/10/2021: shorttext 1.5.1 released.
  • 04/09/2021: shorttext 1.5.0 released.
  • 02/11/2021: shorttext 1.4.8 released.
  • 01/11/2021: shorttext 1.4.7 released.
  • 01/03/2021: shorttext 1.4.6 released.
  • 12/28/2020: shorttext 1.4.5 released.
  • 12/24/2020: shorttext 1.4.4 released.
  • 11/10/2020: shorttext 1.4.3 released.
  • 10/18/2020: shorttext 1.4.2 released.
  • 09/23/2020: shorttext 1.4.1 released.
  • 09/02/2020: shorttext 1.4.0 released.
  • 07/23/2020: shorttext 1.3.0 released.
  • 06/05/2020: shorttext 1.2.6 released.
  • 05/20/2020: shorttext 1.2.5 released.
  • 05/13/2020: shorttext 1.2.4 released.
  • 04/28/2020: shorttext 1.2.3 released.
  • 04/07/2020: shorttext 1.2.2 released.
  • 03/23/2020: shorttext 1.2.1 released.
  • 03/21/2020: shorttext 1.2.0 released.
  • 12/01/2019: shorttext 1.1.6 released.
  • 09/24/2019: shorttext 1.1.5 released.
  • 07/20/2019: shorttext 1.1.4 released.
  • 07/07/2019: shorttext 1.1.3 released.
  • 06/05/2019: shorttext 1.1.2 released.
  • 04/23/2019: shorttext 1.1.1 released.
  • 03/03/2019: shorttext 1.1.0 released.
  • 02/14/2019: shorttext 1.0.8 released.
  • 01/30/2019: shorttext 1.0.7 released.
  • 01/29/2019: shorttext 1.0.6 released.
  • 01/13/2019: shorttext 1.0.5 released.
  • 10/03/2018: shorttext 1.0.4 released.
  • 08/06/2018: shorttext 1.0.3 released.
  • 07/24/2018: shorttext 1.0.2 released.
  • 07/17/2018: shorttext 1.0.1 released.
  • 07/14/2018: shorttext 1.0.0 released.
  • 06/18/2018: shorttext 0.7.2 released.
  • 05/30/2018: shorttext 0.7.1 released.
  • 05/17/2018: shorttext 0.7.0 released.
  • 02/27/2018: shorttext 0.6.0 released.
  • 01/19/2018: shorttext 0.5.11 released.
  • 01/15/2018: shorttext 0.5.10 released.
  • 12/14/2017: shorttext 0.5.9 released.
  • 11/08/2017: shorttext 0.5.8 released.
  • 10/27/2017: shorttext 0.5.7 released.
  • 10/17/2017: shorttext 0.5.6 released.
  • 09/28/2017: shorttext 0.5.5 released.
  • 09/08/2017: shorttext 0.5.4 released.
  • 09/02/2017: end of GSoC project. (Report)
  • 08/22/2017: shorttext 0.5.1 released.
  • 07/28/2017: shorttext 0.4.1 released.
  • 07/26/2017: shorttext 0.4.0 released.
  • 06/16/2017: shorttext 0.3.8 released.
  • 06/12/2017: shorttext 0.3.7 released.
  • 06/02/2017: shorttext 0.3.6 released.
  • 05/30/2017: GSoC project (Chinmaya Pancholi, with gensim)
  • 05/16/2017: shorttext 0.3.5 released.
  • 04/27/2017: shorttext 0.3.4 released.
  • 04/19/2017: shorttext 0.3.3 released.
  • 03/28/2017: shorttext 0.3.2 released.
  • 03/14/2017: shorttext 0.3.1 released.
  • 02/23/2017: shorttext 0.2.1 released.
  • 12/21/2016: shorttext 0.2.0 released.
  • 11/25/2016: shorttext 0.1.2 released.
  • 11/21/2016: shorttext 0.1.1 released.

Acknowledgements

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

shorttext-4.0.1.tar.gz (67.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

shorttext-4.0.1-py3-none-any.whl (89.1 kB view details)

Uploaded Python 3

File details

Details for the file shorttext-4.0.1.tar.gz.

File metadata

  • Download URL: shorttext-4.0.1.tar.gz
  • Upload date:
  • Size: 67.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for shorttext-4.0.1.tar.gz
Algorithm Hash digest
SHA256 2a7641051f8c3a98cb2f552a97476a40a8d6ff659d71cc128baf22501879d968
MD5 d8bf2df02c5599729fc26417333e14c1
BLAKE2b-256 bff7cb7d51c7043df138aa074e66f9f367fb50280bcafd662c45e53f441d5205

See more details on using hashes here.

File details

Details for the file shorttext-4.0.1-py3-none-any.whl.

File metadata

  • Download URL: shorttext-4.0.1-py3-none-any.whl
  • Upload date:
  • Size: 89.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for shorttext-4.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7d5083856a2b982825eff70588a54da3c9a012613cda660d28239ff2e1d707fe
MD5 04e1b6cc074da11b04af7867c2b29855
BLAKE2b-256 d0eb324a763b165a27c7e20f0b8a123f84a41f4b26c5fdbd61506eaa2bd0ee00

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page