Animated version of classic word cloud for time-series text data
AnimatedWordCloud
The classic word cloud does not account for how text data change over time. AnimatedWordCloud addresses this by displaying text datasets collected over multiple periods in a single MP4 file. The core framework for animating word frequencies was developed by Michael Cane in the WordsSwarm project. AnimatedWordCloud adapts this code to work efficiently on a variety of text datasets in languages written in the Latin alphabet.
Installation
It requires Python 3.8, along with Box2D, beautifulsoup4, pygame and PyQt6 for visualization, and Arabica and ftfy for text preprocessing.
To install using pip, use:
pip install AnimatedWordCloud
AnimatedWordCloud has been tested with PyCharm Community Edition. It is recommended to use this IDE and to run .py files instead of .ipynb notebooks.
Usage
- Import the library:
from AnimatedWordCloud import animated_word_cloud
- Generate frames:
animated_word_cloud generates 90 PNG word cloud images (frames) per period. It scales word frequencies so that the word clouds display well for text datasets of different sizes. Frames are saved in a newly created .post_processing/frames folder in the working directory. It currently provides unigram frequencies only (bigram frequencies will be added later). It reads the following date and datetime formats:
- US-style, with the month before the day, as in MM/DD/YYYY (e.g. 2013-12-31, Feb-09-2009, 2013-12-31 11:46:17)
- European-style, with the day before the month, as in DD/MM/YYYY (e.g. 2013-31-12, 09-Feb-2009, 2013-31-12 11:46:17)
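To make the difference between the two conventions concrete, here is a minimal sketch of format-trial date parsing with the standard library. It is not the library's own parser: the candidate format lists (US_FORMATS, EUR_FORMATS) and the parse_date helper are hypothetical, chosen only to cover the example strings above, while the 'us' and 'eur' keys follow the date_format parameter of animated_word_cloud.

```python
from datetime import datetime

# Hypothetical candidate formats covering the examples above;
# AnimatedWordCloud's actual parsing rules may differ.
US_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%b-%d-%Y", "%m/%d/%Y"]
EUR_FORMATS = ["%Y-%d-%m %H:%M:%S", "%Y-%d-%m", "%d-%b-%Y", "%d/%m/%Y"]


def parse_date(value: str, date_format: str) -> datetime:
    """Try each candidate format for the given convention ('us' or 'eur')."""
    formats = {"us": US_FORMATS, "eur": EUR_FORMATS}[date_format]
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"Unrecognized {date_format} date: {value!r}")


if __name__ == "__main__":
    # The same calendar day written in both conventions
    print(parse_date("Feb-09-2009", "us"), parse_date("09-Feb-2009", "eur"))
```

Trying formats in a fixed order keeps the month/day ambiguity explicit: a string such as 2013-12-31 is only accepted under the convention whose field order makes it a valid date.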
animated_word_cloud automatically removes punctuation and numbers from the input. It can also remove the standard stopword lists for any language included in the NLTK stopwords corpus.
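A minimal sketch of this kind of cleaning step, assuming a simple regex-based tokenizer. The clean_text helper and the tiny STOPWORDS set are hypothetical stand-ins; the library itself takes its stopword lists from NLTK (for example nltk.corpus.stopwords.words('english')).

```python
from __future__ import annotations

import re

# Hypothetical mini stopword list; the library uses NLTK's lists instead.
STOPWORDS = {"the", "and", "of", "in"}


def clean_text(text: str, stopwords: set[str]) -> list[str]:
    """Lowercase, replace punctuation, digits and underscores with spaces,
    split on whitespace, and drop stopwords."""
    # \w is Unicode-aware, so accented Latin letters (ä, é, ...) are kept
    text = re.sub(r"[^\w\s]|\d|_", " ", text.lower())
    return [tok for tok in text.split() if tok not in stopwords]


if __name__ == "__main__":
    print(clean_text("In 2013, the ECB cut rates and bonds rallied!", STOPWORDS))
```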
def animated_word_cloud(text: str,         # Text column
                        time: str,         # Date column
                        date_format: str,  # Date format: 'eur' - European, 'us' - American
                        ngram: int,        # N-gram order, 1 = unigram
                        freq: str,         # Aggregation period: 'Y' - yearly, 'M' - monthly
                        stopwords: list,   # Languages for stop words, e.g. ['english']
                        ): ...
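The description above does not specify how per-period frequencies are scaled. The sketch below shows one plausible approach, assuming cleaned tokens and parsed dates are already available: unigram counts are grouped by year ('Y') or month ('M') and divided by each period's total token count, so that periods with more text do not dominate. The helpers period_key and scaled_unigram_freqs are hypothetical and not part of AnimatedWordCloud.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from datetime import datetime


def period_key(date: datetime, freq: str) -> str:
    """Map a date to its aggregation period: 'Y' -> '2013', 'M' -> '2013-12'."""
    if freq == "Y":
        return f"{date.year}"
    if freq == "M":
        return f"{date.year}-{date.month:02d}"
    raise ValueError(f"Unsupported freq: {freq!r}")


def scaled_unigram_freqs(tokens_by_doc: list[list[str]],
                         dates: list[datetime],
                         freq: str) -> dict[str, dict[str, float]]:
    """Count unigrams per period, then divide by the period's total token count."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for tokens, date in zip(tokens_by_doc, dates):
        counts[period_key(date, freq)].update(tokens)
    scaled = {}
    for period, counter in sorted(counts.items()):
        total = sum(counter.values())
        scaled[period] = {word: n / total for word, n in counter.items()}
    return scaled
```

Relative frequencies are only one option; a real implementation might instead rescale counts to a fixed font-size range per frame.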
To apply the method, use:
import pandas as pd
from AnimatedWordCloud import animated_word_cloud

data = pd.read_csv("data.csv")  # Load the dataset
animated_word_cloud(text=data['text'],     # Read text column
                    time=data['date'],     # Read date column
                    date_format='us',      # Specify date format
                    ngram=1,               # Show individual word frequencies
                    freq='Y',              # Yearly frequency
                    stopwords=['english', 'german', 'french'])  # Remove English, German and French stop words
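To try the example without your own dataset, a small data.csv in the expected layout can be written with the standard csv module. The rows are made-up sample data; the column names text and date are simply the ones read in the example above, and the dates use the US-style MM/DD/YYYY form to match date_format = 'us'. The write_sample_csv helper is hypothetical.

```python
import csv

# Made-up sample rows; column names match the usage example ('text', 'date').
ROWS = [
    {"text": "Inflation and interest rates dominate the outlook.", "date": "02/15/2019"},
    {"text": "Digital currencies and inflation expectations.", "date": "03/10/2020"},
]


def write_sample_csv(path: str = "data.csv") -> None:
    """Write a small two-column CSV in the layout the usage example reads."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "date"])
        writer.writeheader()
        writer.writerows(ROWS)


if __name__ == "__main__":
    write_sample_csv()
```

A real dataset would need many more rows spread over several periods for the animation to show any change over time.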
- Create video from frames:
Download the ffmpeg folder and the frames2video.bat file from here and place them in the postprocessing folder. Then run frames2video.bat, which generates wordSwarmOut.mp4, the final animated word cloud video.
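frames2video.bat is a Windows batch file. On other platforms, a roughly equivalent step is to call ffmpeg directly. The sketch below assumes ffmpeg is on the PATH and that frames are numbered like 00001.png; the actual frame file names, frame rate and encoding settings used by frames2video.bat may differ, and frames_to_video_cmd and frames_to_video are hypothetical helpers.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def frames_to_video_cmd(frames_dir: str, output: str = "wordSwarmOut.mp4",
                        pattern: str = "%05d.png", fps: int = 30) -> list[str]:
    """Build an ffmpeg command that encodes numbered PNG frames into an H.264 MP4."""
    return [
        "ffmpeg", "-y",                          # overwrite output if it exists
        "-framerate", str(fps),                  # input frame rate (assumed)
        "-i", str(Path(frames_dir) / pattern),   # numbered frames (assumed naming)
        "-c:v", "libx264",                       # H.264 video codec
        "-pix_fmt", "yuv420p",                   # widely supported pixel format
        output,
    ]


def frames_to_video(frames_dir: str, **kwargs) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if encoding fails."""
    subprocess.run(frames_to_video_cmd(frames_dir, **kwargs), check=True)
```

At the assumed 30 fps, the 90 frames generated per period correspond to 3 seconds of video per period.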
Documentation, examples and tutorials
- Read the documentation: TBA
- For more coding examples, read these tutorials: TBA
Here are examples of animated word clouds:
- Research trends in Economics (YouTube)
- European Central Bankers' speeches (YouTube)
Please visit here for any questions, issues, bugs, and suggestions.
File details
Details for the file AnimatedWordCloud-1.0.3.tar.gz.
File metadata
- Download URL: AnimatedWordCloud-1.0.3.tar.gz
- Upload date:
- Size: 35.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | dbe65f6247e5f1c0a485939afffeed638cfae82e2932a13e6e020c5c8c391907
MD5 | e9c2168a3533ab8f2f417853802ecb06
BLAKE2b-256 | fcde05b8a0cc4af0afc45451d3afd2246932cbe1b8f2e527bdd6a7a2c24ef849
File details
Details for the file AnimatedWordCloud-1.0.3-py3-none-any.whl.
File metadata
- Download URL: AnimatedWordCloud-1.0.3-py3-none-any.whl
- Upload date:
- Size: 35.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | ce0dd71fb71a07e7a65a9e2ba7632c421b540330e4b230f88a295c3c0ef015f1
MD5 | 9589d4b0e843653e5cc2cd96d9950d2a
BLAKE2b-256 | 34bf94bb0d49b771e6abe91a04f05f2ba9aa4cea9dea0a3513e8f98256501542