MTMinePy - A Multilingual Text Mining Platform for Academic Research
MTMinePy - Multilingual Text Miner with Python
MTMinePy is a Python-based academic text mining platform inspired by MTMineR. It is a Flask web application for text mining and analysis, with interactive visualization and a range of unsupervised and supervised modeling algorithms.
Key Features
- Advanced NLP: Integrated with jieba, HanLP, LTP, spaCy, and NLTK.
- Multilingual: Native support for 10+ languages, including Chinese, English, and Japanese.
- Interactive Visualization: Powered by ECharts, supporting responsive force-directed networks, dynamic word clouds, and interactive scatter plots.
- Academic Metric Analysis: Supports advanced distance functions (Hsim, Close, Esim) and high-end visualization.
- Advanced Modeling: Comprehensive suite of Unsupervised (Clustering, Topic Modeling) and Supervised learning algorithms.
Screenshots
[Screenshots: Chinese Analysis and English Analysis, each showing a Co-occurrence Network, a Word Cloud, and a Clustering view.]
Installation
Install from PyPI (Recommended)
pip install mtminepy
To install with all optional NLP backends (Janome, spaCy, HanLP, LTP, UMAP, Boruta, etc.):
pip install "mtminepy[full]"  # quotes keep shells such as zsh from expanding the brackets
Install from source
git clone https://github.com/EasyCam/MTMinePy.git
cd MTMinePy
pip install -e .
Usage
Run from command line
After installation, run directly:
mtminepy
Access the dashboard at http://localhost:5000.
Command-line options
mtminepy --help
mtminepy --port 8080 # Custom port
mtminepy --host 127.0.0.1 # Bind to localhost only
mtminepy --debug # Flask debug mode
mtminepy --version # Show version
Run from Python
from mtminepy.app import create_app
app = create_app()
app.run(host='0.0.0.0', port=5000)
Advanced Capabilities
Modeling Algorithms
MTMinePy supports a wide range of standard machine learning algorithms for text analysis:
- Feature Engineering: TF-IDF, Bag of Words (CountVectorizer), N-gram support.
- Unsupervised Learning:
  - Topic Modeling: Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), Structural Topic Model (STM).
  - Clustering: K-Means, Agglomerative (Hierarchical), DBSCAN, Spectral Clustering.
  - Dimensionality Reduction: PCA, t-SNE, UMAP, Factor Analysis.
- Supervised Learning (Classification):
  - Support Vector Machines (SVM)
  - Random Forest
  - Linear Discriminant Analysis (LDA)
  - Quadratic Discriminant Analysis (QDA)
  - Logistic Regression (Elastic Net)
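The feature-engineering step listed above (bag of words, n-grams, TF-IDF) can be sketched in plain Python. This is a minimal illustration, not MTMinePy's own code: the function names `ngrams` and `tfidf` are hypothetical, the input is assumed to be already tokenized, and the unsmoothed weighting `tf * ln(N / df)` is one common variant among several.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n_max: int = 2) -> List[str]:
    """Return all 1- to n_max-grams of a token list, joined by spaces."""
    out = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def tfidf(docs: List[List[str]], n_max: int = 1) -> List[Dict[str, float]]:
    """Weight each term by tf * ln(N / df) (assumed, unsmoothed variant)."""
    # Bag of words: raw term counts per document.
    bags = [Counter(ngrams(doc, n_max)) for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for bag in bags for term in bag)
    n_docs = len(docs)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bag.items()}
        for bag in bags
    ]


docs = [["text", "mining"], ["text", "analysis"]]
weights = tfidf(docs)
```

With this weighting, a term that occurs in every document (here `text`) receives weight 0, while terms unique to one document are weighted highest. Library implementations often smooth the idf term, so their values will differ from this sketch.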
Mathematical Models (Distance & Similarity)
MTMinePy supports advanced metrics for academic research:
Advanced Custom Similarity Measures
- Hsim (Yang Fengzhao, 2007): $$ Hsim(x_i, x_j) = \frac{1}{n} \sum_{k=1}^{n} \frac{1}{1+|x_{ik}-x_{jk}|} $$
- Close (Shao Changsheng, et al., 2011): $$ Close(x_i, x_j) = \frac{1}{n} \sum_{k=1}^{n} e^{-|x_{ik}-x_{jk}|} $$
- Esim (Wang Xiaoyang, et al., 2013): $$ Esim(x_i, x_j) = \frac{1}{n} \sum_{k=1}^{n} \omega_k e^{-\frac{|x_{ik}-x_{jk}|}{|x_{ik}-x_{jk}|+|x_{ik}+x_{jk}|/2}} $$
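The three measures can be transcribed almost directly into Python. The sketch below follows the formulas as written above and is not taken from MTMinePy's source. The function names, the default of uniform weights for Esim, and the handling of a zero denominator in Esim (treated as no difference) are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def hsim(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean of 1 / (1 + |x_k - y_k|) over all dimensions."""
    return sum(1.0 / (1.0 + abs(a - b)) for a, b in zip(x, y)) / len(x)


def close(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean of exp(-|x_k - y_k|) over all dimensions."""
    return sum(math.exp(-abs(a - b)) for a, b in zip(x, y)) / len(x)


def esim(
    x: Sequence[float],
    y: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of exp(-d / (d + |x_k + y_k| / 2)), with d = |x_k - y_k|."""
    n = len(x)
    if weights is None:
        weights = [1.0] * n  # assumption: uniform weights when none are given
    total = 0.0
    for a, b, w in zip(x, y, weights):
        diff = abs(a - b)
        denom = diff + abs(a + b) / 2.0
        # Assumption: if both values are 0 the ratio is undefined; treat it as 0.
        ratio = diff / denom if denom > 0 else 0.0
        total += w * math.exp(-ratio)
    return total / n


u = [1.0, 2.0, 0.0]
v = [1.0, 4.0, 0.0]
scores = (hsim(u, v), close(u, v), esim(u, v))
```

All three functions return 1.0 for identical vectors (with uniform weights in the case of Esim) and decrease as the per-dimension differences grow. Esim scales each difference relative to the magnitude of the two values, so it is less sensitive to the absolute scale of a dimension than Hsim or Close.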