A standalone NMF topic modeling tool for Turkish and English texts

NMF Standalone

A topic modeling system built on Non-negative Matrix Factorization (NMF) for English and Turkish text. It provides BPE and WordPiece tokenization for Turkish, two NMF variants (standard NMF and Orthogonal Projective NMF), and word cloud and topic distribution visualizations.
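For readers new to NMF, the sketch below shows the general idea: a non-negative document-term matrix V is approximated by the product of a document-topic matrix W and a topic-term matrix H. It uses the classic multiplicative update rules in pure Python. This is an illustration only, not the package's implementation; the function names and the toy matrix are made up for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, the usual NMF reconstruction error."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    # Random positive initialization keeps every later update non-negative.
    w = [[rng.random() + eps for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + eps for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [
            [h[i][j] * numer[i][j] / (denom[i][j] + eps) for j in range(n_terms)]
            for i in range(n_topics)
        ]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [
            [w[i][j] * numer[i][j] / (denom[i][j] + eps) for j in range(n_topics)]
            for i in range(n_docs)
        ]
    return w, h


# Toy matrix: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3.
v = [[2, 1, 0, 0], [4, 2, 0, 0], [0, 0, 1, 3], [0, 0, 2, 6]]
w, h = nmf(v, n_topics=2)
print("reconstruction error:", frobenius_error(v, w, h))
```

Each row of H can be read as a topic (weights over terms), and each row of W as a document's mix of topics.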

Quick Start

Installation from PyPI

pip install nmf-standalone

Command Line Usage

# Turkish text analysis
nmf-standalone analyze data.csv --column text --language TR --topics 5

# English text analysis with lemmatization and visualizations
nmf-standalone analyze data.csv --column content --language EN --topics 10 --lemmatize --wordclouds --excel

# Turkish text with an explicitly chosen tokenizer
nmf-standalone analyze reviews.csv --column review_text --language TR --topics 8 --tokenizer bpe --wordclouds

Python API Usage

from nmf_standalone import run_topic_analysis

# Simple topic modeling
results = run_topic_analysis(
    filepath="data.csv",
    column="review_text",
    language="EN",
    topics=5,
    lemmatize=True
)

# Turkish text analysis
results = run_topic_analysis(
    filepath="turkish_reviews.csv", 
    column="yorum_metni",
    language="TR",
    topics=8,
    tokenizer_type="bpe",
    generate_wordclouds=True
)

Package Structure

nmf_standalone/
├── functions/
│   ├── common_language/          # Shared functionality across languages
│   │   ├── emoji_processor.py    # Emoji handling utilities
│   │   └── topic_analyzer.py     # Cross-language topic analysis
│   ├── english/                  # English text processing modules
│   │   ├── english_preprocessor.py      # Text cleaning and preprocessing
│   │   ├── english_vocabulary.py        # Vocabulary creation
│   │   ├── english_text_encoder.py      # Text-to-numerical conversion
│   │   ├── english_topic_analyzer.py    # Topic extraction utilities
│   │   ├── english_topic_output.py      # Topic visualization and output
│   │   └── english_nmf_core.py          # NMF implementation for English
│   ├── nmf/                      # NMF algorithm implementations
│   │   ├── nmf_orchestrator.py          # Main NMF interface
│   │   ├── nmf_initialization.py        # Matrix initialization strategies
│   │   ├── nmf_basic.py                 # Standard NMF algorithm
│   │   ├── nmf_projective_basic.py      # Basic projective NMF
│   │   └── nmf_projective_enhanced.py   # Enhanced projective NMF
│   ├── tfidf/                    # TF-IDF calculation modules
│   │   ├── tfidf_english_calculator.py  # English TF-IDF implementation
│   │   ├── tfidf_turkish_calculator.py  # Turkish TF-IDF implementation
│   │   ├── tfidf_tf_functions.py        # Term frequency functions
│   │   ├── tfidf_idf_functions.py       # Inverse document frequency functions
│   │   └── tfidf_bm25_turkish.py        # BM25 implementation for Turkish
│   └── turkish/                  # Turkish text processing modules
│       ├── turkish_preprocessor.py      # Turkish text cleaning
│       ├── turkish_tokenizer_factory.py # Tokenizer creation and training
│       ├── turkish_text_encoder.py      # Text-to-numerical conversion
│       └── turkish_tfidf_generator.py   # TF-IDF matrix generation
├── utils/                        # Helper utilities
│   ├── coherence_score.py              # Topic coherence evaluation
│   ├── gen_cloud.py                    # Word cloud generation
│   ├── export_excel.py                 # Excel export functionality
│   ├── topic_dist.py                   # Topic distribution plotting
│   └── other/                           # Additional utility functions
├── cli.py                        # Command-line interface
├── standalone_nmf.py             # Core NMF implementation
└── __init__.py                   # Package initialization and public API
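The tfidf/ modules in the tree above build the weighted document-term matrix that NMF factorizes. As a rough illustration of that step, here is one common TF-IDF formulation (relative term frequency times a smoothed inverse document frequency). The exact weighting used by the package is not documented here, so treat the formula and the function name as assumptions.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, rows) for a list of already-tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, so tokens present in every document keep a weight of 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        rows.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, rows


docs = [["good", "app"], ["bad", "app"]]
vocab, rows = tfidf_matrix(docs)
for doc, row in zip(docs, rows):
    print(doc, [round(x, 3) for x in row])
```

Tokens shared by every document ("app" here) get a lower weight than tokens that distinguish one document from another.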

Installation

From PyPI (Recommended)

pip install nmf-standalone

From Source (Development)

Clone the repository:

git clone https://github.com/yourusername/nmf-standalone.git
cd nmf-standalone

Create a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Usage

Command Line Interface

The package provides the nmf-standalone command with an analyze subcommand:

# Basic usage
nmf-standalone analyze data.csv --column text --language TR --topics 5

# Advanced usage with most options
nmf-standalone analyze reviews.csv \
  --column review_text \
  --language EN \
  --topics 10 \
  --words-per-topic 20 \
  --nmf-method opnmf \
  --lemmatize \
  --wordclouds \
  --excel \
  --topic-distribution \
  --output-name my_analysis

Command Line Options

Required Arguments:

filepath: Path to input CSV or Excel file
--column, -c: Name of column containing text data
--language, -l: Language ("TR" for Turkish, "EN" for English)

Optional Arguments:

--topics, -t: Number of topics to extract (default: 5)
--output-name, -o: Custom name for output files (default: auto-generated)
--tokenizer: Tokenizer type for Turkish ("bpe" or "wordpiece", default: "bpe")
--nmf-method: NMF algorithm ("nmf" or "opnmf", default: "nmf")
--words-per-topic: Number of top words per topic (default: 15)
--lemmatize: Apply lemmatization for English text
--wordclouds: Generate word cloud visualizations
--excel: Export results to Excel format
--topic-distribution: Generate topic distribution plots
--separator: CSV separator character (default: "|")
--filter-app: Filter data by specific app name
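The options above can be mirrored in a small argparse parser, which is handy for checking how flags and defaults combine. This is a hypothetical reconstruction from the option list, not the package's actual cli.py; the help text and the use of None for unset values are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented `analyze` options."""
    parser = argparse.ArgumentParser(prog="nmf-standalone")
    sub = parser.add_subparsers(dest="command", required=True)
    analyze = sub.add_parser("analyze", help="Run topic analysis on a file")
    # Required arguments
    analyze.add_argument("filepath")
    analyze.add_argument("--column", "-c", required=True)
    analyze.add_argument("--language", "-l", required=True, choices=["TR", "EN"])
    # Optional arguments with the documented defaults
    analyze.add_argument("--topics", "-t", type=int, default=5)
    analyze.add_argument("--output-name", "-o", default=None)
    analyze.add_argument("--tokenizer", choices=["bpe", "wordpiece"], default="bpe")
    analyze.add_argument("--nmf-method", choices=["nmf", "opnmf"], default="nmf")
    analyze.add_argument("--words-per-topic", type=int, default=15)
    analyze.add_argument("--lemmatize", action="store_true")
    analyze.add_argument("--wordclouds", action="store_true")
    analyze.add_argument("--excel", action="store_true")
    analyze.add_argument("--topic-distribution", action="store_true")
    analyze.add_argument("--separator", default="|")
    analyze.add_argument("--filter-app", default=None)
    return parser


args = build_parser().parse_args(
    ["analyze", "data.csv", "--column", "text", "--language", "TR"]
)
print(args.topics, args.tokenizer, args.nmf_method, args.words_per_topic)
```

Boolean flags such as --lemmatize are off unless passed, which differs from the Python API, where the corresponding parameters default to True.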

Python API

from nmf_standalone import run_topic_analysis

# Basic English text analysis
results = run_topic_analysis(
    filepath="data.csv",
    column="review_text",
    language="EN",
    topics=5,
    lemmatize=True,
    generate_wordclouds=True,
    export_excel=True
)

# Advanced Turkish text analysis
results = run_topic_analysis(
    filepath="turkish_reviews.csv",
    column="yorum_metni",
    language="TR",
    topics=10,
    words_per_topic=15,
    tokenizer_type="bpe",
    nmf_method="nmf",
    generate_wordclouds=True,
    export_excel=True,
    topic_distribution=True
)

API Parameters

Required:

filepath (str): Path to input CSV or Excel file
column (str): Name of column containing text data

Optional:

language (str): "TR" for Turkish, "EN" for English (default: "EN")
topics (int): Number of topics to extract (default: 5)
words_per_topic (int): Top words to show per topic (default: 15)
nmf_method (str): "nmf" or "opnmf" algorithm variant (default: "nmf")
tokenizer_type (str): "bpe" or "wordpiece" for Turkish (default: "bpe")
lemmatize (bool): Apply lemmatization for English (default: True)
generate_wordclouds (bool): Create word cloud visualizations (default: True)
export_excel (bool): Export results to Excel (default: True)
topic_distribution (bool): Generate distribution plots (default: True)
output_name (str): Custom output directory name (default: auto-generated)
separator (str): CSV separator character (default: ",")
filter_app (bool): Enable app filtering (default: False)
filter_app_name (str): App name for filtering (default: "")
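When calls are assembled from configuration files or user input, it can help to check the arguments against the documented values before running an analysis. The helper below is a hypothetical sketch, not part of nmf_standalone: it merges overrides with the defaults listed above and rejects unknown names or unsupported choices. Using None for the auto-generated output_name is an assumption.

```python
# Defaults as documented in the parameter list above.
DEFAULTS = {
    "language": "EN",
    "topics": 5,
    "words_per_topic": 15,
    "nmf_method": "nmf",
    "tokenizer_type": "bpe",
    "lemmatize": True,
    "generate_wordclouds": True,
    "export_excel": True,
    "topic_distribution": True,
    "output_name": None,  # assumption: None stands in for "auto-generated"
    "separator": ",",
    "filter_app": False,
    "filter_app_name": "",
}

ALLOWED = {
    "language": {"TR", "EN"},
    "nmf_method": {"nmf", "opnmf"},
    "tokenizer_type": {"bpe", "wordpiece"},
}


def build_analysis_kwargs(filepath, column, **overrides):
    """Merge overrides with documented defaults and validate the choices."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown parameters: {sorted(unknown)}")
    kwargs = {"filepath": filepath, "column": column, **DEFAULTS, **overrides}
    for name, allowed in ALLOWED.items():
        if kwargs[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    if kwargs["topics"] < 1 or kwargs["words_per_topic"] < 1:
        raise ValueError("topics and words_per_topic must be positive")
    return kwargs


kwargs = build_analysis_kwargs("data.csv", "text", language="TR", topics=8)
print(kwargs["language"], kwargs["topics"], kwargs["tokenizer_type"])
# With the package installed, the result could then be passed on as:
#   run_topic_analysis(**kwargs)
```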

Outputs

The analysis generates several outputs in an Output/ directory (created at runtime), organized in a subdirectory named after your analysis:

Topic-Word Excel File: .xlsx file containing top words for each topic and their scores
Word Clouds: PNG images of word clouds for each topic (if generate_wordclouds=True)
Topic Distribution Plot: Plot showing distribution of documents across topics (if topic_distribution=True)
Coherence Scores: JSON file with coherence scores for the topics
Top Documents: JSON file listing most representative documents for each topic
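To see what a run produced, the output subdirectory can be walked and its files grouped by extension. The sketch below assumes only the layout described above (an Output/ directory with one subdirectory per analysis); it does not assume any particular file names, and the helper name is made up for this example.

```python
from collections import defaultdict
from pathlib import Path


def summarize_outputs(output_root, analysis_name):
    """Group the files of one analysis run by file extension."""
    run_dir = Path(output_root) / analysis_name
    if not run_dir.is_dir():
        return {}
    groups = defaultdict(list)
    for path in sorted(run_dir.rglob("*")):
        if path.is_file():
            # Store paths relative to the run directory, with forward slashes.
            groups[path.suffix.lower()].append(path.relative_to(run_dir).as_posix())
    return dict(groups)


if __name__ == "__main__":
    # "my_analysis" matches the --output-name used in the CLI example.
    for suffix, files in summarize_outputs("Output", "my_analysis").items():
        print(suffix, files)
```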

Features

Multi-language Support: Optimized processing for both Turkish and English texts
Advanced Tokenization: BPE and WordPiece tokenizers for Turkish, traditional tokenization for English
Multiple NMF Algorithms: Standard NMF and Orthogonal Projective NMF (OPNMF)
Rich Visualizations: Word clouds and topic distribution plots
Flexible Export: Excel and JSON export formats
Coherence Evaluation: Built-in topic coherence scoring
Text Preprocessing: Language-specific text cleaning and preprocessing

Requirements

Python 3.9+
Dependencies are automatically installed with the package

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions, please open an issue on the GitHub repository.
