ML-Ask Official — high-performance Japanese emotion analysis (original system, Python rewrite)

ML-Ask Official

eMotive eLement and Expression Analysis system — official Python rewrite by the original author.

High-performance Japanese emotion analysis. Originally developed by Michal Ptaszynski, Pawel Dybala, Rafal Rzepka and Kenji Araki at Hokkaido University, the system was first described in Ptaszynski et al. (2017, Journal of Open Research Software) and extended with new dictionaries in Wang et al. (2024, Applied Sciences). This package is the official Python implementation maintained by the original author.

📦 PyPI: https://pypi.org/project/mlask-official/
🧪 Hosted demo: https://mlask-official.streamlit.app/
🐙 Source + issues: https://github.com/ptaszynski/mlask-official
📝 Changelog: CHANGELOG.md
🗺️ Roadmap: IMPROVEMENTS.md
📚 Citation file: CITATION.cff

Quick start

# 1. system MeCab + IPADIC (macOS shown; see Installation for other OSes)
brew install mecab mecab-ipadic

# 2. the package
pip install 'mlask-official[app]'

# 3. analyze a sentence
python -c "from mlask_official import MLAskOfficial; \
print(MLAskOfficial().analyze('彼のことは嫌いではない！')['valence'])"
# → POSITIVE

The same install provides the CLI and the web app's dependencies; the app script itself lives in the source repository:

echo "腹が立つ" | mlask analyze --format pipe
streamlit run streamlit_app.py     # from a source checkout

About ML-Ask

ML-Ask (eMotive eLement and Expression Analysis system) is a keyword-based rule system for automatic affect annotation of Japanese utterances. It combines a curated lexicon of ~4,700 emotive expressions across 10 categories with a particle-stripped content-form pass and a Contextual Valence Shifter (CVS) layer for negation.

Features

Combined and expanded dictionaries — Nakamura's original Dictionary of Emotive Expressions merged with the Wang & Isomura (2024) two-dictionary expansion (Hiejima's and Murakami's emotion dictionaries, plus automatically extracted expressions). Total: ~4,700 entries across 10 emotion classes, augmented with modern internet language (emoji, kaomoji, gyaru-go, katakana borrowings) per class.
Russell's 2D circumplex model of affect — every emotion class is placed on a (valence, arousal) plane; aggregate sentence orientation is reported as valence (POSITIVE / NEGATIVE / NEUTRAL, optionally mostly_*) and activation (ACTIVE / PASSIVE / NEUTRAL).
Plutchik-wheel colour palette — all 10 emotion classes are colour-coded by the hue angles of Plutchik's published wheel for a familiar, paper-ready palette.
Dual Aho-Corasick matching: two automata are built at startup, one over fully lemmatised dictionary entries (covers verb inflections) and one over particle-stripped content forms (covers particle-omission variants such as 腹が立つ ↔ 腹立つ). Both automata scan in a single O(n + k) pass per sentence; measured throughput is ~50,000 sentences/sec for a sequential batch and ~100,000 sentences/sec for auto-parallel batches (see Performance).
CVS (Contextual Valence Shifters) — 108 Japanese negation patterns reverse emotion polarity when applied (嫌いではない → positive, not negative). An optional GiNZA dependency-tree pass catches long-distance negation that the local regex misses.
Three-state emotive distinction — analyze() returns emotive: bool even when no specific emotion word is detected, so callers can distinguish emotive-but-unclassifiable sentences (interjections / kaomoji only) from fully non-emotive ones.
Streaming + multiprocessing APIs — analyze_stream() for memory-light corpus processing; auto-parallel batches at ≥ 50,000 sentences.
JA/EN Streamlit web app with publication-quality charts (radar + Russell 2D + time-series + heatmap), PNG export at 2× scale, and a language toggle that flips all UI strings + emotion labels.
On-disk lemma cache — sub-millisecond warm-start once the cache is primed; MD5-invalidated.
Optional UniDic backend via fugashi for users who prefer modern morphological analysis.
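
The particle-stripped content form behind the second automaton can be pictured with a small sketch. It assumes a sentence has already been tokenised into (lemma, part-of-speech) pairs and that particles carry the IPADIC tag 助詞. The helper name `content_form` and the token tuples are illustrative and are not part of the package's API.

```python
# Illustrative only: content_form() is a hypothetical helper, not mlask-official API.
PARTICLE_POS = "助詞"  # IPADIC part-of-speech tag for particles


def content_form(tokens):
    """Join lemmas, skipping particles, to get a particle-insensitive key."""
    return "".join(lemma for lemma, pos in tokens if pos != PARTICLE_POS)


with_particle = [("腹", "名詞"), ("が", "助詞"), ("立つ", "動詞")]
without_particle = [("腹", "名詞"), ("立つ", "動詞")]

print(content_form(with_particle))     # 腹立つ
print(content_form(without_particle))  # 腹立つ
```

Because both inputs reduce to the same key, a single dictionary entry can cover the form with and without the particle.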

Emotion classes

Name	English gloss	Japanese	Valence	Arousal	Plutchik hue
yorokobi	joy	喜び	POS	ACT	yellow
suki	affection	好き	POS	—	yellow-green
yasu	relief	安らぎ	POS	PAS	green-yellow
takaburi	excitement	昂り	NorP	ACT	orange
odoroki	surprise	驚き	NorP	ACT	teal-cyan
haji	shame	恥	NorP	—	rose-purple
aware	sadness	哀しみ	NEG	PAS	royal blue
iya	disgust	嫌悪	NEG	—	dark orchid
kowa	fear	恐れ	NEG	ACT	green
ikari	anger	怒り	NEG	ACT	crimson
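
As a rough illustration, the table above can be treated as a lookup from class name to its two axes. The sketch below only tallies the axis labels for a set of detected classes. How the package turns such tallies into the final valence and activation values (including the mostly_* variants) is not described here, so that step is deliberately left out. `CLASS_AXES` and `tally_axes` are hypothetical names.

```python
from collections import Counter

# (valence, arousal) per class, copied from the table; "-" stands for the dash in the Arousal column.
CLASS_AXES = {
    "yorokobi": ("POS", "ACT"),
    "suki": ("POS", "-"),
    "yasu": ("POS", "PAS"),
    "takaburi": ("NorP", "ACT"),
    "odoroki": ("NorP", "ACT"),
    "haji": ("NorP", "-"),
    "aware": ("NEG", "PAS"),
    "iya": ("NEG", "-"),
    "kowa": ("NEG", "ACT"),
    "ikari": ("NEG", "ACT"),
}


def tally_axes(detected_classes):
    """Count valence and arousal labels over the detected emotion classes."""
    valence = Counter(CLASS_AXES[c][0] for c in detected_classes)
    arousal = Counter(CLASS_AXES[c][1] for c in detected_classes)
    return valence, arousal


valence, arousal = tally_axes(["ikari", "iya"])
print(dict(valence))  # {'NEG': 2}
print(dict(arousal))  # {'ACT': 1, '-': 1}
```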

Representative emotion

For sentences where ML-Ask detects multiple emotion classes, the representative emotion is the single class chosen as the dominant one for the sentence. The heuristic — inherited from the original Perl ML-Ask — is:

The class whose longest matched expression has the most characters.

For example, in 「腹がたって仕方ない、もう嫌だ」 both ikari (腹が立つ) and iya (嫌だ) match; ikari wins because 腹が立つ is longer than 嫌だ. The intuition is that longer dictionary entries are more specific — and therefore more diagnostic of the speaker's emotion — than shorter, more generic ones.

Returned as result["representative"] = (class_name, [matching_words]).
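
A minimal sketch of the longest-match heuristic, written against a plain dict shaped like result["emotion"]. The function name `pick_representative` is hypothetical, and the tie-breaking (first class wins) is an assumption, since the text above does not say how ties are resolved.

```python
def pick_representative(emotion):
    """Return (class_name, words) for the class whose longest match has the most characters."""
    if not emotion:
        return None
    # Score each class by the length of its longest matched expression.
    name = max(emotion, key=lambda cls: max(len(word) for word in emotion[cls]))
    return name, emotion[name]


matches = {"ikari": ["腹が立つ"], "iya": ["嫌だ"]}
print(pick_representative(matches))  # ('ikari', ['腹が立つ'])
```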

Installation

ML-Ask Official runs on Linux, macOS and Windows (WSL recommended) with Python 3.10 – 3.13. It depends on the MeCab morphological analyser, which is a system package (not Python), so the install is split into two parts:

install MeCab + a Japanese dictionary at the OS level,
install the mlask-official Python package inside a virtualenv.

Step 1 — Install MeCab + a Japanese dictionary

macOS (Homebrew)

brew install mecab mecab-ipadic

Verify:

echo "今日は嬉しい" | mecab

You should see one token per line and an EOS marker.

Ubuntu / Debian

sudo apt-get update
sudo apt-get install -y mecab libmecab-dev mecab-ipadic-utf8

Fedora / RHEL

sudo dnf install mecab mecab-devel mecab-ipadic

Arch Linux

sudo pacman -S mecab mecab-ipadic

Windows

Native Windows MeCab is fragile — the recommended path is Windows Subsystem for Linux (WSL2): install Ubuntu under WSL and follow the Ubuntu instructions above. If you must run on bare Windows, see the mecab-python3 README for the MSVC build steps.

Step 2 — Create a Python virtual environment

Strongly recommended (keeps the package's dependencies out of your system Python):

python3 -m venv .venv
source .venv/bin/activate          # macOS / Linux
# .venv\Scripts\activate.bat       # Windows cmd
# .venv\Scripts\Activate.ps1       # Windows PowerShell

Make sure python --version reports 3.10 or newer.

Step 3 — Install `mlask-official`

From PyPI

pip install mlask-official            # core: analyzer + CLI
pip install 'mlask-official[app]'     # + Streamlit web app
pip install 'mlask-official[fugashi]' # + UniDic backend via fugashi
pip install 'mlask-official[deps]'    # + GiNZA dependency-tree CVS
pip install 'mlask-official[all]'     # everything above

From a source checkout

git clone https://github.com/ptaszynski/mlask-official.git
cd mlask-official
pip install -e .                      # editable core install
pip install -e '.[all]'               # editable + every extra

The base install pulls in mecab-python3, pyahocorasick, and typer automatically.

Step 4 — Verify the installation

Python API

python -c "from mlask_official import MLAskOfficial; \
print(MLAskOfficial().analyze('今日は嬉しい！')['valence'])"

You should see:

POSITIVE

CLI

mlask --help
echo "彼のことは嫌いではない！" | mlask analyze --format pipe

The pipe-format output should look like:

彼のことは嫌いではない！|emotions:(2)|YOR:嫌い*CVS 嫌いな*CVS|SUK:嫌い*CVS 嫌いな*CVS||2D|POSITIVE|NEUTRAL
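
For downstream scripts, a line in this format can be split back into fields. The sketch below infers the layout from the single example line above (sentence first, valence and activation last, class fields of the form LABEL:words in between). Other layouts, such as output for non-emotive sentences or sentences containing a literal |, are not covered. `parse_pipe_line` is a hypothetical helper, not part of the package.

```python
def parse_pipe_line(line):
    """Split one pipe-format line into sentence, per-class words, valence, activation."""
    fields = line.rstrip("\n").split("|")
    classes = {}
    # fields[1] is the "emotions:(N)" counter; the last two fields are the 2D labels.
    for field in fields[2:-2]:
        label, sep, words = field.partition(":")
        if sep:
            classes[label] = words.split()
    return {
        "sentence": fields[0],
        "classes": classes,
        "valence": fields[-2],
        "activation": fields[-1],
    }


example = "彼のことは嫌いではない！|emotions:(2)|YOR:嫌い*CVS 嫌いな*CVS|SUK:嫌い*CVS 嫌いな*CVS||2D|POSITIVE|NEUTRAL"
parsed = parse_pipe_line(example)
print(parsed["valence"], parsed["activation"])  # POSITIVE NEUTRAL
print(sorted(parsed["classes"]))                # ['SUK', 'YOR']
```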

Streamlit app

The Streamlit application is part of the source repository. The easiest way to try it is the hosted demo:

https://mlask-official.streamlit.app/

To run it locally, clone the repo and launch from there:

git clone https://github.com/ptaszynski/mlask-official.git
cd mlask-official
pip install -e '.[app]'
bash run_app.sh                        # → http://localhost:8501
bash run_app.sh --server.port 8505     # custom port

Open the URL in your browser and try the Quick examples under the input box.

Step 5 — Troubleshooting

RuntimeError: Failed initializing MeCab (no such file: /usr/local/etc/mecabrc)

mecab-python3 can't find mecabrc. Find it and pass it explicitly:

mecab-config --sysconfdir   # → e.g. /opt/homebrew/etc

python -c "from mlask_official import MLAskOfficial; \
print(MLAskOfficial(mecab_arg='-r /opt/homebrew/etc/mecabrc').analyze('嬉しい'))"

Or pass --mecab-arg "-r /opt/homebrew/etc/mecabrc" to the CLI. The Streamlit app has a MeCab arguments field in the sidebar for the same purpose.
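
The lookup can also be scripted. The sketch below builds the -r argument from the directory that mecab-config reports. Both helper names are hypothetical, and `detect_mecabrc_arg` assumes mecab-config is on PATH and that the file in that directory is named mecabrc.

```python
import subprocess
from pathlib import PurePosixPath


def mecabrc_arg(sysconfdir):
    """Build the '-r <path>' argument from a MeCab sysconf directory."""
    return f"-r {PurePosixPath(sysconfdir.strip()) / 'mecabrc'}"


def detect_mecabrc_arg():
    """Ask mecab-config for the directory; raises if mecab-config is missing or fails."""
    out = subprocess.run(
        ["mecab-config", "--sysconfdir"],
        capture_output=True, text=True, check=True,
    ).stdout
    return mecabrc_arg(out)


print(mecabrc_arg("/opt/homebrew/etc\n"))  # -r /opt/homebrew/etc/mecabrc
```

The resulting string can then be passed as mecab_arg, for example MLAskOfficial(mecab_arg=detect_mecabrc_arg()).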

No module named 'MeCab'

mecab-python3 failed to compile against your system MeCab. Re-install with verbose output:

pip install --force-reinstall --verbose mecab-python3

On macOS the most common cause is missing Xcode command-line tools (xcode-select --install).

No module named 'fugashi' / 'spacy'

Optional extras aren't installed. Either disable the feature (MLAskOfficial(backend="mecab", use_dependency_cvs=False)) or install the relevant extra group from Step 3.

built an empty emotion index / Not an Aho-Corasick automaton yet

MeCab returned no tokens for the shipped dictionary entries — usually a bad or mismatched dictionary path. The error message lists the three most common causes and the fix for each. See also the Notes on tokenisation section below.

Stale lemma cache after a manual dictionary edit

The cache is invalidated by file content (MD5), so saving the file will already invalidate it. To force a rebuild explicitly:

rm -rf ~/.cache/mlask_official
# or per-call:
python -c "from mlask_official import MLAskOfficial; MLAskOfficial(use_cache=False)"
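
To see why saving the file is enough, here is a minimal sketch of content-based invalidation with MD5. It illustrates the general technique only; the package's actual cache layout and key format are not shown in this document, and the file name and helper names below are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def content_digest(path):
    """MD5 of the file's bytes; changes whenever the content changes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def cache_is_fresh(path, stored_digest):
    """A cache entry is valid only while the digest still matches."""
    return content_digest(path) == stored_digest


with tempfile.TemporaryDirectory() as tmp:
    dictionary = Path(tmp) / "ikari.txt"  # hypothetical dictionary file
    dictionary.write_text("腹が立つ\n", encoding="utf-8")
    digest = content_digest(dictionary)
    fresh_before = cache_is_fresh(dictionary, digest)

    # Simulate a manual dictionary edit.
    dictionary.write_text("腹が立つ\nむかつく\n", encoding="utf-8")
    fresh_after = cache_is_fresh(dictionary, digest)

print(fresh_before, fresh_after)  # True False
```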

Usage

from mlask_official import MLAskOfficial

a = MLAskOfficial()

# Inflected form — handled by full-lemma automaton
r = a.analyze("身の毛がよだった！")
print(r["emotion"])       # {'kowa': ['身の毛がよだつ']}
print(r["valence"])       # 'NEGATIVE'
print(r["activation"])    # 'ACTIVE'
print(r["emotive"])       # True

# Particle dropped — handled by content-lemma automaton
#   (use the kanji form when possible — IPADIC's lemma for the kana
#    writing `たつ` is the unrelated verb `経つ` "to elapse",
#    so kana variants of ambiguous verbs may miss; see §Notes.)
r = a.analyze("腹立つ！")
print(r["emotion"])       # {'ikari': ['腹立ち', '腹立つ', '腹が立つ≈']}

r = a.analyze("身の毛よだつ")          # particle が dropped
print(r["emotion"])       # {'kowa': ['身の毛がよだつ≈']}

# Negation via CVS
r = a.analyze("彼のことは嫌いではない！")
print(r["valence"])       # 'POSITIVE'  ← 嫌い → CVS flip → yorokobi/suki

# Emotive but no classifiable emotion
r = a.analyze("あーもう！！")
print(r["emotion"])       # None
print(r["emotive"])       # True  ← emotemes detected
print(r["intensifier"])   # {'emotemes': ['！','！'], 'interjections': ['あー','もう']}

# Non-emotive
r = a.analyze("今日は晴れです。")
print(r["emotion"])       # None
print(r["emotive"])       # False
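
The three cases above can be folded into one label for downstream code. This is a small sketch over the documented `emotion` and `emotive` keys; the helper `emotive_state` and its label strings are hypothetical and not part of the package.

```python
def emotive_state(result):
    """Map an analyze() result dict to one of three states."""
    if result.get("emotion"):
        return "classified"
    if result.get("emotive"):
        return "emotive-unclassified"
    return "non-emotive"


print(emotive_state({"emotion": {"kowa": ["身の毛がよだつ"]}, "emotive": True}))  # classified
print(emotive_state({"emotion": None, "emotive": True}))    # emotive-unclassified
print(emotive_state({"emotion": None, "emotive": False}))   # non-emotive
```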

Streaming + parallel APIs

# Generator: constant memory for large corpora
with open("big_corpus.txt", encoding="utf-8") as corpus:
    for result in a.analyze_stream(corpus):
        process(result)                          # process() is your own callback

# Multiprocessing: switched on automatically for batches ≥ 50,000 sentences
texts = ["今日は嬉しい！", "腹が立つ"]            # any list of sentences
results = a.analyze_batch(texts)                 # auto: parallel iff len(texts) ≥ 50_000
results = a.analyze_batch(texts, parallel=True,  # force on
                          workers=8)
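
Corpus files often contain blank lines and stray whitespace. A small generator can clean lines lazily before they reach the analyzer, which keeps memory use flat. The sketch assumes analyze_stream() accepts any iterable of strings (the example above passes a file object, which is one); `iter_sentences` is a hypothetical helper.

```python
import io


def iter_sentences(lines):
    """Yield stripped, non-empty lines one at a time."""
    for line in lines:
        sentence = line.strip()
        if sentence:
            yield sentence


corpus = io.StringIO("今日は嬉しい！\n\n  腹が立つ  \n")
print(list(iter_sentences(corpus)))  # ['今日は嬉しい！', '腹が立つ']
```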

Notes on tokenisation

ML-Ask delegates tokenisation and lemmatisation to MeCab. Two practical consequences worth knowing:

Use IPADIC, not UniDic. The shipped dictionaries (Nakamura + Wang & Isomura) were compiled against the IPADIC POS scheme. UniDic tokenises some compounds differently and won't kanji-normalise kana writings, which reduces match coverage. If you previously installed unidic-lite as a side effect of another package, point MeCab back at IPADIC explicitly:
```
brew install mecab-ipadic
# then either edit /opt/homebrew/etc/mecabrc to set
#     dicdir = /opt/homebrew/lib/mecab/dic/ipadic
# or pass -d per call (Python, not shell):
MLAskOfficial(mecab_arg="-d /opt/homebrew/lib/mecab/dic/ipadic")
```
Kana writings of ambiguous verbs may miss. IPADIC's lemma table picks the most frequent reading for a kana writing. たつ in isolation lemmatises to 経つ ("to elapse"), not 立つ ("to stand"), so a kana-only input like 腹たつ won't reach the 腹が立つ dictionary entry even with particle omission. The same input written 腹立つ or 腹が立つ matches cleanly. Robust yomi/N-best parsing for these cases is tracked as IMPROVEMENTS.md §1.1 + §1.2.

Command-line interface

# Single sentence (stdin or --text)
echo "腹が立つ"                  | mlask analyze --format pipe
echo "彼のことは嫌いではない！"   | mlask analyze --format json

# Batch a file
mlask batch -i corpus.txt -o results.csv  --format csv
mlask batch -i corpus.txt -o results.json --format json --parallel
mlask batch -i corpus.txt                 --format pipe > results.txt

# Throughput benchmark
mlask benchmark --sentences 10000
mlask benchmark --sentences 100000 --parallel -j 8

# Mine candidate emotive expressions from a corpus (manual-review TSV)
mlask extract corpus.txt --output candidates.tsv --min-freq 5

All commands accept --backend mecab|fugashi and --mecab-arg "-r /path/to/mecabrc".

Performance

On Apple Silicon (Python 3.14, mecab-python3 + IPADIC):

Workload	Latency / throughput
Cold start (no cache)	~37 ms
Warm start (cache hit)	~17 ms
Single sentence (steady-state)	20 µs median, 46 µs p99
Sequential batch (10,000 sentences)	~50,000 sentences/sec
Multiprocessing batch (10,000 × 4 workers)	~50,000 sentences/sec
Auto-parallel `analyze_batch(50,000)`	~100,000 sentences/sec

See CHANGELOG.md for full benchmark methodology.
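
For a quick local check of per-call latency, a generic timing loop is enough. The sketch below is not the methodology behind the numbers above; it measures any callable over a non-empty list of inputs, and `len` stands in for an analyzer's analyze method so that the snippet runs without MeCab. `measure_latency` is a hypothetical helper.

```python
import statistics
import time


def measure_latency(fn, inputs):
    """Return (median, p99) wall-clock seconds per call of fn over inputs."""
    samples = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank style p99, clamped to the last sample.
    p99 = samples[min(len(samples) - 1, int(len(samples) * 0.99))]
    return statistics.median(samples), p99


median, p99 = measure_latency(len, ["腹が立つ"] * 1000)  # len() stands in for analyze()
print(f"median {median * 1e6:.1f} µs, p99 {p99 * 1e6:.1f} µs")
```

Discarding a few warm-up calls before sampling usually gives steadier figures.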

Citation

When using ML-Ask in research, please cite both of the following:

Ptaszynski, M., Dybala, P., Rzepka, R., Araki, K., & Masui, F. (2017). ML-Ask: Open source affect analysis software for textual input in Japanese. Journal of Open Research Software, 5(1), 16.

@article{ptaszynski2017ml,
  title={ML-Ask: Open source affect analysis software for textual input in Japanese},
  author={Ptaszynski, Michal and Dybala, Pawel and Rzepka, Rafal and Araki, Kenji and Masui, Fumito},
  journal={Journal of Open Research Software},
  volume={5},
  number={1},
  pages={16},
  year={2017}
}

Wang, L., Isomura, S., Ptaszynski, M., Dybala, P., Urabe, Y., Rzepka, R., & Masui, F. (2024). The limits of words: expanding a word-based emotion analysis system with multiple emotion dictionaries and the automatic extraction of emotive expressions. Applied Sciences, 14(11), 4439.

@article{wang2024limits,
  title={The limits of words: expanding a word-based emotion analysis system with multiple emotion dictionaries and the automatic extraction of emotive expressions},
  author={Wang, Lu and Isomura, Sho and Ptaszynski, Michal and Dybala, Pawel and Urabe, Yuki and Rzepka, Rafal and Masui, Fumito},
  journal={Applied Sciences},
  volume={14},
  number={11},
  pages={4439},
  year={2024},
  publisher={MDPI}
}

A machine-readable Citation File Format manifest is at CITATION.cff.

Contributing

Issues, pull requests, and dictionary submissions are welcome at https://github.com/ptaszynski/mlask-official. See IMPROVEMENTS.md for the active roadmap; ❮ HIGH PRIORITY ❯ items are the best first contributions.

When opening a PR that touches the emotion dictionaries (mlask_official/emotions/*.txt), please include:

The source / rationale for each entry (paper, corpus reference, or example sentence).
Evidence that the entry doesn't collide with an existing class (mlask analyze --text "<entry>" before and after).
A note in CHANGELOG.md under an [Unreleased] section.

License

BSD 3-Clause — the same licence as the original ML-Ask system.


This version: 0.5.0 (May 18, 2026)
