Chinese news word-frequency analysis and trend word cloud tool

Reason this release was yanked:

Major API changes

Project description

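News word-frequency analysis and trend-word visualization

Features

TF-IDF extraction of high-weight words from Chinese news headlines and article bodies
TextRank-based keyword extraction
Word-frequency counting
Trend word clouds generated per time window
Custom stopword lists for filtering Chinese function words
Runs directly from the command line via the wordfreq-cn tool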
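
To make the TF-IDF feature concrete, here is a minimal pure-Python sketch of the scoring idea. It is not the package's implementation (the dependency list suggests jieba and scikit-learn handle segmentation and weighting there). The helper name `tfidf_top_terms`, the smoothed IDF formula, and the pre-segmented input are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_top_terms(docs, top_k=3):
    """Rank terms by their highest TF-IDF score across the documents.

    `docs` is a list of token lists (already segmented). Hypothetical helper,
    not part of wordfreq_cn.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        if not doc:
            continue
        tf = Counter(doc)
        for term, count in tf.items():
            # Smoothed IDF, so a term present in every document still scores > 0.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            score = (count / len(doc)) * idf
            best[term] = max(best.get(term, 0.0), score)
    # Sort by score descending, then by term for a stable order.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


docs = [
    ["人工智能", "技术", "医疗", "技术"],
    ["气候", "变化", "技术"],
]
for term, score in tfidf_top_terms(docs, top_k=3):
    print(f"{term} {score:.4f}")
```

A term scores high when it is frequent inside one document and rare across the others, which is why common function words need a stopword list rather than relying on IDF alone.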

Installation

# Install the Python dependencies
pip install jieba scikit-learn wordcloud matplotlib

# Install the local package (when working from source)
pip install .

# Install from PyPI
pip install wordfreq-cn

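Usage

1. Command line

  wordfreq-cn --news "人工智能技术在医疗领域的应用取得突破" "全球气候变化加剧" --topk 5

--news: one or more news headlines or article bodies
--topk: number of top keywords to print (default 10)
Trend word cloud images, one per news item or aggregated by date, are written to wordclouds/

Sample output: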
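
The two options above could be parsed along the following lines. This is a sketch that assumes an argparse-based CLI; the actual parser in wordfreq_cn may be built differently. Only the option names and the default of 10 come from the description above, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the two documented options.
    parser = argparse.ArgumentParser(prog="wordfreq-cn")
    parser.add_argument("--news", nargs="+", required=True,
                        help="one or more news headlines or article bodies")
    parser.add_argument("--topk", type=int, default=10,
                        help="number of top keywords to print (default: 10)")
    return parser


args = build_parser().parse_args(
    ["--news", "人工智能技术在医疗领域的应用取得突破", "全球气候变化加剧",
     "--topk", "5"]
)
print(len(args.news), args.topk)
```

With `nargs="+"`, every quoted string after `--news` becomes one list element, which is what lets several headlines be passed in a single invocation.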

=== TF-IDF 高权重词 ===
人工智能技术 1.0000
医疗 0.8349
应用 0.6730
...

=== TextRank 关键词 ===
标题: 人工智能技术在医疗领域的应用取得突破
  领域 (1.0000)
  医疗 (0.8349)
  取得 (0.6746)
  应用 (0.6730)
  突破 (0.5175)

=== 词频统计 ===
技术 2
人工智能 1
医疗 1
...
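
The TextRank section of the sample output ranks words by how well connected they are to their neighbors. The sketch below is a simplified, unweighted TextRank in pure Python, not the package's code. The helper name `textrank`, the window size, the damping factor, and the scaling of the top score to 1.0 are assumptions chosen to resemble the shape of the output above, and the token list is hand-segmented.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score tokens with a simplified, unweighted TextRank (hypothetical helper)."""
    # Link every pair of distinct tokens that appear within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return {}
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # PageRank-style update: each word shares its score equally
        # among its neighbors.
        scores = {
            word: (1 - damping) + damping * sum(
                scores[nb] / len(neighbors[nb]) for nb in neighbors[word]
            )
            for word in neighbors
        }
    # Scale so the best-ranked word has score 1.0.
    top = max(scores.values())
    return {word: score / top for word, score in scores.items()}


tokens = ["人工智能", "技术", "医疗", "领域", "应用", "取得", "突破"]
ranked = sorted(textrank(tokens).items(), key=lambda kv: -kv[1])
for word, score in ranked[:5]:
    print(f"  {word} ({score:.4f})")
```

Words in the middle of a sentence co-occur with more neighbors than words at either end, so they tend to rank higher in this scheme.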

2. Python API

from collections import defaultdict

from wordfreq_cn import (
    count_words,
    generate_trend_wordcloud,
    load_stopwords,
    textrank_keywords,
    tfidf_keywords,
)

# Each item is a (date, text) pair.
news_list = [
    ("2025-11-25", "人工智能技术在医疗领域的应用取得突破"),
    ("2025-11-25", "全球气候变化加剧，联合国发布最新报告")
]

stopwords = load_stopwords(custom_file="stopwords.txt")

# TF-IDF
tfidf_res = tfidf_keywords([text for _, text in news_list], top_k=5, stopwords=stopwords)
print(tfidf_res)

# Word-frequency counts
counter = count_words([text for _, text in news_list], stopwords=stopwords)
print(counter)

# Group the texts by date, then generate one word cloud per date
news_by_date = defaultdict(list)
for date, text in news_list:
    news_by_date[date].append(text)
generate_trend_wordcloud(news_by_date, stopwords=stopwords)
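
The grouping step is what turns a flat news list into per-day input. The sketch below shows the same idea followed by per-date word counts, using only the standard library. It does not reproduce the internals of generate_trend_wordcloud: the helper `counts_by_date` is hypothetical, the tokens are hand-segmented so the example runs without jieba, and the second date is made-up sample data.

```python
from collections import Counter, defaultdict


def counts_by_date(dated_tokens, stopwords=frozenset()):
    """Build one Counter per date from (date, tokens) pairs (hypothetical helper)."""
    grouped = defaultdict(Counter)
    for date, tokens in dated_tokens:
        # Skip stopwords so they do not dominate the per-day frequencies.
        grouped[date].update(t for t in tokens if t not in stopwords)
    return dict(grouped)


dated_tokens = [
    ("2025-11-25", ["人工智能", "技术", "在", "医疗", "领域"]),
    ("2025-11-25", ["气候", "变化", "技术"]),
    ("2025-11-26", ["联合国", "发布", "报告"]),
]
trend = counts_by_date(dated_tokens, stopwords={"在"})
for date in sorted(trend):
    print(date, trend[date].most_common(2))
```

A per-date frequency mapping of this shape is the kind of input that `WordCloud.generate_from_frequencies` in the wordcloud library accepts; whether the package takes that route is not stated here.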

Word cloud examples

[Images: word clouds for 2015-11-25 and 2015-11-26]

Testing

# Run all tests
pytest

# Run a specific test file
pytest tests/test_core.py -v

# Run a specific test class
pytest tests/test_core.py::TestTFIDFKeywords -v

# Run with a coverage report
pytest --cov=wordfreq_cn

# Generate an HTML coverage report
pytest --cov=wordfreq_cn --cov-report=html

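Files

File	Description
`wordfreq_cn/`	Python package directory containing the core logic and the CLI
`wordfreq_cn/data/stopwords.txt`	Optional custom stopword file
`wordfreq_cn/data/cn_stopwords.txt`	HIT (Harbin Institute of Technology) Chinese stopword list, loaded automatically by the script
`wordfreq_cn/data/fonts/SourceHanSansHWSC-VF.ttf`	Source Han Sans font file, used to render Chinese text in word clouds
`wordclouds/`	Output directory for generated word cloud images
`tests/`	Unit tests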

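Notes

For large volumes of news, adjust the max_features and top_k parameters of the tfidf_keywords function.
The stopword list should include common function words (such as "的", "在", and "是") to get cleaner word-frequency results.
Once installed, the wordfreq-cn command is available directly; there is no need to run python main.py or python wordfreq_cn/cli.py.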
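
To illustrate the stopword recommendation above, the sketch below reads a stopword file and compares raw and filtered counts. It assumes a UTF-8 file with one stopword per line, and `read_stopwords` is a hypothetical stand-in, not the package's load_stopwords.

```python
import os
import tempfile
from collections import Counter


def read_stopwords(path):
    """Read one stopword per line, ignoring blank lines (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


# Write a tiny stopword file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", encoding="utf-8",
                                 delete=False) as tmp:
    tmp.write("的\n在\n是\n")
try:
    stopwords = read_stopwords(tmp.name)
finally:
    os.remove(tmp.name)

tokens = ["技术", "的", "应用", "在", "医疗", "是", "技术"]
raw = Counter(tokens)
filtered = Counter(t for t in tokens if t not in stopwords)
print(raw.most_common(3))
print(filtered.most_common(3))
```

Without filtering, function words take up slots among the most frequent tokens; with filtering, only content words remain.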

Project details

Release history

0.2.1

Apr 25, 2026

0.2.0

Apr 19, 2026

0.1.10

Dec 12, 2025

0.1.8

Dec 7, 2025

0.1.7

Dec 7, 2025

0.1.6

Dec 2, 2025

0.1.4

Nov 30, 2025

0.1.3

Nov 30, 2025

0.1.2 yanked

Nov 29, 2025

Reason this release was yanked:

Major API changes; the TextRank API was removed

0.1.1 yanked

Nov 29, 2025

Reason this release was yanked:

Contains a serious bug

This version

0.0.4 yanked

Nov 28, 2025

Reason this release was yanked:

Major API changes

0.0.2 yanked

Nov 28, 2025

Reason this release was yanked:

Major API changes

Download files

Source Distribution

wordfreq_cn-0.0.4.tar.gz (19.0 MB)

Uploaded Nov 28, 2025 Source

Built Distribution

wordfreq_cn-0.0.4-py3-none-any.whl (19.1 MB)

Uploaded Nov 28, 2025 Python 3

File details

Details for the file wordfreq_cn-0.0.4.tar.gz.

File metadata

Download URL: wordfreq_cn-0.0.4.tar.gz
Upload date: Nov 28, 2025
Size: 19.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for wordfreq_cn-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`8404881cd0b1df3f6ac7c752d2a7a4d0c71712c9bfb4a0c5f10ded21c8114927`
MD5	`56816a545269665adb935cc2a22e2b29`
BLAKE2b-256	`8e2c70a942ad6699e3c5fe54404a52d301e76f1385d8ef1aee8ab3a61d0edc79`

File details

Details for the file wordfreq_cn-0.0.4-py3-none-any.whl.

File metadata

Download URL: wordfreq_cn-0.0.4-py3-none-any.whl
Upload date: Nov 28, 2025
Size: 19.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for wordfreq_cn-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`87af6531a6ba9693b850c04b2230e286affef0598cd02ddf3b07984cd88c5c76`
MD5	`6b488cb65f7aca300e4d8a86a82680fa`
BLAKE2b-256	`9dd6e511dcb5082881f966cd7042d9b9338576e72626a815ccd9687f9c6a2f48`

wordfreq-cn 0.0.4
