A package for analyzing hot topics in text data
Hot Topic Analyzer
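Hot Topic Analyzer is a Python package for analyzing hot topics in text data. It uses natural language processing techniques to identify and summarize the main topics in large collections of text.
Installation
You can install Hot Topic Analyzer with pip: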
pip install hotopic
Usage
A basic usage example:
from hotopic.hotopic import HoTopic
from hotopic.utils import Config

# Prepare the input data: one dict per document, with an ID and the text
input_data = [
    {"content_id": "1", "content": "This is the first article about gold prices..."},
    {"content_id": "2", "content": "This is the second article about the gold market..."},
    # ... more articles
]

# Create the configuration
config = Config(
    min_content_length=10,
    max_content_length=1000,
    similarity_threshold=0.5,
    top_topics_count=10,
    max_keywords_per_topic=15,
    representative_docs_count=3,
    doc_preview_length=200,
    quality_weights={
        "coherence": 0.6,
        "distinctiveness": 0.2,
        "size_ratio": 0.2,
    },
    seed=42,
    output_dir="../output",
)

# Initialize HoTopic, passing the Config attributes as keyword arguments
hot_topic = HoTopic(**config.__dict__)

# Run the hot topic analysis
topic_metadata, topic_contents = hot_topic.run(input_data)

# Print the results
for topic in topic_metadata:
    print(f"Topic ID: {topic['topic_id']}")
    print(f"Topic title: {topic['topic_title']}")
    print(f"Keywords: {', '.join(topic['topic_keywords'])}")
    print(f"Document count: {topic['count']}")
    print(f"Quality score: {topic['quality_score']}")
    print("Representative documents:")
    for doc in topic['representative_docs']:
        print(f"  - {doc}")
    print("\n")
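Once the analysis has run, the metadata list can be post-processed like any list of dictionaries. The sketch below filters out small topics, sorts the rest by quality score, and serializes the result to JSON. The helper `rank_topics` and the sample records are hypothetical and exist only for illustration; the sketch assumes nothing beyond the `count` and `quality_score` keys used in the example above.

```python
import json


def rank_topics(topic_metadata, min_count=1):
    """Keep topics with at least `min_count` documents, highest quality first."""
    kept = [t for t in topic_metadata if t["count"] >= min_count]
    return sorted(kept, key=lambda t: t["quality_score"], reverse=True)


# Made-up records for illustration only; real values come from hot_topic.run().
sample_metadata = [
    {"topic_id": 0, "topic_title": "Gold prices",
     "topic_keywords": ["gold", "price"], "count": 12, "quality_score": 0.71},
    {"topic_id": 1, "topic_title": "Gold market",
     "topic_keywords": ["gold", "market"], "count": 3, "quality_score": 0.88},
    {"topic_id": 2, "topic_title": "Central banks",
     "topic_keywords": ["bank", "reserve"], "count": 7, "quality_score": 0.79},
]

# Drop topics backed by fewer than 5 documents, then rank the rest.
ranked = rank_topics(sample_metadata, min_count=5)

# ensure_ascii=False keeps non-ASCII titles and keywords readable in the output.
report = json.dumps(ranked, ensure_ascii=False, indent=2)
print(report)
```

Filtering on `count` before sorting keeps a high-scoring but tiny topic from crowding out better-supported ones.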
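Configuration parameters
You can customize the following parameters through the Config object:
- min_content_length: minimum content length
- max_content_length: maximum content length
- similarity_threshold: similarity threshold
- top_topics_count: number of hot topics to return
- max_keywords_per_topic: maximum number of keywords per topic
- representative_docs_count: number of representative documents per topic
- doc_preview_length: length of the document preview
- quality_weights: weights used for the topic quality score
- seed: random seed
- output_dir: output directory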
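The description above does not say how `quality_weights` are combined into a quality score. The sketch below shows one plausible reading: a plain weighted sum over the three components named in the usage example. Both the combination rule and the requirement that the weights sum to 1 are assumptions, and `check_quality_weights` and `weighted_quality` are hypothetical helpers, not part of the package.

```python
import math

# Component names taken from the quality_weights dict in the usage example.
EXPECTED_KEYS = {"coherence", "distinctiveness", "size_ratio"}


def check_quality_weights(weights):
    """Raise ValueError unless the weights use the expected keys,
    are non-negative, and sum to 1 (an assumed convention)."""
    if set(weights) != EXPECTED_KEYS:
        raise ValueError(f"expected keys {sorted(EXPECTED_KEYS)}, got {sorted(weights)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")


def weighted_quality(component_scores, weights):
    """Assumed combination rule: weighted sum of the component scores."""
    check_quality_weights(weights)
    return sum(weights[name] * component_scores[name] for name in weights)


weights = {"coherence": 0.6, "distinctiveness": 0.2, "size_ratio": 0.2}
# Made-up component scores for a single topic.
scores = {"coherence": 0.5, "distinctiveness": 1.0, "size_ratio": 0.5}

score = weighted_quality(scores, weights)
print(round(score, 3))
```

Under this reading, raising the `coherence` weight favors tightly focused topics, while raising `size_ratio` favors topics that cover more documents.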
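License
This project is released under the MIT License. See the LICENSE file for details.
Contributing
Contributions are welcome! Please read CONTRIBUTING.md to learn how to contribute to this project.
Contact
For questions or suggestions, please contact your.email@example.com.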
File details
Details for the file hotopic-0.1.0.tar.gz.
File metadata
- Download URL: hotopic-0.1.0.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ec19818c2615bf24415f396adc0746209ad02c8853170c4079dc06045f59a590 |
| MD5 | eff9b9d34042d9db63157befa5cd468e |
| BLAKE2b-256 | e78cc43d08594ba00dc59e70a52f4b092b30d98cf119969a42e1baee7752e64a |
File details
Details for the file hotopic-0.1.0-py3-none-any.whl.
File metadata
- Download URL: hotopic-0.1.0-py3-none-any.whl
- Upload date:
- Size: 3.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 31aa6df5cca3506f6dbfbca728e6b6765567de5597f654b36045839266cd5c53 |
| MD5 | cc69bd861679b2d8411058224cb677dc |
| BLAKE2b-256 | 6ae64f55275ef51b8744d819cfb79da5dc570d22f25b0fb238f1f79c0d2872f6 |