A package for analyzing hot topics in text data

Hot Topic Analyzer

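Hot Topic Analyzer is a Python package for analyzing hot topics in text data. It uses natural language processing techniques to identify and summarize the main themes in large volumes of text.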

Installation

You can install Hot Topic Analyzer with pip:

pip install hotopic
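
To confirm that the installation succeeded, the installed version can be read from the package metadata. The following is a minimal sketch using only the standard library. The distribution name `hotopic` is an assumption taken from the release file names, and `installed_version` is a hypothetical helper, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "hotopic" is assumed from the release file names; adjust if needed.
    print(installed_version("hotopic") or "not installed")
```

Returning `None` instead of raising keeps the check usable in setup scripts that need to decide whether to install the package.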

Usage

Here is a basic usage example:

from hotopic.hotopic import HoTopic
from hotopic.utils import Config

# Prepare the input data
input_data = [
    {"content_id": "1", "content": "This is the first article, about gold prices..."},
    {"content_id": "2", "content": "This is the second article, about the gold market..."},
    # ... more articles
]

# Create the configuration
config = Config(
    min_content_length=10,
    max_content_length=1000,
    similarity_threshold=0.5,
    top_topics_count=10,
    max_keywords_per_topic=15,
    representative_docs_count=3,
    doc_preview_length=200,
    quality_weights={
        "coherence": 0.6,
        "distinctiveness": 0.2,
        "size_ratio": 0.2
    },
    seed=42,
    output_dir="../output"
)

# Initialize HoTopic with the configuration values as keyword arguments
hot_topic = HoTopic(**vars(config))

# Run the hot topic analysis
topic_metadata, topic_contents = hot_topic.run(input_data)

# Print the results
for topic in topic_metadata:
    print(f"Topic ID: {topic['topic_id']}")
    print(f"Topic title: {topic['topic_title']}")
    print(f"Keywords: {', '.join(topic['topic_keywords'])}")
    print(f"Document count: {topic['count']}")
    print(f"Quality score: {topic['quality_score']}")
    print("Representative documents:")
    for doc in topic['representative_docs']:
        print(f"  - {doc}")
    print()  # blank line between topics

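Configuration parameters

You can customize the following parameters through the Config object:

  • min_content_length: minimum content length
  • max_content_length: maximum content length
  • similarity_threshold: similarity threshold
  • top_topics_count: number of hot topics to return
  • max_keywords_per_topic: maximum number of keywords per topic
  • representative_docs_count: number of representative documents per topic
  • doc_preview_length: length of the document preview
  • quality_weights: weights used to score topic quality
  • seed: random seed
  • output_dir: output directory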
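
The `quality_weights` parameter suggests that the quality score blends several components. The package's actual formula is not documented here, so the following is only a sketch under the assumption that the score is a weighted sum of components that each lie in [0, 1]. `combined_quality` and the component values are hypothetical; only the weight names and values come from the usage example.

```python
from typing import Mapping


def combined_quality(components: Mapping[str, float],
                     weights: Mapping[str, float]) -> float:
    """Weighted sum of quality components, normalized by the total weight."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * components[name] for name in weights) / total


# Weights copied from the usage example.
weights = {"coherence": 0.6, "distinctiveness": 0.2, "size_ratio": 0.2}
# Component values are invented for illustration.
components = {"coherence": 0.8, "distinctiveness": 0.5, "size_ratio": 0.25}

score = combined_quality(components, weights)
print(round(score, 3))
```

Dividing by the total weight keeps the result on the same scale as the components even when the weights do not sum to exactly 1.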

License

This project is released under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please read CONTRIBUTING.md to learn how to contribute to this project.

Contact

For questions or suggestions, please contact your.email@example.com

Download files

Source Distribution

hotopic-0.1.0.tar.gz (3.5 kB view details)

Uploaded Source

Built Distribution

hotopic-0.1.0-py3-none-any.whl (3.4 kB view details)

Uploaded Python 3

File details

Details for the file hotopic-0.1.0.tar.gz.

File metadata

  • Download URL: hotopic-0.1.0.tar.gz
  • Upload date:
  • Size: 3.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for hotopic-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       ec19818c2615bf24415f396adc0746209ad02c8853170c4079dc06045f59a590
MD5          eff9b9d34042d9db63157befa5cd468e
BLAKE2b-256  e78cc43d08594ba00dc59e70a52f4b092b30d98cf119969a42e1baee7752e64a

File details

Details for the file hotopic-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: hotopic-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 3.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for hotopic-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       31aa6df5cca3506f6dbfbca728e6b6765567de5597f654b36045839266cd5c53
MD5          cc69bd861679b2d8411058224cb677dc
BLAKE2b-256  6ae64f55275ef51b8744d819cfb79da5dc570d22f25b0fb238f1f79c0d2872f6
