Topic modeling of Chinese documents with BERTopic

Project description

About

A package for topic modeling of Chinese documents, built on BERTopic.

Install

$ pip3 install -U bertopic_base_chinese

Directory

  • bertopic_base_chinese
    • _model.py

_model.py

  • BERTopic class: overrides __init__() to set embedding_model to "paraphrase-multilingual-MiniLM-L12-v2", select jieba.lcut as the tokenizer, and initialize the class parameters.

Usage

from bertopic_base_chinese import BERTopic

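# Two sentences only illustrate the call; fitting normally needs a much larger corpus.
docs = ["我爱北京天安门", "我家大门常打开,开放怀抱等你"]
topic_model = BERTopic()
# topics holds one topic id per document; probs holds the matching probabilities.
topics, probs = topic_model.fit_transform(docs)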
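Once a model has been fitted, the keywords of each topic can be listed. The helper below is a small sketch: top_words_per_topic is a hypothetical name, and it assumes this package's BERTopic class keeps the get_topics() method of the upstream BERTopic, which returns a dict mapping each topic id to a list of (word, score) pairs.

```python
def top_words_per_topic(topic_model, n_words=5):
    """Return {topic_id: [word, ...]} for every topic except the outlier topic."""
    summary = {}
    for topic_id, words in topic_model.get_topics().items():
        if topic_id == -1:
            # Upstream BERTopic uses -1 for documents assigned to no topic.
            continue
        summary[topic_id] = [word for word, _score in words[:n_words]]
    return summary
```

With the model from the usage example, this would be called as top_words_per_topic(topic_model).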

Contact us

may.xiaoya.zhang@gmail.com

Source Distribution

bertopic_base_chinese-0.0.1.tar.gz (2.3 kB view details)

Uploaded Source

File details

Details for the file bertopic_base_chinese-0.0.1.tar.gz.

File metadata

  • Download URL: bertopic_base_chinese-0.0.1.tar.gz
  • Size: 2.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for bertopic_base_chinese-0.0.1.tar.gz
Algorithm      Hash digest
SHA256         495acdc81a31a4c4e6dd361538221bb6bc2d2adaf6e3348f09dc1071fff8c26c
MD5            f235e4785f9e7606b91a2219e3d02259
BLAKE2b-256    19a967f3cabab763db3290752e01230f8457faebc298cb70038a7fac7423b9f3