
Stopword lists for text analysis, supporting 15 languages including Chinese, English, German, and French.


1. multistop

Stopword lists covering 15 languages, including Chinese, English, and German.



2. Installation

pip3 install multistop

3. Usage

Initialize the stopword class

from multistop import Stopwords
# The default language is Chinese (lang='chinese')
sw = Stopwords()

List the supported languages

sw.languages()

Run

dict_keys(['dutch', 'german', 'hungarian', 'turkish', 'russian', 'italian', 'english', 'norwegian', 'portuguese', 'finnish', 'danish', 'french', 'swedish', 'spanish', 'chinese'])


Select the stopword list for a given language

sw.setlang(lang='chinese')

Run

set language to chinese
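
The language name passed to `setlang` presumably has to be one of the names returned by `sw.languages()`. A minimal sketch of checking a user-supplied name before making that call, assuming the 15 names listed in the output above; `SUPPORTED_LANGUAGES` and `normalize_lang` are hypothetical helpers for illustration, not part of multistop:

```python
# Names copied from the sw.languages() output shown earlier.
SUPPORTED_LANGUAGES = {
    "dutch", "german", "hungarian", "turkish", "russian",
    "italian", "english", "norwegian", "portuguese", "finnish",
    "danish", "french", "swedish", "spanish", "chinese",
}


def normalize_lang(name):
    """Lower-case and trim a language name, rejecting unsupported ones."""
    key = name.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {name!r}")
    return key


print(normalize_lang(" Chinese "))  # prints: chinese
```

The returned value could then be passed along as `sw.setlang(lang=normalize_lang(user_input))`.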

Size of the stopword list

sw.size()

Run

778

Check whether the stopword list contains a given word

sw.contains('的')

Run

True
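
A membership test like this is typically used to drop stopwords from a tokenized text. The sketch below illustrates the idea with a small hand-written set standing in for the real list, so it runs without the package installed; `STOPWORDS` and `remove_stopwords` are hypothetical names, and the tokens are made up for the example:

```python
# Hypothetical stand-in for a stopword list; a real list is far larger.
STOPWORDS = {"the", "a", "of", "is", "的", "了"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return the tokens that are not stopwords, preserving their order."""
    return [token for token in tokens if token not in stopwords]


tokens = ["the", "size", "of", "a", "stopword", "list"]
print(remove_stopwords(tokens))  # prints: ['size', 'stopword', 'list']
```

With multistop, the condition inside the list comprehension would presumably become `not sw.contains(token)`.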

Add a new stopword

sw.add('6啊')
sw.size()

Run

779

Save the stopword list to a local file

sw.download('chinese.txt')
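
Once saved, the file can be read back with plain Python. The sketch below assumes the file is UTF-8 text with one stopword per line; the source does not state the file format, so treat that as an assumption. `load_stopwords` is a hypothetical helper, and a tiny sample file is written first so the example runs on its own:

```python
import os
import tempfile


def load_stopwords(path, encoding="utf-8"):
    """Read a stopword file (assumed: one word per line) into a set."""
    with open(path, encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}


# Write a small sample file in place of the real chinese.txt.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "chinese.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("的\n了\n\n和\n")
    stopwords = load_stopwords(sample_path)

print(len(stopwords))  # prints: 3
```

Blank lines are skipped and surrounding whitespace is trimmed, so the sample file yields three entries.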


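If you are new to programming

If you come from an economics, management, humanities, or social science background, are new to programming, and face the daunting task of collecting, processing, and analyzing large volumes of text data, consider the video course "Python Web Scraping and Text Data Analysis" (《python网络爬虫与文本数据分析》). As a liberal arts student, I also started out completely in the dark, and this course distills five years of experience. I believe the explanations are easy to follow o( ̄︶ ̄)o. The course covers:

  • Getting started with Python
  • Web scraping
  • Reading data
  • Introduction to text analysis
  • Machine learning and text analysis
  • Applications of text analysis in economics and management research

If you are interested, take a look at "Python Web Scraping and Text Data Analysis".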


This version

1.3


Built Distribution

multistop-1.3-py3-none-any.whl (22.7 kB)

Uploaded Python 3
