Stopword lists for text analysis, supporting 15 languages including Chinese, English, German, and French.
1. multistop
Stopword lists covering 15 languages, including Chinese, English, and German.
2. Installation
pip3 install multistop
3. Usage
Initialize the stopword class
from multistop import Stopwords
# Chinese is selected by default (lang='chinese')
sw = Stopwords()
List the supported languages
sw.languages()
Run
dict_keys(['dutch', 'german', 'hungarian', 'turkish', 'russian', 'italian', 'english', 'norwegian', 'portuguese', 'finnish', 'danish', 'french', 'swedish', 'spanish', 'chinese'])
Select the stopword list for a given language
sw.setlang(lang='chinese')
Run
set language to chinese
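The source does not show what setlang does when given a name outside the list above. As a minimal sketch, one could normalize and check a requested name before calling setlang. The helper resolve_language and the constant SUPPORTED below are hypothetical names, not part of multistop; the language names are copied from the languages() output shown earlier.

```python
# Hypothetical helper, not part of multistop: validate a language name
# against the list reported by sw.languages() before calling sw.setlang().
SUPPORTED = [
    "dutch", "german", "hungarian", "turkish", "russian", "italian",
    "english", "norwegian", "portuguese", "finnish", "danish", "french",
    "swedish", "spanish", "chinese",
]


def resolve_language(requested, available, default="chinese"):
    """Return a normalized, supported language name or raise ValueError."""
    if requested is None:
        return default
    name = requested.strip().lower()
    if name not in available:
        raise ValueError(
            f"unsupported language {requested!r}; choose one of {sorted(available)}"
        )
    return name


print(resolve_language(" English ", SUPPORTED))
```

With multistop installed, the result could then be passed on as sw.setlang(lang=resolve_language(name, sw.languages())).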
Size of the word list
sw.size()
Run
778
Check whether the stopword list contains a word
sw.contains('的')
Run
True
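A membership check like this is typically used to filter a tokenized text. The sketch below shows the idea with a plain Python set standing in for the stopword list, so that it runs without multistop installed. The function remove_stopwords and the set demo_stopwords are hypothetical names introduced for illustration.

```python
def remove_stopwords(tokens, is_stopword):
    """Keep only the tokens for which is_stopword(token) is false."""
    return [token for token in tokens if not is_stopword(token)]


# Hypothetical stand-in for a real stopword list.
demo_stopwords = {"的", "了", "是"}

tokens = ["今天", "的", "天气", "是", "晴天"]
filtered = remove_stopwords(tokens, demo_stopwords.__contains__)
print(filtered)  # ['今天', '天气', '晴天']
```

Assuming sw.contains takes a single word and returns a boolean, as in the call above, it could presumably be passed in directly: remove_stopwords(tokens, sw.contains).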
Add a new stopword
sw.add('6啊')
sw.size()
Run
779
Save the stopword list to a local file
sw.download('chinese.txt')
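The source does not describe the format of the saved file. Assuming it is UTF-8 text with one stopword per line (an assumption, not confirmed here), it could be read back into a set roughly as follows. The helper load_stopwords is a hypothetical name, and the demo writes its own small file so the sketch runs on its own.

```python
import os
import tempfile


def load_stopwords(path, encoding="utf-8"):
    """Read one stopword per line, skipping blank lines (assumed format)."""
    with open(path, encoding=encoding) as handle:
        return {line.strip() for line in handle if line.strip()}


# Demo file standing in for the output of sw.download('chinese.txt').
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_stopwords.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("的\n了\n\n是\n")
    stopwords = load_stopwords(demo_path)

print(len(stopwords))  # 3
```

Loading into a set keeps membership checks fast and drops any duplicate lines.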
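If
If you come from a business, management, humanities, or social science background, are new to programming, and face the daunting task of collecting, processing, and analyzing large volumes of text data, you may find the video course 《python网络爬虫与文本数据分析》 (Python Web Scraping and Text Data Analysis) useful. As a liberal arts student myself, I also started out knowing nothing, and this course condenses five years of experience. I believe it is explained in an accessible way o( ̄︶ ̄)o. It covers:
- Python basics
- Web scraping
- Reading data
- Introduction to text analysis
- Machine learning and text analysis
- Applications of text analysis in business and management research
If you are interested, take a look at 《python网络爬虫与文本数据分析》.
More
- WeChat official account: 大邓和他的python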