A text analysis library for word-frequency statistics, dictionary expansion, sentiment analysis, and more.

Project description

textana4sc

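A Chinese text analysis library for word-frequency statistics, sentiment analysis, topic analysis, and more.

GitHub: https://github.com/hidadeng/cntext
PyPI: https://pypi.org/project/cntext/

Modules

word_cloud: text statistics, readability, etc.
get_keyword: extract keywords from text
get_entity: extract entities from text
get_emotion: detect the emotion of a text
get_cosemantic: build a word co-occurrence semantic graph
get_topic: extract topics
visualization: visualizations such as word clouds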

Installation

pip install textanalyze4sc

I. Load data

from texttool import analyze

df_data = analyze.load_data("path/to/your/data")  # placeholder: replace with the path to your data

II. Extract keywords

df_data_key = analyze.get_keyword(df_data)
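
As a rough illustration of frequency-based keyword ranking, the sketch below counts tokens using only the standard library. It is not the implementation behind analyze.get_keyword, whose algorithm is not described on this page. The helper name top_keywords, the stopword set, and the pre-segmented input are all assumptions made for this example.

```python
from collections import Counter

def top_keywords(tokens, stopwords=frozenset(), k=3):
    """Return the k most frequent non-stopword tokens (hypothetical helper)."""
    counts = Counter(t for t in tokens if t and t not in stopwords)
    return counts.most_common(k)

# Assumed input: text that has already been segmented into words
# (for Chinese, by a word segmenter of your choice).
tokens = ["高铁", "开通", "高铁", "线路", "的", "高铁", "线路"]
keywords = top_keywords(tokens, stopwords={"的"}, k=2)
print(keywords)  # [('高铁', 3), ('线路', 2)]
```

Raw frequency is only the simplest possible ranking; weighting schemes such as TF-IDF usually work better on real corpora.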

III. Extract entities

df_data_entity = analyze.get_entity(df_data)

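IV. Sentiment analysis

Sentiment analysis is available at two levels of granularity.

1. Coarse-grained: three classes, positive, negative, and neutral.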

analyze.get_emotion('我很开心，你是这么认为的吗')

Result

'pos'

2. Fine-grained: seven emotion categories, 好 (good), 乐 (joy), 哀 (sorrow), 怒 (anger), 惧 (fear), 恶 (disgust), and 惊 (surprise).

analyze.get_emotion_sp('我很开心，你是这么认为的吗')

Result

{'words': 10,
 'sentences': 1,
 '好': 0,
 '乐': 1,
 '哀': 0,
 '怒': 0,
 '惧': 0,
 '恶': 0,
 '惊': 0}
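
To make the per-category counts more concrete, here is a minimal dictionary-based sketch. It is only an assumption about how such counts could be produced, not the library's method: the lexicon entries, the name count_emotions, and the substring matching are all illustrative, and the sketch omits the 'words' and 'sentences' fields shown in the result.

```python
# Toy lexicon: these entries are illustrative assumptions, not the library's dictionary.
EMOTION_LEXICON = {
    "开心": "乐",
    "高兴": "乐",
    "难过": "哀",
    "生气": "怒",
    "害怕": "惧",
}
CATEGORIES = ["好", "乐", "哀", "怒", "惧", "恶", "惊"]

def count_emotions(text, lexicon=EMOTION_LEXICON):
    """Count occurrences of each lexicon word in text, grouped by emotion category."""
    counts = {category: 0 for category in CATEGORIES}
    for word, category in lexicon.items():
        counts[category] += text.count(word)  # plain substring match, no segmentation
    return counts

result = count_emotions("我很开心，你是这么认为的吗")
print(result)
```

With this toy lexicon only "开心" matches, so the 乐 count is 1 and every other category stays at 0.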

V. Word co-occurrence graph

This example selects the 50 most frequent entities and plots their co-occurrence graph.

analyze.get_cosemantic(df_data, top50_all)
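
The counting step behind a co-occurrence graph can be sketched as follows. This is a standalone illustration under stated assumptions, not the code behind analyze.get_cosemantic: the function name cooccurrence_counts and the input shape (one list of entities per document) are hypothetical, and two entities are treated as co-occurring when they appear in the same document.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs_entities, top_n=50):
    """Count how often pairs of the top_n most frequent entities share a document."""
    freq = Counter(e for entities in docs_entities for e in entities)
    top = {e for e, _ in freq.most_common(top_n)}
    pairs = Counter()
    for entities in docs_entities:
        kept = sorted(set(entities) & top)  # deduplicate and fix the order within each pair
        pairs.update(combinations(kept, 2))
    return pairs

# Hypothetical input: one list of extracted entities per document.
docs_entities = [
    ["南宁", "广西", "高铁"],
    ["广西", "高铁"],
    ["南宁", "高铁"],
]
pairs = cooccurrence_counts(docs_entities, top_n=50)
print(pairs.most_common())
```

Each counted pair can then serve as a weighted edge when drawing the graph.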

VI. Extract triples

text = "他叫汤姆去拿外衣。"
get_graph(text)

Result

[['他', '叫', '汤姆'], ['汤姆', '拿', '外衣']]
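
One way to work with triples in this (subject, relation, object) shape is to fold them into an adjacency mapping. The sketch below assumes only the list-of-lists format shown in the result above; the helper triples_to_adjacency is hypothetical and not part of the library.

```python
from collections import defaultdict

def triples_to_adjacency(triples):
    """Map each subject to a list of (relation, object) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)

# Triples copied from the result shown above.
triples = [['他', '叫', '汤姆'], ['汤姆', '拿', '外衣']]
graph = triples_to_adjacency(triples)
print(graph)  # {'他': [('叫', '汤姆')], '汤姆': [('拿', '外衣')]}
```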

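VII. Generate a summary

The library uses extractive summarization. The sent_num parameter controls the number of sentences in the output summary.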

text='2013年，信号与信息处理专业硕士毕业的张超凡进入国铁南宁局南宁电务段工作。那一年，广西同时开通多条高铁线路，高铁营业里程从0公里跃升至1000多公里。10年间，伴随着中国铁路高速发展，张超凡收获颇多。'
get_summary(text,sent_num=1)

Result

'2013年，信号与信息处理专业硕士毕业的张超凡进入国铁南宁局南宁电务段工作。'
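
Extractive summarization selects existing sentences rather than generating new ones. The sketch below shows one simple variant, scoring each sentence by the average frequency of its characters. This scoring rule is an assumption chosen for illustration: the page does not say which method get_summary uses, and the name summarize is hypothetical.

```python
import re
from collections import Counter

PUNCTUATION = "。！？，、"

def summarize(text, sent_num=1):
    """Return the sent_num highest-scoring sentences, kept in their original order."""
    # Split into sentences, keeping the sentence-ending punctuation.
    sentences = re.findall(r"[^。！？]+[。！？]?", text)
    # Document-wide character frequencies, ignoring whitespace and punctuation.
    char_freq = Counter(ch for ch in text if ch.strip() and ch not in PUNCTUATION)

    def score(sentence):
        chars = [ch for ch in sentence if ch in char_freq]
        return sum(char_freq[ch] for ch in chars) / len(chars) if chars else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:sent_num])  # restore document order
    return "".join(sentences[i] for i in chosen)

text = "高铁开通。高铁线路很多。天气好。"
summary = summarize(text, sent_num=1)
print(summary)  # 高铁开通。
```

Character-level scoring avoids the need for a word segmenter, at the cost of accuracy. A word-level variant would swap characters for segmented tokens.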

VIII. Visualization

The library provides several visualization tools, including bar charts, trend charts, and word clouds.


Release history

0.4 (this version): Mar 23, 2023
0.3: Feb 1, 2023
0.2: Feb 1, 2023
0.1: Feb 1, 2023

Download files


Source Distribution

textana4sc-0.4.tar.gz (10.4 kB)

Uploaded Mar 23, 2023 Source

File details

Details for the file textana4sc-0.4.tar.gz.

File metadata

Download URL: textana4sc-0.4.tar.gz
Upload date: Mar 23, 2023
Size: 10.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.4

File hashes

Hashes for textana4sc-0.4.tar.gz
Algorithm	Hash digest
SHA256	`887c2e6da007f24643c98853ce232c0bdf5ee5906e6b9a246c5620e1d848a82a`
MD5	`b7997b6ffe4e91f87417f0270780d04f`
BLAKE2b-256	`60a7539900615c5bb59284f967fd26d14a9cc964c3dcf81e9f2978606c977e34`
