Text analysis add-on for the Blauwal (Blue Whale) data mining software package.
Project description
Textable is an open source add-on that brings advanced text-analysis functionality to the Orange Canvas data mining software package (itself open source).
The project’s website is http://textable.io. It hosts a repository of recipes to help you get started with Textable.
Documentation is hosted at http://orange3-textable.readthedocs.io/ and you can get further support at https://textable.freshdesk.com/ or by e-mail to support@textable.io.
Orange Textable was designed and implemented by LangTech Sarl on behalf of the Department of Language and Information Sciences (SLI) at the University of Lausanne (see Credits and How to cite Orange Textable).
Features
Basic text analysis
use regular expressions to segment text into letters, words, sentences, etc., or to run full-text queries
use regexes to extract annotations from many input formats
import in-line XML markup (e.g. TEI)
include/exclude segments based on user-defined lists (stoplists)
filter segments based on frequency
easily generate random text samples
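The segmentation and filtering ideas listed above can be sketched in plain Python. This is a minimal illustration using only the standard library, not Textable's own API: the function names `segment` and `filter_segments`, the default pattern, and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter

def segment(text, pattern=r"\w+"):
    """Return (start, end, string) tuples for every regex match in text."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

def filter_segments(segments, stoplist=(), min_freq=1):
    """Drop segments that are in the stoplist or occur fewer than min_freq times.

    Comparison is case-insensitive; the original strings are preserved.
    """
    stop = {s.lower() for s in stoplist}
    freq = Counter(s.lower() for _, _, s in segments)
    return [
        seg for seg in segments
        if seg[2].lower() not in stop and freq[seg[2].lower()] >= min_freq
    ]

text = "The cat saw the other cat."
words = segment(text)                      # word-level segmentation
kept = filter_segments(words, stoplist=["the"], min_freq=2)
print([s for _, _, s in kept])             # ['cat', 'cat']
```

Keeping the character offsets alongside each string is one way to let later steps, such as concordances, refer back to the position of a segment in the source text.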
Advanced text analysis
concordances and collocations, also based on annotations
segment distribution, document-term matrix, transition matrix, etc.
co-occurrence tables, also between different types of segments
lemmatization and POS-tagging via TreeTagger
robust linguistic complexity measures, incl. mean length of word, lexical diversity, etc.
many advanced data mining algorithms: clustering, classification, factor analyses, etc.
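To make two of the tables named above concrete, here is a small standard-library sketch of a document-term matrix and of one simple lexical diversity measure, the type-token ratio. It does not reflect how Textable computes these internally; the helper names and the two sample documents are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def document_term_matrix(docs):
    """Build a document-term matrix from a {name: text} mapping.

    Returns the sorted vocabulary and a {name: [count per vocabulary term]} dict.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    matrix = {name: [c[term] for term in vocab] for name, c in counts.items()}
    return vocab, matrix

def type_token_ratio(text):
    """Number of distinct tokens divided by the total number of tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

docs = {"a": "the cat sat", "b": "the dog sat down"}
vocab, matrix = document_term_matrix(docs)
print(vocab)         # ['cat', 'dog', 'down', 'sat', 'the']
print(matrix["a"])   # [1, 0, 0, 1, 1]
print(type_token_ratio("the cat and the dog"))  # 0.8
```

The type-token ratio is sensitive to text length, so it is only meaningful when comparing samples of similar size.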
Text recoding
Unicode-aware preprocessing functions, e.g. remove accents from Ancient Greek text
recode and restructure texts using regexes, e.g. rewrite CSV as XML
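Both recoding examples above can be approximated with the Python standard library. The sketch below is an illustration under stated assumptions rather than Textable's implementation: `remove_accents` and `csv_to_xml` are hypothetical helpers, the CSV is assumed to have exactly two fields per line with no quoting, and the element names `row`, `name`, and `value` are invented for the example.

```python
import re
import unicodedata

def remove_accents(text):
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

def csv_to_xml(csv_text):
    """Rewrite each two-field CSV line as a <row> element.

    Assumes unquoted fields without embedded commas; no XML escaping is done.
    """
    return re.sub(
        r"^([^,\n]*),([^,\n]*)$",
        r"<row><name>\1</name><value>\2</value></row>",
        csv_text,
        flags=re.MULTILINE,
    )

print(remove_accents("ἄνθρωπος"))        # ανθρωπος
print(csv_to_xml("alpha,1\nbeta,2"))
```

For real CSV input with quoted fields, the standard `csv` module is a safer starting point than a single regular expression.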
Extensibility
handles hundreds of text files
use Python scripts for custom text processing or to access external tools: NLTK, Pattern, Gensim, etc.
Interoperability
import text from keyboard, files, or URLs
process any kind of raw text format: TXT, HTML, XML, CSV, etc.
supports many text encodings, incl. Unicode
export results to text files or via copy-paste
easy interfacing with Orange’s Text Mining add-on
Ease of access
user-friendly visual interface
ready-made recipes for a range of frequent use cases
extensive documentation
support and community forums
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file blauwal3_textable-3.1.11.tar.gz.
File metadata
- Download URL: blauwal3_textable-3.1.11.tar.gz
- Upload date:
- Size: 9.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | d611a7139593e00cbe22999f79885880a7aefbe99c4fc393c0ea0f2e8131f528
MD5 | af101550641b8289c88b493a70effa8e
BLAKE2b-256 | 68bf14d4e6d44e7692d03a626d4d1cf671414b20bb2a16b87f6ac193ba04a18e
File details
Details for the file Blauwal3_Textable-3.1.11-py3-none-any.whl.
File metadata
- Download URL: Blauwal3_Textable-3.1.11-py3-none-any.whl
- Upload date:
- Size: 215.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9ecd703299634c43194dafe9139d2caf791709170d26f0625f06909a70a15ed8
MD5 | f588e262109c1d1741ef6e6eb6d9dc4f
BLAKE2b-256 | 25e73b452ec6c28a777de42b8cde1e2c58c92c342b7c173e26df292ce0f752a7
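A downloaded file can be checked against the published SHA256 digests with Python's `hashlib`. The following is a minimal sketch: the digests are copied from the file hashes listed above, while the helper names `sha256_of` and `matches_published` and the chunk size are assumptions made for illustration.

```python
import hashlib

# SHA256 digests copied from the file hashes listed for each distribution.
PUBLISHED_SHA256 = {
    "blauwal3_textable-3.1.11.tar.gz":
        "d611a7139593e00cbe22999f79885880a7aefbe99c4fc393c0ea0f2e8131f528",
    "Blauwal3_Textable-3.1.11-py3-none-any.whl":
        "9ecd703299634c43194dafe9139d2caf791709170d26f0625f06909a70a15ed8",
}

def sha256_of(path, chunk_size=1 << 16):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, filename):
    """Return True if the file at path has the digest published for filename."""
    return sha256_of(path) == PUBLISHED_SHA256[filename]
```

For example, `matches_published("Blauwal3_Textable-3.1.11-py3-none-any.whl", "Blauwal3_Textable-3.1.11-py3-none-any.whl")` would return `True` for an intact copy of the wheel saved in the current directory, and `False` if the file had been altered or truncated.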