
Text analysis add-on for the Blue Whale data mining software package.

Project description

Textable is an open-source add-on that brings advanced text-analytical functionality to the Orange Canvas data mining software package (itself open source). The following example shows it in typical use.

The project’s website is http://textable.io. It hosts a repository of recipes to help you get started with Textable.

Documentation is hosted at http://orange3-textable.readthedocs.io/, and further support is available at https://textable.freshdesk.com/ or by e-mail at support@textable.io.

Orange Textable was designed and implemented by LangTech Sarl on behalf of the Department of Language and Information Sciences (SLI) at the University of Lausanne (see Credits and How to cite Orange Textable).

Features

Basic text analysis

  • use regular expressions to segment text into letters, words, sentences, etc., or to run full-text queries

  • use regexes to extract annotations from many input formats

  • import in-line XML markup (e.g. TEI)

  • include/exclude segments based on user-defined lists (stoplists)

  • filter segments based on frequency

  • easily generate random text samples
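The basic operations listed above (regex segmentation, stoplist filtering, frequency filtering, random sampling) can be illustrated with a minimal plain-Python sketch. It uses only the standard library and is not Textable's own code or API: the function names and the representation of a segment as a `(start, end, string)` tuple are assumptions made for illustration.

```python
import random
import re
from collections import Counter

# Hypothetical representation: a segment is a (start, end, string) tuple
# pointing back into the original text.

def segment(text, pattern=r"\w+"):
    """Return one segment per non-overlapping regex match."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

def filter_stoplist(segments, stoplist):
    """Exclude segments whose lowercased string appears in the stoplist."""
    stop = {word.lower() for word in stoplist}
    return [seg for seg in segments if seg[2].lower() not in stop]

def filter_by_frequency(segments, min_count=2):
    """Keep segments whose lowercased type occurs at least min_count times."""
    counts = Counter(seg[2].lower() for seg in segments)
    return [seg for seg in segments if counts[seg[2].lower()] >= min_count]

def random_sample(segments, k, seed=None):
    """Draw k segments without replacement; a fixed seed makes it reproducible."""
    return random.Random(seed).sample(segments, k)

text = "The cat saw the dog. The dog saw a bird."
words = segment(text)                          # word segmentation
sentences = segment(text, r"\S[^.!?]*[.!?]")   # naive sentence segmentation
content = filter_stoplist(words, ["the", "a"])
frequent = filter_by_frequency(content, min_count=2)
print([s[2] for s in sentences])  # ['The cat saw the dog.', 'The dog saw a bird.']
print([s[2] for s in frequent])   # ['saw', 'dog', 'dog', 'saw']
```

The sentence pattern is deliberately naive (it would split after abbreviations such as "e.g."); real corpora usually call for more careful expressions.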

Advanced text analysis

  • concordances and collocations, also based on annotations

  • segment distribution, document-term matrix, transition matrix, etc.

  • co-occurrence tables, also between different types of segments

  • lemmatization and POS-tagging via TreeTagger

  • robust linguistic complexity measures, incl. mean word length, lexical diversity, etc.

  • many advanced data mining algorithms: clustering, classification, factor analyses, etc.
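Several of the tables above reduce to counting. The sketch below shows, under simple assumptions, a document-term matrix, a document-level co-occurrence count, a keyword-in-context concordance and two simple complexity measures (type-token ratio and mean word length). Documents are assumed to be already tokenized and lowercased, and all function names are hypothetical. The type-token ratio is only a simple stand-in: it is sensitive to text length, so it is not one of the more robust lexical diversity measures mentioned above.

```python
from collections import Counter
from itertools import combinations

def document_term_matrix(docs):
    """Return (vocabulary, rows): one row of term counts per document label."""
    vocab = sorted({tok for tokens in docs.values() for tok in tokens})
    rows = {}
    for label, tokens in docs.items():
        counts = Counter(tokens)
        rows[label] = [counts[term] for term in vocab]
    return vocab, rows

def cooccurrence(docs):
    """Count, for each pair of distinct types, the documents containing both."""
    pairs = Counter()
    for tokens in docs.values():
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs

def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def type_token_ratio(tokens):
    """Distinct types divided by number of tokens (length-sensitive)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mean_word_length(tokens):
    """Average number of characters per token."""
    return sum(len(tok) for tok in tokens) / len(tokens) if tokens else 0.0

docs = {"d1": ["the", "cat", "sat"], "d2": ["the", "dog", "sat", "sat"]}
vocab, rows = document_term_matrix(docs)
print(vocab)                               # ['cat', 'dog', 'sat', 'the']
print(rows["d2"])                          # [0, 1, 2, 1]
print(cooccurrence(docs)[("sat", "the")])  # 2
print(concordance(docs["d2"], "sat"))      # [('the dog', 'sat', 'sat'), ('dog sat', 'sat', '')]
```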

Text recoding

  • Unicode-aware preprocessing functions, e.g. remove accents from Ancient Greek text

  • recode and restructure texts using regexes, e.g. rewrite CSV as XML
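Both recoding examples can be approximated with Python's standard library. The sketch below is an illustration under stated assumptions, not Textable's implementation: accents (including Greek breathings and accent marks) are removed by Unicode canonical decomposition (NFD) followed by dropping combining characters, and simple two-column CSV lines are rewritten as XML elements with a regex. The `<item name=...>` output format is hypothetical, and the CSV pattern does not handle quoted fields or embedded line breaks.

```python
import re
import unicodedata
from xml.sax.saxutils import escape, quoteattr

def strip_accents(text):
    """Remove combining marks (accents, breathings, etc.) and recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)

# One CSV record per line: first field, a comma, then the rest of the line.
CSV_LINE = re.compile(r"^([^,\n]*),([^\n]*)$", re.MULTILINE)

def csv_to_xml(csv_text):
    """Rewrite each 'name,value' line as <item name="name">value</item>."""
    return CSV_LINE.sub(
        lambda m: f"<item name={quoteattr(m.group(1))}>{escape(m.group(2))}</item>",
        csv_text,
    )

print(strip_accents("ἄνθρωπος λόγος"))  # ανθρωπος λογος
print(csv_to_xml("cat,3\ndog,5"))
```

`quoteattr` and `escape` keep characters such as `&` and `<` from producing malformed XML, which a plain `re.sub` replacement string would not do.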

Extensibility

  • handles hundreds of text files

  • use Python scripts for custom text processing or to access external tools: NLTK, Pattern, Gensim, etc.

Interoperability

  • import text from keyboard, files, or URLs

  • process any kind of raw text format: TXT, HTML, XML, CSV, etc.

  • supports many text encodings, incl. Unicode

  • export results to text files or via copy-paste

  • easy interfacing with Orange’s Text Mining add-on
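Supporting several encodings ultimately means decoding raw bytes with the right codec. As a hedged illustration (not how Textable itself chooses encodings), the sketch below tries a user-ordered list of candidate encodings and returns the first one that decodes without error. Latin-1 maps every byte to a character and therefore never fails, so it only makes sense as a last resort; a successful decode also does not prove that the guess was correct.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes data."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError(f"none of {encodings} could decode the data")

print(decode_with_fallback("λόγος".encode("utf-8")))  # ('λόγος', 'utf-8')
print(decode_with_fallback("café".encode("cp1252")))  # ('café', 'cp1252')
```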

Ease of access

  • user-friendly visual interface

  • ready-made recipes for a range of frequent use cases

  • extensive documentation

  • support and community forums

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

blauwal3_textable-3.1.11.tar.gz (9.4 MB)

Built Distribution

Blauwal3_Textable-3.1.11-py3-none-any.whl (215.9 kB, Python 3)

File details

Details for the file blauwal3_textable-3.1.11.tar.gz.

File metadata

  • Download URL: blauwal3_textable-3.1.11.tar.gz
  • Size: 9.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for blauwal3_textable-3.1.11.tar.gz
Algorithm      Hash digest
SHA256         d611a7139593e00cbe22999f79885880a7aefbe99c4fc393c0ea0f2e8131f528
MD5            af101550641b8289c88b493a70effa8e
BLAKE2b-256    68bf14d4e6d44e7692d03a626d4d1cf671414b20bb2a16b87f6ac193ba04a18e

See more details on using hashes here.
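The digests above let you check that a downloaded file is intact. A minimal sketch with Python's `hashlib`, assuming the source archive has been saved under its original name in the current directory, computes the SHA256 digest in chunks and compares it with the published value; the helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest of blauwal3_textable-3.1.11.tar.gz (from the table above).
EXPECTED_SHA256 = "d611a7139593e00cbe22999f79885880a7aefbe99c4fc393c0ea0f2e8131f528"

def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large archives need little memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.strip().lower()

if __name__ == "__main__":
    archive = Path("blauwal3_textable-3.1.11.tar.gz")
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
```

pip can enforce the same check in hash-checking mode (`--require-hashes`); because pip may pick either the source archive or the wheel, both SHA256 digests would need to be listed.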

File details

Details for the file Blauwal3_Textable-3.1.11-py3-none-any.whl.

File metadata

File hashes

Hashes for Blauwal3_Textable-3.1.11-py3-none-any.whl
Algorithm      Hash digest
SHA256         9ecd703299634c43194dafe9139d2caf791709170d26f0625f06909a70a15ed8
MD5            f588e262109c1d1741ef6e6eb6d9dc4f
BLAKE2b-256    25e73b452ec6c28a777de42b8cde1e2c58c92c342b7c173e26df292ce0f752a7

