Textable add-on for Orange 3 data mining software package.

Project description

Textable is an open-source add-on that brings advanced text-analysis functionality to the Orange Canvas data mining software package (itself open source). Look at the following example to see it in typical action.

The project’s website is http://textable.io. It hosts a repository of recipes to help you get started with Textable.

Documentation is hosted at http://orange3-textable.readthedocs.io/. Further support is available at https://textable.freshdesk.com/ or by e-mail at support@textable.io.

Orange Textable was designed and implemented by LangTech Sarl on behalf of the Department of Language and Information Sciences (SLI) at the University of Lausanne (see Credits and How to cite Orange Textable).

Features

Basic text analysis

  • use regular expressions to segment text into letters, words, sentences, etc., or to run full-text queries

  • use regexes to extract annotations from many input formats

  • import in-line XML markup (e.g. TEI)

  • include/exclude segments based on user-defined lists (stoplists)

  • filter segments based on frequency

  • easily generate random text samples
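
The segmentation, stoplist, frequency-filter, and sampling ideas listed above can be sketched in plain Python. This is only an illustration of the underlying technique using the standard library; it is not Textable's own API, and the function names (segment, exclude, filter_by_frequency, sample_segments) are hypothetical.

```python
import random
import re
from collections import Counter


def segment(text, pattern):
    """Return a (start, end, content) tuple for each regex match in text."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


def exclude(segments, stoplist):
    """Drop segments whose content appears in the stoplist (case-insensitive)."""
    stop = {word.lower() for word in stoplist}
    return [s for s in segments if s[2].lower() not in stop]


def filter_by_frequency(segments, min_count):
    """Keep segments whose content occurs at least min_count times."""
    counts = Counter(s[2].lower() for s in segments)
    return [s for s in segments if counts[s[2].lower()] >= min_count]


def sample_segments(segments, k, seed=None):
    """Draw k segments at random without replacement (seed for repeatability)."""
    return random.Random(seed).sample(segments, k)


text = "The cat sat. The dog sat on the mat."
words = segment(text, r"\w+")                  # word-level segmentation
content_words = exclude(words, ["the", "on"])  # stoplist filtering
frequent = filter_by_frequency(content_words, 2)
print([s[2] for s in frequent])  # ['sat', 'sat']
```

Keeping the start and end offsets alongside each match means a segment can always be traced back to its position in the source text.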

Quantitative text analysis

  • concordances and collocations, also based on annotations

  • segment distribution, document-term matrix, transition matrix, etc.

  • co-occurrence tables, also between different types of segments

  • robust linguistic complexity measures, incl. mean word length, lexical diversity, etc.

  • many advanced data mining algorithms: clustering, classification, factor analyses, etc.
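
To make two of the quantities above concrete, here is a minimal sketch of a document-term matrix and two simple complexity measures. It assumes a naive `\w+` tokenizer and uses the type-token ratio as a stand-in for lexical diversity; Textable's own measures may be defined differently, and all function names here are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (a deliberately naive tokenizer)."""
    return [w.lower() for w in re.findall(r"\w+", text)]


def document_term_matrix(docs):
    """Map each document name to its term counts over a shared, sorted vocabulary."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    matrix = {name: [c[term] for term in vocab] for name, c in counts.items()}
    return vocab, matrix


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_word_length(tokens):
    """Average number of characters per token (0.0 for empty input)."""
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0


docs = {"a": "the cat sat", "b": "the cat and the dog"}
vocab, matrix = document_term_matrix(docs)
print(vocab)        # ['and', 'cat', 'dog', 'sat', 'the']
print(matrix["b"])  # [1, 1, 1, 0, 2]
```

The type-token ratio depends on text length, so it is only comparable across samples of similar size.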

Text recoding

  • Unicode-aware preprocessing functions, e.g. remove accents from Ancient Greek text

  • recode and restructure texts using regexes, e.g. rewrite CSV as XML
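
Both recoding examples above can be illustrated with the Python standard library. The sketch below is not Textable's implementation: the function names are hypothetical, the element names (table, row, name, value) are invented for the example, and the CSV rewrite assumes exactly two unquoted columns per line.

```python
import re
import unicodedata
from xml.sax.saxutils import escape


def remove_accents(text):
    """Strip combining marks by decomposing (NFD), filtering, and recomposing (NFC)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def _row_to_xml(match):
    """Build one XML row from a regex match, escaping &, < and > in the fields."""
    name, value = (escape(group) for group in match.groups())
    return f"<row><name>{name}</name><value>{value}</value></row>"


def csv_to_xml(text):
    """Rewrite each two-column CSV line as an XML row; other lines pass through."""
    rows = re.sub(r"^([^,\n]*),([^,\n]*)$", _row_to_xml, text, flags=re.MULTILINE)
    return f"<table>\n{rows}\n</table>"


print(remove_accents("μῆνιν ἄειδε"))  # μηνιν αειδε
print(csv_to_xml("a,1\nb&c,2"))
```

A regex-based rewrite like this does not handle quoted fields or embedded commas; for real CSV input the standard csv module is a safer parser.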

Extensibility

  • handles hundreds of text files

  • use Python scripts for custom text processing or to access external tools: NLTK, Pattern, GenSim, etc.

Interoperability

  • import text from keyboard, files, or URLs

  • process any kind of raw text format: TXT, HTML, XML, CSV, etc.

  • supports many text encodings, incl. Unicode

  • export results to text files or via copy-paste
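
Handling several text encodings usually comes down to decoding raw bytes with the right codec. The following is a rough sketch of one common approach, trying candidate encodings in order; it is an assumption for illustration, not a description of how Textable selects encodings, and decode_with_fallback is a hypothetical helper.

```python
def decode_with_fallback(data, encodings=("utf-8", "latin-1")):
    """Decode bytes with the first encoding that succeeds; return (text, encoding)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


raw = "café".encode("latin-1")  # bytes that are not valid UTF-8
text, used = decode_with_fallback(raw)
print(text, used)  # café latin-1
```

Order matters here: latin-1 accepts any byte sequence, so it should come last, after stricter encodings such as UTF-8.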

Ease of access

  • user-friendly visual interface

  • ready-made recipes for a range of frequent use cases

  • extensive documentation

  • support and community forums


Download files

Source Distribution

Orange3-Textable-3.0a5.tar.gz (130.2 kB view details)

Uploaded Source

Built Distribution

Orange3_Textable-3.0a5-py3-none-any.whl (173.6 kB view details)

Uploaded Python 3

File details

Details for the file Orange3-Textable-3.0a5.tar.gz.

File hashes

Hashes for Orange3-Textable-3.0a5.tar.gz
Algorithm Hash digest
SHA256 4e746451241a0b0daa8305109002f21fb8560b182bc5843f739327a80ec1c77d
MD5 1a8dbb23bfb76bc6cc3493d4e03d62fc
BLAKE2b-256 af491a5f515807df37fd6f290c8adffd8602af336ef45143d5e944ebff1c4192

File details

Details for the file Orange3_Textable-3.0a5-py3-none-any.whl.

File hashes

Hashes for Orange3_Textable-3.0a5-py3-none-any.whl
Algorithm Hash digest
SHA256 4fa8dbd353cb424c6fd30c208550bbf5ff2e44e2ab87a94c3c5de488a3e3c29f
MD5 487bc1c1259c883a3b2e0c6312fed742
BLAKE2b-256 00515195d50a167804b7c3a813394b9b494d9deb71c8c7a43dbe822e519e675b
