Simple text analysis from the command line

# textkit

Command line tools for text processing and analysis.

## About

`textkit` is a collection of small, unix-style tools for working with text as data.

Think of textkit as basic natural language processing, available from the command line.

## textkit Features

Here are some of the cool things you can do with textkit.

Convert a document to a set of word tokens and remove all punctuation from the tokens:

```
textkit text2words input.txt | textkit filterpunc -
```

Count the most frequently used words in a text:

```
textkit text2words alice.txt | textkit count - | head
```

Do the same, but with punctuation removed:

```
textkit text2words alice.txt | textkit filterpunc - | textkit count - | head
```
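To get a feel for what this kind of pipeline computes, here is a rough equivalent built only from standard unix tools. It is a sketch for comparison, not how textkit is implemented: the helper name `top_words`, the case folding, and the tab-separated output are all assumptions, and textkit's own tokenization and output format may differ.

```shell
# top_words: hypothetical helper, not part of textkit.
# Reads text on stdin and prints the N most frequent words (default 10)
# as "word<TAB>count", most frequent first.
top_words() {
  n="${1:-10}"
  # 1. turn every run of non-letters into a newline (one word per line)
  # 2. fold to lower case
  # 3. drop empty lines, then count identical words
  # 4. sort by count, descending, and keep the first N
  tr -cs '[:alpha:]' '\n' |
    tr '[:upper:]' '[:lower:]' |
    grep -v '^$' |
    sort | uniq -c |
    sort -k1,1nr -k2 |
    head -n "$n" |
    awk '{print $2 "\t" $1}'
}

printf 'The cat and the hat. The end!\n' | top_words 1
```

Because punctuation is discarded during the split, this corresponds most closely to the variant with `filterpunc` in the chain.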

## Installation

To test locally, clone the repo:

```
git clone git@github.com:learntextvis/textkit.git
```

Create a local virtual environment or `conda` environment.

Here is how I created my local `conda` environment for installing and testing textkit:

```
conda create --name textkit nltk

# conda 4.4 and later also accept `conda activate textkit`
source activate textkit
```

Then I went into the `textkit` directory to install its requirements:

```
cd textkit

pip install -r requirements.txt
```

Finally, I installed the local version of textkit using the `--editable` flag:

```
pip install --editable .
```

_In the future, basic installation instructions will be just the following:_

textkit is available via `pip`:

```
pip install textkit
```

## Usage

textkit groups all of its commands under the single `textkit` command line tool. To list every available command, run:

```
textkit --help
```

You can also get help on a particular command to see its arguments and options:

```
textkit text2words --help
```

Chaining textkit commands with unix pipes (`|`) is possible and encouraged. Use a dash (`-`) in place of a file name to indicate that the output of the previous command should be used as input for the next command.
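The dash convention is common among unix tools, and a small shell function can show how a command might interpret it. This is only an illustration under stated assumptions: `count_lines` is a hypothetical command, not part of textkit, and textkit's own handling of `-` may be implemented differently.

```shell
# count_lines: hypothetical command, not part of textkit.
# Takes a file name, or "-" to read from standard input,
# and prints the number of lines it saw.
count_lines() {
  if [ "$1" = "-" ]; then
    wc -l            # "-" given: read whatever the pipe delivers
  else
    wc -l < "$1"     # otherwise read the named file
  fi | tr -d ' '     # some wc implementations pad the number with spaces
}

# Reading from a pipe: "-" stands for the previous command's output.
printf 'one\ntwo\nthree\n' | count_lines -
```

The same function also accepts a file name, which mirrors the README examples: the first command in a chain names a file, and each later command reads `-`.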
