Simple text analysis from the command line
# textkit
Command line tools for text processing and analysis.
## About
`textkit` is a series of small, unix-style tools that provide a suite of capabilities for
dealing with text as data.
Think of textkit as basic natural language processing capabilities - from the command line.
## textkit Features
Here are some of the cool things you can do with textkit.
Convert a document to a set of word tokens and remove all punctuation from the tokens:
```
textkit text2words input.txt | textkit filterpunc -
```
Count the most frequently used words in a text:
```
textkit text2words alice.txt | textkit count - | head
```
Do the same, but with punctuation removed:
```
textkit text2words alice.txt | textkit filterpunc - | textkit count - | head
```
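For a sense of what this pipeline computes, here is a rough equivalent built only from standard Unix tools. It is a sketch for comparison, not textkit's implementation: the `count_words` function name, the whitespace-based tokenization, the case folding, the tab-separated output, and the sample sentence are all assumptions, and textkit's own tokenizer and output format may differ.

```shell
# count_words is a hypothetical helper, not part of textkit.
count_words() {
  tr -s '[:space:]' '\n' |        # one whitespace-separated token per line
    tr -d '[:punct:]' |           # strip punctuation characters
    tr '[:upper:]' '[:lower:]' |  # fold case so "The" and "the" match
    grep -v '^$' |                # drop tokens that were only punctuation
    sort | uniq -c |              # count identical tokens
    sort -k1,1nr -k2,2 |          # most frequent first, ties alphabetical
    awk '{ print $2 "\t" $1 }'    # emit word<TAB>count
}

# Made-up sample text; pipe a real file in with `cat alice.txt | count_words`.
printf 'The cat saw the hat. The end!\n' | count_words | head -n 3
```

The textkit version keeps each of these steps as a separately named command, which makes it easier to drop or reorder a step (for example, counting with punctuation left in).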
## Installation
To test locally, clone the repo:
```
git clone git@github.com:learntextvis/textkit.git
```
Create a local virtual environment or `conda` environment.
Here is how I created my local `conda` environment for installing and testing textkit:
```
conda create --name textkit nltk
source activate textkit
```
Then I went into the `textkit` directory to install its requirements:
```
cd textkit
pip install -r requirements.txt
```
Finally, I installed the local version of textkit using the `--editable` flag:
```
pip install --editable .
```
_In the future, basic installation instructions will be just the following:_
textkit is available via `pip`:
```
pip install textkit
```
## Usage
textkit is a collection of commands grouped under the `textkit` command line tool. To show all the commands available, run:
```
textkit --help
```
You can also get help on a particular command to see its arguments and options:
```
textkit text2words --help
```
Chaining of textkit commands is possible (and encouraged) with unix pipes (`|`). Use the dash `-` to indicate the output of a previous command should be used as input for the next command.
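The `-` convention is shared by many standard Unix tools, so it can be tried without textkit installed. A minimal sketch using `cat`, which treats `-` as standard input (the sample strings and the temporary file are made up for illustration):

```shell
# `-` tells cat to read from standard input at that position.
printf 'from the pipe\n' | cat -

# It can be mixed with real files: the file is printed first, then stdin.
tmp=$(mktemp)
printf 'from a file\n' > "$tmp"
printf 'from the pipe\n' | cat "$tmp" -
rm -f "$tmp"
```

In a textkit pipeline the dash plays the same role: it sits where a file name would go and stands for the previous command's output.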