Unix inspired utilities for Natural Language Processing
Project description
# nlptools
*nlptools* are an Unix inspired utilities for Natural Language Processing.
- [Examples](#examples)
- [Basic tokenization](#basic-tokenization)
- [Lemmatization and Stemming](#lemmatization-and-stemming)
- [Basic features](#basic-features)
- [Data fetching](#data-fetching)
- [Documentation](#documentation)
- [Commands](#commands)
- [Cookbook](#cookbook)
## Examples
### Basic tokenization
```bash
echo "Hello world!" | proc-tkn
# Result:
#
# Hello
# world
# !
```
### Lemmatization and Stemming
```bash
echo "dog dogs" | bin/proc-tkn | bin/proc-lmtz
# Result:
#
# dog
# dog
```
### Basic features
```bash
echo "Roses are red. Violets are blue. Sugar is sweet. And so are you." \
| proc-tkn \
| proc-lmtz \
| feat-tfidf --type idf --sep .
# Result
#
# and 0.0 0.0 0.0 1.0
# are 0.4444444444444444 0.4444444444444444 0.0 0.3333333333333333
# blue 0.0 1.3333333333333333 0.0 0.0
# is 0.0 0.0 1.3333333333333333 0.0
# red 1.3333333333333333 0.0 0.0 0.0
# roses 1.3333333333333333 0.0 0.0 0.0
# so 0.0 0.0 0.0 1.0
# sugar 0.0 0.0 1.3333333333333333 0.0
# sweet 0.0 0.0 1.3333333333333333 0.0
# violets 0.0 1.3333333333333333 0.0 0.0
# you 0.0 0.0 0.0 1.0
```
### Data fetching
```bash
fetch-url https://www.gutenberg.org/files/996/old/1donq10.txt \
| proc-tkn \
| sort \
| uniq -c
# Result
#
# 5312 ``
# 1 ~
# 1 <
# 1 =
# 1 >
# 1 _
# 40 -
# 36793 ,
# 6057 ;
# 370 :
# 677 !
# 970 ?
# 1 /
# 7723 .
# 1 ...
# 503 '
# 1 '-
# 5222 ''
# 209 (
# 209 )
# 20 [
# 20 ]
# .....
```
## Documentation
### [Commands](doc/commands.md)
### [CookBook](doc/cookbook.md)
*nlptools* are an Unix inspired utilities for Natural Language Processing.
- [Examples](#examples)
- [Basic tokenization](#basic-tokenization)
- [Lemmatization and Stemming](#lemmatization-and-stemming)
- [Basic features](#basic-features)
- [Data fetching](#data-fetching)
- [Documentation](#documentation)
- [Commands](#commands)
- [Cookbook](#cookbook)
## Examples
### Basic tokenization
```bash
echo "Hello world!" | proc-tkn
# Result:
#
# Hello
# world
# !
```
### Lemmatization and Stemming
```bash
echo "dog dogs" | bin/proc-tkn | bin/proc-lmtz
# Result:
#
# dog
# dog
```
### Basic features
```bash
echo "Roses are red. Violets are blue. Sugar is sweet. And so are you." \
| proc-tkn \
| proc-lmtz \
| feat-tfidf --type idf --sep .
# Result
#
# and 0.0 0.0 0.0 1.0
# are 0.4444444444444444 0.4444444444444444 0.0 0.3333333333333333
# blue 0.0 1.3333333333333333 0.0 0.0
# is 0.0 0.0 1.3333333333333333 0.0
# red 1.3333333333333333 0.0 0.0 0.0
# roses 1.3333333333333333 0.0 0.0 0.0
# so 0.0 0.0 0.0 1.0
# sugar 0.0 0.0 1.3333333333333333 0.0
# sweet 0.0 0.0 1.3333333333333333 0.0
# violets 0.0 1.3333333333333333 0.0 0.0
# you 0.0 0.0 0.0 1.0
```
### Data fetching
```bash
fetch-url https://www.gutenberg.org/files/996/old/1donq10.txt \
| proc-tkn \
| sort \
| uniq -c
# Result
#
# 5312 ``
# 1 ~
# 1 <
# 1 =
# 1 >
# 1 _
# 40 -
# 36793 ,
# 6057 ;
# 370 :
# 677 !
# 970 ?
# 1 /
# 7723 .
# 1 ...
# 503 '
# 1 '-
# 5222 ''
# 209 (
# 209 )
# 20 [
# 20 ]
# .....
```
## Documentation
### [Commands](doc/commands.md)
### [CookBook](doc/cookbook.md)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
nlptools-0.5.1.tar.gz
(7.4 kB
view details)
File details
Details for the file nlptools-0.5.1.tar.gz
.
File metadata
- Download URL: nlptools-0.5.1.tar.gz
- Upload date:
- Size: 7.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3c2de873634566fea1da0a2e3ebf01cf6b54725e784954cb78a961d5246b8a93 |
|
MD5 | b865a8001ab3f541de6991b2ecad3129 |
|
BLAKE2b-256 | a1a762a8f60e5c3fb4f2708c63d51ed74b9f56b93fde3c3151a1d48f2ccc7034 |