Ciseau
------
Word and sentence tokenization in Python.
Usage
-----
Use this package to split up strings according to sentence and word boundaries.
For instance, to simply break up strings into tokens:
```
from ciseau import tokenize

tokenize("Joey was a great sailor.")
#=> ["Joey ", "was ", "a ", "great ", "sailor ", "."]
```
To also detect sentence boundaries:
```
from ciseau import sent_tokenize

sent_tokenize("Cat sat mat. Cat's named Cool.", keep_whitespace=True)
#=> [["Cat ", "sat ", "mat", ". "], ["Cat ", "'s ", "named ", "Cool", "."]]
```
`sent_tokenize` can keep the whitespace as-is with the flags `keep_whitespace=True` and `normalize_ascii=False`.
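The nested list that `sent_tokenize` returns (one inner list per sentence) can be post-processed with plain list comprehensions. The sketch below is only an illustration: it copies the documented output above as literal data so that it runs without the package installed, and `strip_tokens` and `sentence_lengths` are hypothetical helpers, not part of ciseau.

```python
# Literal copy of the documented sent_tokenize output shown above,
# so this sketch does not need ciseau to be installed.
sentences = [["Cat ", "sat ", "mat", ". "], ["Cat ", "'s ", "named ", "Cool", "."]]


def strip_tokens(sentences):
    """Hypothetical helper (not part of ciseau): drop whitespace around each token."""
    return [[token.strip() for token in sentence] for sentence in sentences]


def sentence_lengths(sentences):
    """Hypothetical helper (not part of ciseau): count the tokens in each sentence."""
    return [len(sentence) for sentence in sentences]


print(strip_tokens(sentences))
print(sentence_lengths(sentences))
```

Keeping the whitespace on the tokens and stripping it only when needed leaves both views of the text available: the spacing as tokenized, and the bare words.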
Installation
------------
```
pip3 install ciseau
```
Testing
-------
Run `nose2`.
If you find this project useful for your work or research, here's how you can cite it:
```latex
@misc{RaimanCiseau2017,
author = {Raiman, Jonathan},
title = {Ciseau},
year = {2017},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/jonathanraiman/ciseau}},
commit = {fe88b9d7f131b88bcdd2ff361df60b6d1cc64c04}
}
```