A simple package for generating n-grams and bag-of-words representations from text.
It is intended for demonstrating basic Natural Language Processing (NLP) feature engineering in Python.
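For readers new to these terms, the following is a minimal sketch of what n-grams and a bag-of-words representation look like, written with the standard library only. The function names `tokenize`, `ngrams`, and `bag_of_words` are hypothetical and chosen for illustration; they are not taken from this package's API, and the whitespace tokenizer is a deliberate simplification.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token sequence."""
    # Zip the sequence against copies of itself shifted by 1..n-1 positions.
    return list(zip(*(tokens[i:] for i in range(n))))


def bag_of_words(tokens):
    """Map each distinct token to the number of times it occurs."""
    return Counter(tokens)


tokens = tokenize("the cat sat on the mat")
bigrams = ngrams(tokens, 2)   # consecutive token pairs
bow = bag_of_words(tokens)    # token -> count, word order discarded
```

Here `bigrams` keeps local word order (pairs of adjacent tokens), while `bow` discards order entirely and keeps only counts.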
## More Info:
### Practice Dataset
[Stack Exchange Data Dump](https://archive.org/details/stackexchange)
### Text Encoding
[The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky](http://www.joelonsoftware.com/articles/Unicode.html)
#### Packages
[`chardet`](https://pypi.python.org/pypi/chardet) - Universal encoding detector for Python 2 and 3
[`cchardet`](https://pypi.python.org/pypi/cchardet/1.0.0) - Universal encoding detector. This library is faster than chardet
[`ftfy`](http://ftfy.readthedocs.org/en/latest/#) - fixes text for you
[`unidecode`](https://pypi.python.org/pypi/Unidecode) - ASCII transliterations of Unicode text
### Natural Language Processing
[Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements](http://www.people.fas.harvard.edu/~airoldi/pub/books/b02.AiroldiBleiEroshevaFienberg2014HandbookMMM/Ch12_MMM2014.pdf)
### Functional Programming in Python
[Functional Programming in Python](https://www.oreilly.com/ideas/functional-programming-in-python) by David Mertz. Examines the functional aspects of Python: which options work well and which ones you should avoid.
#### Packages
[`toolz`](http://toolz.readthedocs.org/en/latest/) - Toolz provides a set of utility functions for iterators, functions, and dictionaries.
[`functools`](https://docs.python.org/2/library/functools.html#module-functools) - Higher-order functions and operations on callable objects.
[`itertools`](https://docs.python.org/2/library/itertools.html#module-itertools) - Functions creating iterators for efficient looping.
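
As a rough illustration of how these modules fit text processing, the sketch below builds a lazy sliding window with `itertools` and chains processing steps with `functools.reduce`. The helper names `sliding_window` and `compose` are defined here for illustration only; `toolz` offers comparable helpers of its own (for example `toolz.sliding_window` and `toolz.pipe`), which this sketch does not use.

```python
from functools import reduce
from itertools import islice, tee


def sliding_window(iterable, n):
    """Lazily yield successive n-item tuples from any iterable."""
    iterators = tee(iterable, n)
    # Advance the i-th copy by i items so the copies are staggered.
    staggered = (islice(it, i, None) for i, it in enumerate(iterators))
    return zip(*staggered)


def compose(*funcs):
    """Compose single-argument functions, applied left to right."""
    return reduce(lambda f, g: lambda x: g(f(x)), funcs)


# Lowercase, split on whitespace, take bigrams, then materialize as a list.
bigram_pipeline = compose(
    str.lower,
    str.split,
    lambda toks: sliding_window(toks, 2),
    list,
)
result = bigram_pipeline("To be or not to be")
```

Because `sliding_window` returns an iterator, nothing is computed until the final `list` step, which can matter when the input is a large token stream.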
Hashes for text2math-0.0.5.dev1-py2.py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 600468ffccbcc56c7c3e1ae8a505f2f02c8f96a1838b4773b2aa5c6d15a322b2
MD5 | b84443d87e961cd05afee4b1a46a8c84
BLAKE2b-256 | df093debd6875f3bdf2de115ca3eaaa1781b4009695557ded110a6dc4aa0439a