Simple package for generating n-grams and bag-of-words representations from text.
A simple package for demonstrating basic Natural Language Processing (NLP) feature engineering in Python.
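As a rough illustration of the two representations named above, here is a minimal pure-Python sketch. The function names `tokenize`, `ngrams`, and `bag_of_words`, and the lowercase whitespace tokenization, are assumptions made for this example; they are not necessarily the API that `text2math` exposes.

```python
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # If n exceeds the number of tokens, the range is empty and no n-grams are returned.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens: List[str]) -> Counter:
    """Map each token to the number of times it occurs; word order is discarded."""
    return Counter(tokens)


tokens = tokenize("the cat sat on the mat")
print(ngrams(tokens, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
print(bag_of_words(tokens)["the"])
# 2
```

Bigrams keep some local word order, while the bag of words keeps only counts; feature pipelines often combine the two by counting n-grams instead of single tokens.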
## More Info:
### Practice Dataset
[Stack Exchange Data Dump](https://archive.org/details/stackexchange)
### Text Encoding
[The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky](http://www.joelonsoftware.com/articles/Unicode.html)
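A common encoding bug the article helps explain is *mojibake*: UTF-8 bytes decoded with the wrong codec. The sketch below uses only the standard library and assumes the wrong codec was Windows-1252 (`cp1252`). Reversing the mistake by hand only works when you know, or can guess, which codec was misapplied; that guessing is what the packages listed in this section try to automate.

```python
# Text is a sequence of Unicode code points; bytes are one particular encoding of it.
original = "café"

utf8_bytes = original.encode("utf-8")  # 'é' takes two bytes in UTF-8
print(utf8_bytes)  # b'caf\xc3\xa9'

# Decoding those bytes with the wrong codec (assumed cp1252 here) produces mojibake.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # cafÃ©

# Undo the mistake: re-encode with the wrong codec, then decode with the right one.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```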
#### Packages
- [`chardet`](https://pypi.python.org/pypi/chardet) - Universal encoding detector for Python 2 and 3
- [`cchardet`](https://pypi.python.org/pypi/cchardet/1.0.0) - Universal encoding detector; faster than `chardet`
- [`ftfy`](http://ftfy.readthedocs.org/en/latest/#) - Fixes text for you by repairing mojibake and other broken Unicode
- [`unidecode`](https://pypi.python.org/pypi/Unidecode) - ASCII transliterations of Unicode text
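For quick experiments without extra dependencies, the standard library can roughly approximate two of these tasks. This is a stand-in, not a replacement: the hypothetical `decode_with_fallback` simply tries a fixed list of codecs in order (an assumption, far cruder than the statistical detection in `chardet`/`cchardet`), and the hypothetical `to_ascii` strips combining accents via Unicode NFKD normalization, which handles accented Latin letters but, unlike `unidecode`, drops characters from other scripts entirely.

```python
import unicodedata
from typing import Sequence, Tuple


def decode_with_fallback(
    data: bytes, encodings: Sequence[str] = ("utf-8", "cp1252", "latin-1")
) -> Tuple[str, str]:
    """Return (text, codec) for the first codec that decodes `data` without error.

    latin-1 maps every byte to a code point, so with the default list this never
    fails, but a successful decode is not proof that the guess was correct.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def to_ascii(text: str) -> str:
    """Decompose accented characters (NFKD) and drop everything outside ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(decode_with_fallback("naïve".encode("utf-8")))  # ('naïve', 'utf-8')
print(decode_with_fallback(b"caf\xe9"))                # ('café', 'cp1252')
print(to_ascii("Crème brûlée"))                        # Creme brulee
```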
### Natural Language Processing
[Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements](http://www.people.fas.harvard.edu/~airoldi/pub/books/b02.AiroldiBleiEroshevaFienberg2014HandbookMMM/Ch12_MMM2014.pdf)
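Topic models such as LDA usually take a bag-of-words corpus as input: a document-term matrix whose cell (d, w) counts how often word w appears in document d. A minimal pure-Python sketch, assuming whitespace tokenization and an alphabetically sorted vocabulary (both choices are for illustration only; `document_term_matrix` is a hypothetical name):

```python
from collections import Counter
from typing import List, Tuple


def document_term_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Build (vocabulary, counts) where counts[d][j] is the count of vocabulary[j] in docs[d]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not appear in this document.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


vocab, dtm = document_term_matrix(["the cat sat", "the dog sat on the cat"])
print(vocab)  # ['cat', 'dog', 'on', 'sat', 'the']
print(dtm)    # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```

Real corpora produce mostly-zero rows, so practical pipelines tend to store this matrix in a sparse format rather than as nested lists.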
### Functional Programming in Python
[*Functional Programming in Python* by David Mertz](https://www.oreilly.com/ideas/functional-programming-in-python): examines the functional aspects of Python, which options work well, and which ones you should avoid.
#### Packages
- [`toolz`](http://toolz.readthedocs.org/en/latest/) - Toolz provides a set of utility functions for iterators, functions, and dictionaries.
- [`functools`](https://docs.python.org/2/library/functools.html#module-functools) - Higher-order functions and operations on callable objects.
- [`itertools`](https://docs.python.org/2/library/itertools.html#module-itertools) - Functions creating iterators for efficient looping.
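The n-gram idea can also be written in the lazy, iterator-based style these modules encourage. This sketch uses only `itertools` and `functools`; `toolz` offers a ready-made `sliding_window(n, seq)` for the windowing step. The function names `sliding_ngrams` and `merge_counts` are hypothetical, and this is not necessarily how `text2math` implements it.

```python
from collections import Counter
from functools import reduce
from itertools import islice, tee
from typing import Iterable, Iterator, Tuple


def sliding_ngrams(tokens: Iterable[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Lazily yield n-grams by zipping n copies of the stream, each advanced by its offset."""
    copies = tee(tokens, n)
    shifted = (islice(copy, offset, None) for offset, copy in enumerate(copies))
    # zip stops at the shortest copy, so a stream shorter than n yields nothing.
    return zip(*shifted)


def merge_counts(counters: Iterable[Counter]) -> Counter:
    """Fold many per-document bags of words into one corpus-level Counter."""
    return reduce(lambda total, counts: total + counts, counters, Counter())


docs = ["to be or not to be", "to do or not"]
bigram_bags = (Counter(sliding_ngrams(doc.split(), 2)) for doc in docs)
corpus_bigrams = merge_counts(bigram_bags)
print(corpus_bigrams[("to", "be")])   # 2
print(corpus_bigrams[("or", "not")])  # 2
```

Because every step is a generator, documents are tokenized and counted one at a time, which keeps memory use proportional to the vocabulary rather than to the whole corpus.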
File details
Details for the file text2math-0.0.8.dev1.tar.gz.
File metadata
- Download URL: text2math-0.0.8.dev1.tar.gz
- Upload date:
- Size: 37.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | a59b13dbd6c737b8ed3ee54d2da16bd38921248ccb513051cd063e1798d9b792 |
| MD5 | 6ce8efabf819a8c318e1cc83a5577971 |
| BLAKE2b-256 | a5eb32df9f5436e48087082583a3160ef02d40ba4e986b03a1913fbf8d7eb85c |
File details
Details for the file text2math-0.0.8.dev1-py2.py3-none-any.whl.
File metadata
- Download URL: text2math-0.0.8.dev1-py2.py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | ae4872dfb088ed5db1d7df9ee91d91d590da8c7092aba9ed396adca1f09f4bd1 |
| MD5 | cdf7cfe89384ca5969225ad1bfff78fd |
| BLAKE2b-256 | 846137783506f7e983e8e47d28efb53ac8e0d47bd1a485727c56db98ca682fc2 |
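To check that a downloaded file matches a published SHA256 digest, hash it locally and compare. A minimal sketch using the standard library's `hashlib`; the helper names are hypothetical, and the local file path is an assumption (wherever you saved the download):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local path to the downloaded source distribution.
    sdist = Path("text2math-0.0.8.dev1.tar.gz")
    expected = "a59b13dbd6c737b8ed3ee54d2da16bd38921248ccb513051cd063e1798d9b792"
    if sdist.exists():
        print("OK" if matches_published(sdist, expected) else "MISMATCH")
```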