Summarizer [![Build Status](https://travis-ci.org/michigan-com/summarizer.svg)](https://travis-ci.org/michigan-com/summarizer)
==========
Summarizer is an automatic summarization algorithm.
Requirements
------------
* Python 2.7, 3.3, 3.4, or 3.5
* NLTK
Install it
----------
```
pip install summarizer
```
Use it
------
```
from summarizer import summarize

# title and text are strings holding the article's headline and body
summary = summarize(title, text)
```
Documentation
-------------
`Summarizer.summarize(title, text, count=5, summarizer=Summarizer())`
* title: The title of the article
* text: The actual text of the article
* count: The number of summarized sentences to return
* summarizer: The class instance that will do all the work
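
To illustrate how the four parameters fit together, here is a minimal, self-contained sketch. It does not use this package: `ToySummarizer` and `toy_summarize` are hypothetical stand-ins, and the word-overlap scoring is an assumption chosen for brevity, not the algorithm Summarizer implements. It only mirrors the shape of the documented signature, with a pluggable instance doing the work and `count` capping the number of sentences returned.

```python
import re


class ToySummarizer(object):
    """Hypothetical stand-in, NOT the package's Summarizer class."""

    def split_sentences(self, text):
        # Naive split after ., ! or ? followed by whitespace (an assumption).
        parts = re.split(r"(?<=[.!?])\s+", text)
        return [p.strip() for p in parts if p.strip()]

    def score(self, title, sentence):
        # Number of distinct title words that also occur in the sentence.
        title_words = set(re.findall(r"\w+", title.lower()))
        sentence_words = set(re.findall(r"\w+", sentence.lower()))
        return len(title_words & sentence_words)


def toy_summarize(title, text, count=5, summarizer=None):
    """Return up to `count` sentences, highest-scoring first."""
    summarizer = summarizer or ToySummarizer()
    sentences = summarizer.split_sentences(text)
    # sorted() is stable, so equally scored sentences keep their original order.
    ranked = sorted(sentences,
                    key=lambda s: summarizer.score(title, s),
                    reverse=True)
    return ranked[:count]


title = "City council approves new budget"
text = ("The city council met on Tuesday. "
        "Members voted to approve the new budget. "
        "Rain is expected later this week.")
summary = toy_summarize(title, text, count=2)
```

Passing a different object as `summarizer` (anything with the same two methods in this sketch) changes how sentences are split and scored without touching the calling code, which is the role the `summarizer` argument plays in the signature above.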
Contributing
------------
All contributions must be accompanied by unit tests.
CHANGES
=======
v0.0.7
------
* Added Python 3.5 to Travis
* Moved `SentenceTokenizer` to its own file
* Added `dept` to `abbrev_types`
v0.0.6
------
* [FIX] Sgt, Gov, and No abbreviations accounted for
v0.0.5 10-01-2015
-----------------
* [FIX] Python 2.7 Unicode errors
* Updated PunktLanguageVars to detect special quotation marks
v0.0.4 10-01-2015
-----------------
* [FIX] PyPI didn't pick up the new training data in JSON format
v0.0.3 10-01-2015
-----------------
* [FIX] Tokenizer treated a custom abbreviation, F.B.I., as a sentence break
* [FIX] Added sanitizer module to preprocess text before summarizing it
* [TESTS] Added ability to pull valid tokenized articles from brevity.detroitnow.io
and test that the new tokenizer is still valid
v0.0.2 08-26-2015
-----------------
* [FIX] PyPI was not picking up data files
v0.0.1 08-26-2015
-----------------
* Added setup.py
* Added \_\_version\_\_
* Added unit tests