A multi-document summarizer based on ILP and sentence fusion.

What is this?

Potara is a multi-document summarization system that relies on Integer Linear Programming (ILP) and sentence fusion.

Its goal is to summarize a set of related documents in a few sentences. It first fuses similar sentences to create new sentences that are either shorter or more informative than those found in the documents. It then uses ILP to choose the best set of sentences, fused or not, to compose the summary.
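To make the fusion step more concrete, here is a minimal sketch of word-graph fusion. It is not potara's implementation: the function names (`build_word_graph`, `cheapest_path`, `fuse`) are hypothetical, identical lowercased tokens are simply merged into one node, and the edge cost (inverse of how often two words follow each other) is an assumed simplification of the weighting used in the literature.

```python
import heapq
from collections import defaultdict

START, END = "<s>", "</s>"  # artificial sentence boundaries


def build_word_graph(sentences):
    """Merge identical lowercased tokens into nodes; cost = 1 / adjacency count."""
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for left, right in zip(tokens, tokens[1:]):
            counts[(left, right)] += 1
    graph = defaultdict(dict)
    for (left, right), n in counts.items():
        graph[left][right] = 1.0 / n  # frequent transitions are cheaper
    return graph


def cheapest_path(graph):
    """Dijkstra from START to END; returns the words on the cheapest path."""
    queue = [(0.0, [START])]
    settled = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == END:
            return path[1:-1]
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph[node].items():
            if nxt not in settled:
                heapq.heappush(queue, (cost + weight, path + [nxt]))
    return []


def fuse(sentences):
    """Fuse similar sentences into one candidate sentence."""
    return " ".join(cheapest_path(build_word_graph(sentences)))


print(fuse(["Obama visited Paris on Monday",
            "President Obama visited France"]))  # obama visited france
```

The fused candidate here is shorter than both inputs because the path follows the transition the two sentences share ("obama visited") and then takes the shortest way to the end. A real system would add constraints (minimum length, presence of a verb) so that the output stays grammatical.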

It relies on state-of-the-art (as of 2014) approaches introduced by Gillick and Favre for the ILP strategy, and Filippova for the sentence fusion.
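The selection step can be pictured as a small optimization problem: pick the subset of candidate sentences that covers the most valuable concepts (here, word bigrams) without exceeding a length budget. The sketch below is only an illustration under stated assumptions. It replaces the ILP solver with an exhaustive search, which is only feasible for a handful of sentences, and the names `select_sentences`, `concept_weights` and `max_words` are hypothetical, not part of potara's API.

```python
from itertools import combinations


def bigrams(sentence):
    """Set of lowercased word bigrams in a sentence."""
    tokens = sentence.lower().split()
    return set(zip(tokens, tokens[1:]))


def select_sentences(sentences, concept_weights, max_words):
    """Exhaustive stand-in for the ILP: maximize covered concept weight
    subject to a total word budget. Each concept counts once, however
    many selected sentences contain it."""
    best_score, best_subset = 0, ()
    for size in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), size):
            length = sum(len(sentences[i].split()) for i in subset)
            if length > max_words:
                continue  # violates the length constraint
            covered = set().union(*(bigrams(sentences[i]) for i in subset))
            score = sum(concept_weights.get(c, 0) for c in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return [sentences[i] for i in best_subset]


candidates = [
    "storm hits coast",
    "storm hits coast on monday",
    "thousands lose power",
]
weights = {
    ("storm", "hits"): 3,
    ("hits", "coast"): 2,
    ("lose", "power"): 2,
    ("thousands", "lose"): 1,
}
print(select_sentences(candidates, weights, max_words=6))
# ['storm hits coast', 'thousands lose power']
```

Because a concept is only rewarded once, the search prefers the short first sentence plus a sentence about a different event over the longer, redundant second sentence. An ILP solver finds the same kind of optimum without enumerating every subset.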

It is compatible and tested with Python 3.5 and 3.6.

Install

The easy way

You should be able to install potara and its dependencies with pip:

pip install potara

You can also clone this repo and use the requirements.txt file to install the dependencies.

Further requirements

You will also need GLPK, which is used to obtain an optimal summary. On a Debian-based distro:

$ sudo apt-get install glpk

For Ubuntu-based distros you can use:

$ sudo apt-get install libglpk40

You can check that the install ran successfully by cloning the repo and running:

$ python setup.py test

If you have issues with the install, check the .travis.yml file of the repo, which corresponds to a working build.

How To

Basic usage looks like this:

from potara.summarizer import Summarizer
from potara.document import Document

s = Summarizer()

# Add the documents; this preprocesses them and computes some information for the summarizer
s.setDocuments([Document('data/' + str(i) + '.txt')
                for i in range(1,10)])

# Summarizing, where the actual work is done
s.summarize()

# You can then print the summary
print(s.summary)

There is some preprocessing involved, plus a sentence fusion step, but both are easily tunable. Preprocessing may take a few minutes since there is a lot going on under the hood. Default parameters are currently set for summarizing about 10 documents. You can summarize fewer documents by tweaking the "minbigramcount" parameter of the summarizer:

s = Summarizer(minbigramcount=2)

Summarizing fewer than 4 documents would probably yield a bad summary.
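The sketch below illustrates why such a threshold matters for small document sets. It assumes, without having checked potara's source, that "minbigramcount" acts as a minimum number of documents in which a bigram must appear before it is kept as a concept. The function names are hypothetical and the tokenization is a plain whitespace split.

```python
from collections import Counter


def bigram_document_counts(documents):
    """Count, for each bigram, the number of documents that contain it."""
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()
        counts.update(set(zip(tokens, tokens[1:])))  # once per document
    return counts


def frequent_bigrams(documents, min_count):
    """Keep only bigrams seen in at least min_count documents (assumed semantics)."""
    counts = bigram_document_counts(documents)
    return {bigram: n for bigram, n in counts.items() if n >= min_count}


docs = ["storm hits coast", "storm hits city", "rain hits coast"]
print(frequent_bigrams(docs, min_count=2))
# {('storm', 'hits'): 2, ('hits', 'coast'): 2}
print(frequent_bigrams(docs, min_count=3))
# {}
```

With only three documents, a threshold of 3 leaves no concepts at all, so the selection step would have nothing to optimize. Lowering the threshold as the number of documents shrinks keeps enough concepts to work with.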

Similarity models

Potara relies on similarity scores between sentences. These scores can be shallow, using cosine similarity, or "deep", using gensim Word2Vec semantic representations of words. For the second use case, you'll want to train your own model or use pretrained models. You may contact me if you want to use potara that way, and I may create a tutorial on the matter for the occasion.
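As an illustration of the shallow option, here is a minimal bag-of-words cosine similarity between two sentences. It is a sketch under simple assumptions (whitespace tokenization, raw term counts, no stopword removal or stemming) and the function name is hypothetical. It does not reproduce potara's own preprocessing.

```python
import math
from collections import Counter


def cosine_similarity(sentence_a, sentence_b):
    """Cosine of the angle between the term-count vectors of two sentences."""
    vec_a = Counter(sentence_a.lower().split())
    vec_b = Counter(sentence_b.lower().split())
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


print(round(cosine_similarity("storm hits coast", "storm hits city"), 3))  # 0.667
print(cosine_similarity("storm hits coast", "thousands lose power"))       # 0.0
```

This score only rewards exact word overlap, so "storm" and "hurricane" count as unrelated. That limitation is what the "deep" variant addresses by comparing word vectors instead of word identities.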

Release history

1.0.2 (this version): Mar 27, 2020
1.0.1: Dec 28, 2018
