Skip to main content

textcrafts: Summary, keyphrase and relation extraction with dependecy graphs

Project description


Python-based summary, keyphrase and relation extractor from text documents using dependency graphs.


Project Description

** The system uses dependency links for building Text Graphs, that with help of a centrality algorithm like PageRank, extract relevant keyphrases, summaries and relations from text documents. Developed with Python 3, on OS X, but portable to Linux.**


  • python 3.7 or newer, pip3, java 9.x or newer. Also, having git installed is recommended for easy updates
  • pip3 install nltk
  • also, run in python3 something like
import nltk'wordnet')'words')'stopwords')
  • or, if that fails on a Mac, use run python3 to collect the desired nltk resource files.
  • pip3 install networkx
  • pip3 install requests
  • pip3 install graphviz, also ensure .gv files can be viewed
  • pip3 install stanfordnlp parser

Tested with the above on a Mac, with macOS Mojave and Catalina and on Ubuntu Linux 18.x.

Running it:

in a shell window, run

in another shell window, start with

python3 -i

or by typing

python3 -i

to launch a script doing the same.

interactively, at the ">>>" prompt, try

>>> test1()
>>> test2()
>>> ...
>>> test9()
>>> test12()
>>> test0()

see how to activate other outputs in file

text file inputs (including the US Constitution const.txt) are in the folder


Handling PDF documents

The easiest way to do this is to install pdftotext, which is part of Poppler tools.

If pdftotext is installed, you can place a file like textrank.pdf already in subdirectory pdfs/ and try something similar to:

Change setting in file to use the system with other global parameter settings.

Alternative NLP toolkit

Optionally, you can activate the alternative Stanford CoreNLP toolkit as follows:

  • install Stanford CoreNLP and unzip in a derictory of your choice (ag., the local directory)
  • edit if needed with the location of the parser directory
  • edit and set corenlp=True

Note however that the Stanford CoreNLP is GPL-licensed, which can place restrictions on proprietary software activating this option.

Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for textcrafts, version 0.0.6
Filename, size File type Python version Upload date Hashes
Filename, size textcrafts-0.0.6-py3-none-any.whl (17.7 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size textcrafts-0.0.6.tar.gz (13.4 kB) File type Source Python version None Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page