textcrafts: Summary, keyphrase and relation extraction with dependency graphs

TextGraphCrafts

Python-based summary, keyphrase and relation extractor from text documents using dependency graphs.

HOME: https://github.com/ptarau/TextGraphCrafts

Project Description

The system uses dependency links to build Text Graphs that, with the help of a centrality algorithm like PageRank, extract relevant keyphrases, summaries and relations from text documents. Developed with Python 3 on OS X, but portable to Linux.
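As a rough illustration of the idea (not the project's own code), the sketch below ranks the words of one sentence with a hand-written PageRank iteration. The edge list is a hypothetical dependency parse of "a cat chased the mouse", and the function name pagerank and its parameters are assumptions made for this example only.

```python
# Hypothetical dependency links, written as (dependent, head) pairs.
# Rank flows from each dependent to its head, so heads accumulate score.
EDGES = [
    ("a", "cat"),
    ("cat", "chased"),
    ("the", "mouse"),
    ("mouse", "chased"),
]


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all words.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Words with no outgoing link spread their rank evenly over all words.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


ranks = pagerank(EDGES)
# Print the words from most to least central.
for word in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{word}: {ranks[word]:.3f}")
```

In this toy graph the verb ends up with the highest score because both nouns link to it. Turning such scores into keyphrases, summaries and relations involves more steps than this sketch shows.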

Dependencies:

  • python 3.7 or newer, pip3, java 9.x or newer. Also, having git installed is recommended for easy updates
  • pip3 install nltk
  • then, in python3, run something like
import nltk
nltk.download('wordnet')
nltk.download('words')
nltk.download('stopwords')
  • or, if that fails on a Mac, run python3 down.py to collect the desired nltk resource files.
  • pip3 install networkx
  • pip3 install requests
  • pip3 install graphviz, also ensure .gv files can be viewed
  • pip3 install stanfordnlp parser

Tested with the above on a Mac with macOS Mojave and Catalina, and on Ubuntu Linux 18.x.

Running it:

In a shell window, run

start_server.sh

In another shell window, start with

python3 -i deepRank.py

or type

python3 -i go.py

to launch a script that does the same.

Interactively, at the ">>>" prompt, try

>>> test1()
>>> test2()
>>> ...
>>> test9()
>>> test12()
>>> test0()

See how to activate other outputs in the file

deepRank.py

Text file inputs (including the US Constitution, const.txt) are in the folder

examples/

Handling PDF documents

The easiest way to do this is to install pdftotext, which is part of Poppler tools.
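The conversion step itself can be sketched in Python as below. This is a minimal sketch, assuming only that the pdftotext command is on the PATH and is called as pdftotext input.pdf output.txt. The helper names pdftotext_command and pdf_to_text are hypothetical and are not part of this project.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, txt_path):
    """Build the command line: pdftotext <input.pdf> <output.txt>."""
    return ["pdftotext", str(pdf_path), str(txt_path)]


def pdf_to_text(pdf_path, txt_path):
    """Convert a PDF to a plain-text file and return the extracted text."""
    # Fail early with a clear message if Poppler tools are not installed.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install Poppler tools first")
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")
```

For example, pdf_to_text("pdfs/textrank.pdf", "pdfs/textrank.txt") would write the text file next to the PDF, and the result could then be used like any other text input.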

If pdftotext is installed, you can place a file like textrank.pdf already in subdirectory pdfs/ and try something similar to:

Change the settings in the file params.py to run the system with other global parameter settings.

Alternative NLP toolkit

Optionally, you can activate the alternative Stanford CoreNLP toolkit as follows:

  • install Stanford CoreNLP and unzip it in a directory of your choice (e.g., the local directory)
  • if needed, edit start_parser.sh with the location of the parser directory
  • edit params.py and set corenlp=True

Note, however, that Stanford CoreNLP is GPL-licensed, which can place restrictions on proprietary software that activates this option.

Download files

Source Distribution

textcrafts-0.0.9.tar.gz (13.7 kB)

Built Distribution

textcrafts-0.0.9-py3-none-any.whl (18.2 kB, Python 3)
